diff options
Diffstat (limited to 'Documentation')
423 files changed, 19583 insertions, 3676 deletions
diff --git a/Documentation/ABI/obsolete/sysfs-gpio b/Documentation/ABI/obsolete/sysfs-gpio index b8b0fd341c17..da1345d854b4 100644 --- a/Documentation/ABI/obsolete/sysfs-gpio +++ b/Documentation/ABI/obsolete/sysfs-gpio @@ -28,5 +28,5 @@ Description: /label ... (r/o) descriptive, not necessarily unique /ngpio ... (r/o) number of GPIOs; numbered N to N + (ngpio - 1) - This ABI is deprecated and will be removed after 2020. It is - replaced with the GPIO character device. + This ABI is obsoleted by Documentation/ABI/testing/gpio-cdev and will be + removed after 2020. diff --git a/Documentation/ABI/testing/debugfs-intel-iommu b/Documentation/ABI/testing/debugfs-intel-iommu new file mode 100644 index 000000000000..2ab8464504a9 --- /dev/null +++ b/Documentation/ABI/testing/debugfs-intel-iommu @@ -0,0 +1,276 @@ +What: /sys/kernel/debug/iommu/intel/iommu_regset +Date: December 2023 +Contact: Jingqi Liu <Jingqi.liu@intel.com> +Description: + This file dumps all the register contents for each IOMMU device. + + Example in Kabylake: + + :: + + $ sudo cat /sys/kernel/debug/iommu/intel/iommu_regset + + IOMMU: dmar0 Register Base Address: 26be37000 + + Name Offset Contents + VER 0x00 0x0000000000000010 + GCMD 0x18 0x0000000000000000 + GSTS 0x1c 0x00000000c7000000 + FSTS 0x34 0x0000000000000000 + FECTL 0x38 0x0000000000000000 + + [...] + + IOMMU: dmar1 Register Base Address: fed90000 + + Name Offset Contents + VER 0x00 0x0000000000000010 + GCMD 0x18 0x0000000000000000 + GSTS 0x1c 0x00000000c7000000 + FSTS 0x34 0x0000000000000000 + FECTL 0x38 0x0000000000000000 + + [...] + + IOMMU: dmar2 Register Base Address: fed91000 + + Name Offset Contents + VER 0x00 0x0000000000000010 + GCMD 0x18 0x0000000000000000 + GSTS 0x1c 0x00000000c7000000 + FSTS 0x34 0x0000000000000000 + FECTL 0x38 0x0000000000000000 + + [...] + +What: /sys/kernel/debug/iommu/intel/ir_translation_struct +Date: December 2023 +Contact: Jingqi Liu <Jingqi.liu@intel.com> +Description: + This file dumps the table entries for Interrupt + remapping and Interrupt posting. + + Example in Kabylake: + + :: + + $ sudo cat /sys/kernel/debug/iommu/intel/ir_translation_struct + + Remapped Interrupt supported on IOMMU: dmar0 + IR table address:100900000 + + Entry SrcID DstID Vct IRTE_high IRTE_low + 0 00:0a.0 00000080 24 0000000000040050 000000800024000d + 1 00:0a.0 00000001 ef 0000000000040050 0000000100ef000d + + Remapped Interrupt supported on IOMMU: dmar1 + IR table address:100300000 + Entry SrcID DstID Vct IRTE_high IRTE_low + 0 00:02.0 00000002 26 0000000000040010 000000020026000d + + [...] + + **** + + Posted Interrupt supported on IOMMU: dmar0 + IR table address:100900000 + Entry SrcID PDA_high PDA_low Vct IRTE_high IRTE_low + +What: /sys/kernel/debug/iommu/intel/dmar_translation_struct +Date: December 2023 +Contact: Jingqi Liu <Jingqi.liu@intel.com> +Description: + This file dumps Intel IOMMU DMA remapping tables, such + as root table, context table, PASID directory and PASID + table entries in debugfs. For legacy mode, it doesn't + support PASID, and hence PASID field is defaulted to + '-1' and other PASID related fields are invalid. + + Example in Kabylake: + + :: + + $ sudo cat /sys/kernel/debug/iommu/intel/dmar_translation_struct + + IOMMU dmar1: Root Table Address: 0x103027000 + B.D.F Root_entry + 00:02.0 0x0000000000000000:0x000000010303e001 + + Context_entry + 0x0000000000000102:0x000000010303f005 + + PASID PASID_table_entry + -1 0x0000000000000000:0x0000000000000000:0x0000000000000000 + + IOMMU dmar0: Root Table Address: 0x103028000 + B.D.F Root_entry + 00:0a.0 0x0000000000000000:0x00000001038a7001 + + Context_entry + 0x0000000000000000:0x0000000103220e7d + + PASID PASID_table_entry + 0 0x0000000000000000:0x0000000000800002:0x00000001038a5089 + + [...] + +What: /sys/kernel/debug/iommu/intel/invalidation_queue +Date: December 2023 +Contact: Jingqi Liu <Jingqi.liu@intel.com> +Description: + This file exports invalidation queue internals of each + IOMMU device. + + Example in Kabylake: + + :: + + $ sudo cat /sys/kernel/debug/iommu/intel/invalidation_queue + + Invalidation queue on IOMMU: dmar0 + Base: 0x10022e000 Head: 20 Tail: 20 + Index qw0 qw1 qw2 + 0 0000000000000014 0000000000000000 0000000000000000 + 1 0000000200000025 0000000100059c04 0000000000000000 + 2 0000000000000014 0000000000000000 0000000000000000 + + qw3 status + 0000000000000000 0000000000000000 + 0000000000000000 0000000000000000 + 0000000000000000 0000000000000000 + + [...] + + Invalidation queue on IOMMU: dmar1 + Base: 0x10026e000 Head: 32 Tail: 32 + Index qw0 qw1 status + 0 0000000000000004 0000000000000000 0000000000000000 + 1 0000000200000025 0000000100059804 0000000000000000 + 2 0000000000000011 0000000000000000 0000000000000000 + + [...] + +What: /sys/kernel/debug/iommu/intel/dmar_perf_latency +Date: December 2023 +Contact: Jingqi Liu <Jingqi.liu@intel.com> +Description: + This file is used to control and show counts of + execution time ranges for various types per DMAR. + + Firstly, write a value to + /sys/kernel/debug/iommu/intel/dmar_perf_latency + to enable sampling. + + The possible values are as follows: + + * 0 - disable sampling all latency data + + * 1 - enable sampling IOTLB invalidation latency data + + * 2 - enable sampling devTLB invalidation latency data + + * 3 - enable sampling intr entry cache invalidation latency data + + Next, read /sys/kernel/debug/iommu/intel/dmar_perf_latency gives + a snapshot of sampling result of all enabled monitors. + + Examples in Kabylake: + + :: + + 1) Disable sampling all latency data: + + $ sudo echo 0 > /sys/kernel/debug/iommu/intel/dmar_perf_latency + + 2) Enable sampling IOTLB invalidation latency data + + $ sudo echo 1 > /sys/kernel/debug/iommu/intel/dmar_perf_latency + + $ sudo cat /sys/kernel/debug/iommu/intel/dmar_perf_latency + + IOMMU: dmar0 Register Base Address: 26be37000 + <0.1us 0.1us-1us 1us-10us 10us-100us 100us-1ms + inv_iotlb 0 0 0 0 0 + + 1ms-10ms >=10ms min(us) max(us) average(us) + inv_iotlb 0 0 0 0 0 + + [...] + + IOMMU: dmar2 Register Base Address: fed91000 + <0.1us 0.1us-1us 1us-10us 10us-100us 100us-1ms + inv_iotlb 0 0 18 0 0 + + 1ms-10ms >=10ms min(us) max(us) average(us) + inv_iotlb 0 0 2 2 2 + + 3) Enable sampling devTLB invalidation latency data + + $ sudo echo 2 > /sys/kernel/debug/iommu/intel/dmar_perf_latency + + $ sudo cat /sys/kernel/debug/iommu/intel/dmar_perf_latency + + IOMMU: dmar0 Register Base Address: 26be37000 + <0.1us 0.1us-1us 1us-10us 10us-100us 100us-1ms + inv_devtlb 0 0 0 0 0 + + >=10ms min(us) max(us) average(us) + inv_devtlb 0 0 0 0 + + [...] + +What: /sys/kernel/debug/iommu/intel/<bdf>/domain_translation_struct +Date: December 2023 +Contact: Jingqi Liu <Jingqi.liu@intel.com> +Description: + This file dumps a specified page table of Intel IOMMU + in legacy mode or scalable mode. + + For a device that only supports legacy mode, dump its + page table by the debugfs file in the debugfs device + directory. e.g. + /sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct. + + For a device that supports scalable mode, dump the + page table of specified pasid by the debugfs file in + the debugfs pasid directory. e.g. + /sys/kernel/debug/iommu/intel/0000:00:02.0/1/domain_translation_struct. + + Examples in Kabylake: + + :: + + 1) Dump the page table of device "0000:00:02.0" that only supports legacy mode. + + $ sudo cat /sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct + + Device 0000:00:02.0 @0x1017f8000 + IOVA_PFN PML5E PML4E + 0x000000008d800 | 0x0000000000000000 0x00000001017f9003 + 0x000000008d801 | 0x0000000000000000 0x00000001017f9003 + 0x000000008d802 | 0x0000000000000000 0x00000001017f9003 + + PDPE PDE PTE + 0x00000001017fa003 0x00000001017fb003 0x000000008d800003 + 0x00000001017fa003 0x00000001017fb003 0x000000008d801003 + 0x00000001017fa003 0x00000001017fb003 0x000000008d802003 + + [...] + + 2) Dump the page table of device "0000:00:0a.0" with PASID "1" that + supports scalable mode. + + $ sudo cat /sys/kernel/debug/iommu/intel/0000:00:0a.0/1/domain_translation_struct + + Device 0000:00:0a.0 with pasid 1 @0x10c112000 + IOVA_PFN PML5E PML4E + 0x0000000000000 | 0x0000000000000000 0x000000010df93003 + 0x0000000000001 | 0x0000000000000000 0x000000010df93003 + 0x0000000000002 | 0x0000000000000000 0x000000010df93003 + + PDPE PDE PTE + 0x0000000106ae6003 0x0000000104b38003 0x0000000147c00803 + 0x0000000106ae6003 0x0000000104b38003 0x0000000147c01803 + 0x0000000106ae6003 0x0000000104b38003 0x0000000147c02803 + + [...] diff --git a/Documentation/ABI/testing/gpio-cdev b/Documentation/ABI/testing/gpio-cdev index 66bdcd188b6c..c9689b2a6fed 100644 --- a/Documentation/ABI/testing/gpio-cdev +++ b/Documentation/ABI/testing/gpio-cdev @@ -6,8 +6,9 @@ Description: The character device files /dev/gpiochip* are the interface between GPIO chips and userspace. - The ioctl(2)-based ABI is defined and documented in - [include/uapi]<linux/gpio.h>. + The ioctl(2)-based ABI is defined in + [include/uapi]<linux/gpio.h> and documented in + Documentation/userspace-api/gpio/chardev.rst. The following file operations are supported: @@ -17,8 +18,8 @@ Description: ioctl(2) Initiate various actions. - See the inline documentation in [include/uapi]<linux/gpio.h> - for descriptions of all ioctls. + See Documentation/userspace-api/gpio/chardev.rst + for a description of all ioctls. close(2) Stops and free up the I/O contexts that was associated diff --git a/Documentation/ABI/testing/sysfs-bus-vdpa b/Documentation/ABI/testing/sysfs-bus-vdpa index 4da53878bff6..2c833b5163f2 100644 --- a/Documentation/ABI/testing/sysfs-bus-vdpa +++ b/Documentation/ABI/testing/sysfs-bus-vdpa @@ -1,6 +1,6 @@ What: /sys/bus/vdpa/drivers_autoprobe Date: March 2020 -Contact: virtualization@lists.linux-foundation.org +Contact: virtualization@lists.linux.dev Description: This file determines whether new devices are immediately bound to a driver after the creation. It initially contains 1, which @@ -12,7 +12,7 @@ Description: What: /sys/bus/vdpa/driver_probe Date: March 2020 -Contact: virtualization@lists.linux-foundation.org +Contact: virtualization@lists.linux.dev Description: Writing a device name to this file will cause the kernel binds devices to a compatible driver. @@ -22,7 +22,7 @@ Description: What: /sys/bus/vdpa/drivers/.../bind Date: March 2020 -Contact: virtualization@lists.linux-foundation.org +Contact: virtualization@lists.linux.dev Description: Writing a device name to this file will cause the driver to attempt to bind to the device. This is useful for overriding @@ -30,7 +30,7 @@ Description: What: /sys/bus/vdpa/drivers/.../unbind Date: March 2020 -Contact: virtualization@lists.linux-foundation.org +Contact: virtualization@lists.linux.dev Description: Writing a device name to this file will cause the driver to attempt to unbind from the device. This may be useful when @@ -38,7 +38,7 @@ Description: What: /sys/bus/vdpa/devices/.../driver_override Date: November 2021 -Contact: virtualization@lists.linux-foundation.org +Contact: virtualization@lists.linux.dev Description: This file allows the driver for a device to be specified. When specified, only a driver with a name matching the value diff --git a/Documentation/ABI/testing/sysfs-class-hwmon b/Documentation/ABI/testing/sysfs-class-hwmon index 3dac923c9b0e..cfd0d0bab483 100644 --- a/Documentation/ABI/testing/sysfs-class-hwmon +++ b/Documentation/ABI/testing/sysfs-class-hwmon @@ -149,6 +149,15 @@ Description: RW +What: /sys/class/hwmon/hwmonX/inY_fault +Description: + Reports a voltage hard failure (eg: shorted component) + + - 1: Failed + - 0: Ok + + RO + What: /sys/class/hwmon/hwmonX/cpuY_vid Description: CPU core reference voltage. @@ -968,6 +977,15 @@ Description: RW +What: /sys/class/hwmon/hwmonX/humidityY_max_alarm +Description: + Maximum humidity detection + + - 0: OK + - 1: Maximum humidity detected + + RO + What: /sys/class/hwmon/hwmonX/humidityY_max_hyst Description: Humidity hysteresis value for max limit. @@ -987,6 +1005,15 @@ Description: RW +What: /sys/class/hwmon/hwmonX/humidityY_min_alarm +Description: + Minimum humidity detection + + - 0: OK + - 1: Minimum humidity detected + + RO + What: /sys/class/hwmon/hwmonX/humidityY_min_hyst Description: Humidity hysteresis value for min limit. diff --git a/Documentation/ABI/testing/sysfs-class-net-queues b/Documentation/ABI/testing/sysfs-class-net-queues index 906ff3ca928a..84aa25e0d14d 100644 --- a/Documentation/ABI/testing/sysfs-class-net-queues +++ b/Documentation/ABI/testing/sysfs-class-net-queues @@ -1,4 +1,4 @@ -What: /sys/class/<iface>/queues/rx-<queue>/rps_cpus +What: /sys/class/net/<iface>/queues/rx-<queue>/rps_cpus Date: March 2010 KernelVersion: 2.6.35 Contact: netdev@vger.kernel.org @@ -8,7 +8,7 @@ Description: network device queue. Possible values depend on the number of available CPU(s) in the system. -What: /sys/class/<iface>/queues/rx-<queue>/rps_flow_cnt +What: /sys/class/net/<iface>/queues/rx-<queue>/rps_flow_cnt Date: April 2010 KernelVersion: 2.6.35 Contact: netdev@vger.kernel.org @@ -16,7 +16,7 @@ Description: Number of Receive Packet Steering flows being currently processed by this particular network device receive queue. -What: /sys/class/<iface>/queues/tx-<queue>/tx_timeout +What: /sys/class/net/<iface>/queues/tx-<queue>/tx_timeout Date: November 2011 KernelVersion: 3.3 Contact: netdev@vger.kernel.org @@ -24,7 +24,7 @@ Description: Indicates the number of transmit timeout events seen by this network interface transmit queue. -What: /sys/class/<iface>/queues/tx-<queue>/tx_maxrate +What: /sys/class/net/<iface>/queues/tx-<queue>/tx_maxrate Date: March 2015 KernelVersion: 4.1 Contact: netdev@vger.kernel.org @@ -32,7 +32,7 @@ Description: A Mbps max-rate set for the queue, a value of zero means disabled, default is disabled. -What: /sys/class/<iface>/queues/tx-<queue>/xps_cpus +What: /sys/class/net/<iface>/queues/tx-<queue>/xps_cpus Date: November 2010 KernelVersion: 2.6.38 Contact: netdev@vger.kernel.org @@ -42,7 +42,7 @@ Description: network device transmit queue. Possible values depend on the number of available CPU(s) in the system. -What: /sys/class/<iface>/queues/tx-<queue>/xps_rxqs +What: /sys/class/net/<iface>/queues/tx-<queue>/xps_rxqs Date: June 2018 KernelVersion: 4.18.0 Contact: netdev@vger.kernel.org @@ -53,7 +53,7 @@ Description: number of available receive queue(s) in the network device. Default is disabled. -What: /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/hold_time +What: /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/hold_time Date: November 2011 KernelVersion: 3.3 Contact: netdev@vger.kernel.org @@ -62,7 +62,7 @@ Description: of this particular network device transmit queue. Default value is 1000. -What: /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/inflight +What: /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/inflight Date: November 2011 KernelVersion: 3.3 Contact: netdev@vger.kernel.org @@ -70,7 +70,7 @@ Description: Indicates the number of bytes (objects) in flight on this network device transmit queue. -What: /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/limit +What: /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/limit Date: November 2011 KernelVersion: 3.3 Contact: netdev@vger.kernel.org @@ -79,7 +79,7 @@ Description: on this network device transmit queue. This value is clamped to be within the bounds defined by limit_max and limit_min. -What: /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/limit_max +What: /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/limit_max Date: November 2011 KernelVersion: 3.3 Contact: netdev@vger.kernel.org @@ -88,7 +88,7 @@ Description: queued on this network device transmit queue. See include/linux/dynamic_queue_limits.h for the default value. -What: /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/limit_min +What: /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/limit_min Date: November 2011 KernelVersion: 3.3 Contact: netdev@vger.kernel.org @@ -96,3 +96,26 @@ Description: Indicates the absolute minimum limit of bytes allowed to be queued on this network device transmit queue. Default value is 0. + +What: /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/stall_thrs +Date: Jan 2024 +KernelVersion: 6.9 +Contact: netdev@vger.kernel.org +Description: + Tx completion stall detection threshold in ms. Kernel will + guarantee to detect all stalls longer than this threshold but + may also detect stalls longer than half of the threshold. + +What: /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/stall_cnt +Date: Jan 2024 +KernelVersion: 6.9 +Contact: netdev@vger.kernel.org +Description: + Number of detected Tx completion stalls. + +What: /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/stall_max +Date: Jan 2024 +KernelVersion: 6.9 +Contact: netdev@vger.kernel.org +Description: + Longest detected Tx completion stall. Write 0 to clear. diff --git a/Documentation/ABI/testing/sysfs-class-net-statistics b/Documentation/ABI/testing/sysfs-class-net-statistics index 55db27815361..53e508c6936a 100644 --- a/Documentation/ABI/testing/sysfs-class-net-statistics +++ b/Documentation/ABI/testing/sysfs-class-net-statistics @@ -1,4 +1,4 @@ -What: /sys/class/<iface>/statistics/collisions +What: /sys/class/net/<iface>/statistics/collisions Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -6,7 +6,7 @@ Description: Indicates the number of collisions seen by this network device. This value might not be relevant with all MAC layers. -What: /sys/class/<iface>/statistics/multicast +What: /sys/class/net/<iface>/statistics/multicast Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -14,7 +14,7 @@ Description: Indicates the number of multicast packets received by this network device. -What: /sys/class/<iface>/statistics/rx_bytes +What: /sys/class/net/<iface>/statistics/rx_bytes Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -23,7 +23,7 @@ Description: See the network driver for the exact meaning of when this value is incremented. -What: /sys/class/<iface>/statistics/rx_compressed +What: /sys/class/net/<iface>/statistics/rx_compressed Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -32,7 +32,7 @@ Description: network device. This value might only be relevant for interfaces that support packet compression (e.g: PPP). -What: /sys/class/<iface>/statistics/rx_crc_errors +What: /sys/class/net/<iface>/statistics/rx_crc_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -41,7 +41,7 @@ Description: by this network device. Note that the specific meaning might depend on the MAC layer used by the interface. -What: /sys/class/<iface>/statistics/rx_dropped +What: /sys/class/net/<iface>/statistics/rx_dropped Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -51,7 +51,7 @@ Description: packet processing. See the network driver for the exact meaning of this value. -What: /sys/class/<iface>/statistics/rx_errors +What: /sys/class/net/<iface>/statistics/rx_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -59,7 +59,7 @@ Description: Indicates the number of receive errors on this network device. See the network driver for the exact meaning of this value. -What: /sys/class/<iface>/statistics/rx_fifo_errors +What: /sys/class/net/<iface>/statistics/rx_fifo_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -68,7 +68,7 @@ Description: network device. See the network driver for the exact meaning of this value. -What: /sys/class/<iface>/statistics/rx_frame_errors +What: /sys/class/net/<iface>/statistics/rx_frame_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -78,7 +78,7 @@ Description: on the MAC layer protocol used. See the network driver for the exact meaning of this value. -What: /sys/class/<iface>/statistics/rx_length_errors +What: /sys/class/net/<iface>/statistics/rx_length_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -87,7 +87,7 @@ Description: error, oversized or undersized. See the network driver for the exact meaning of this value. -What: /sys/class/<iface>/statistics/rx_missed_errors +What: /sys/class/net/<iface>/statistics/rx_missed_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -96,7 +96,7 @@ Description: due to lack of capacity in the receive side. See the network driver for the exact meaning of this value. -What: /sys/class/<iface>/statistics/rx_nohandler +What: /sys/class/net/<iface>/statistics/rx_nohandler Date: February 2016 KernelVersion: 4.6 Contact: netdev@vger.kernel.org @@ -104,7 +104,7 @@ Description: Indicates the number of received packets that were dropped on an inactive device by the network core. -What: /sys/class/<iface>/statistics/rx_over_errors +What: /sys/class/net/<iface>/statistics/rx_over_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -114,7 +114,7 @@ Description: (e.g: larger than MTU). See the network driver for the exact meaning of this value. -What: /sys/class/<iface>/statistics/rx_packets +What: /sys/class/net/<iface>/statistics/rx_packets Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -122,7 +122,7 @@ Description: Indicates the total number of good packets received by this network device. -What: /sys/class/<iface>/statistics/tx_aborted_errors +What: /sys/class/net/<iface>/statistics/tx_aborted_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -132,7 +132,7 @@ Description: a medium collision). See the network driver for the exact meaning of this value. -What: /sys/class/<iface>/statistics/tx_bytes +What: /sys/class/net/<iface>/statistics/tx_bytes Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -143,7 +143,7 @@ Description: transmitted packets or all packets that have been queued for transmission. -What: /sys/class/<iface>/statistics/tx_carrier_errors +What: /sys/class/net/<iface>/statistics/tx_carrier_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -152,7 +152,7 @@ Description: because of carrier errors (e.g: physical link down). See the network driver for the exact meaning of this value. -What: /sys/class/<iface>/statistics/tx_compressed +What: /sys/class/net/<iface>/statistics/tx_compressed Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -161,7 +161,7 @@ Description: this might only be relevant for devices that support compression (e.g: PPP). -What: /sys/class/<iface>/statistics/tx_dropped +What: /sys/class/net/<iface>/statistics/tx_dropped Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -170,7 +170,7 @@ Description: See the driver for the exact reasons as to why the packets were dropped. -What: /sys/class/<iface>/statistics/tx_errors +What: /sys/class/net/<iface>/statistics/tx_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -179,7 +179,7 @@ Description: a network device. See the driver for the exact reasons as to why the packets were dropped. -What: /sys/class/<iface>/statistics/tx_fifo_errors +What: /sys/class/net/<iface>/statistics/tx_fifo_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -188,7 +188,7 @@ Description: FIFO error. See the driver for the exact reasons as to why the packets were dropped. -What: /sys/class/<iface>/statistics/tx_heartbeat_errors +What: /sys/class/net/<iface>/statistics/tx_heartbeat_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -197,7 +197,7 @@ Description: reported as heartbeat errors. See the driver for the exact reasons as to why the packets were dropped. -What: /sys/class/<iface>/statistics/tx_packets +What: /sys/class/net/<iface>/statistics/tx_packets Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -206,7 +206,7 @@ Description: device. See the driver for whether this reports the number of all attempted or successful transmissions. -What: /sys/class/<iface>/statistics/tx_window_errors +What: /sys/class/net/<iface>/statistics/tx_window_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index a1db6db47505..710d47be11e0 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -516,6 +516,7 @@ What: /sys/devices/system/cpu/vulnerabilities /sys/devices/system/cpu/vulnerabilities/mds /sys/devices/system/cpu/vulnerabilities/meltdown /sys/devices/system/cpu/vulnerabilities/mmio_stale_data + /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling /sys/devices/system/cpu/vulnerabilities/retbleed /sys/devices/system/cpu/vulnerabilities/spec_store_bypass /sys/devices/system/cpu/vulnerabilities/spectre_v1 diff --git a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon index 8d7d8f05f6cd..92fe7c5c5ac1 100644 --- a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon +++ b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon @@ -1,4 +1,4 @@ -What: /sys/devices/.../hwmon/hwmon<i>/in0_input +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/in0_input Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -6,7 +6,7 @@ Description: RO. Current Voltage in millivolt. Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/power1_max +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_max Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -20,7 +20,7 @@ Description: RW. Card reactive sustained (PL1/Tau) power limit in microwatts. Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/power1_rated_max +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_rated_max Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -28,7 +28,7 @@ Description: RO. Card default power limit (default TDP setting). Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/power1_max_interval +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_max_interval Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -37,7 +37,7 @@ Description: RW. Sustained power limit interval (Tau in PL1/Tau) in Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/power1_crit +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_crit Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -50,7 +50,7 @@ Description: RW. Card reactive critical (I1) power limit in microwatts. Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/curr1_crit +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/curr1_crit Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org @@ -63,7 +63,7 @@ Description: RW. Card reactive critical (I1) power limit in milliamperes. Only supported for particular Intel i915 graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/energy1_input +What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/energy1_input Date: February 2023 KernelVersion: 6.2 Contact: intel-gfx@lists.freedesktop.org diff --git a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon index 8c321bc9dc04..023fd82de3f7 100644 --- a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon +++ b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon @@ -1,4 +1,4 @@ -What: /sys/devices/.../hwmon/hwmon<i>/power1_max +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_max Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -12,7 +12,7 @@ Description: RW. Card reactive sustained (PL1) power limit in microwatts. Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/power1_rated_max +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_rated_max Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -20,7 +20,7 @@ Description: RO. Card default power limit (default TDP setting). Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/power1_crit +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_crit Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -33,7 +33,7 @@ Description: RW. Card reactive critical (I1) power limit in microwatts. Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/curr1_crit +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/curr1_crit Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -44,7 +44,7 @@ Description: RW. Card reactive critical (I1) power limit in milliamperes. the operating frequency if the power averaged over a window exceeds this limit. -What: /sys/devices/.../hwmon/hwmon<i>/in0_input +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/in0_input Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -52,7 +52,7 @@ Description: RO. Current Voltage in millivolt. Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/energy1_input +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/energy1_input Date: September 2023 KernelVersion: 6.5 Contact: intel-xe@lists.freedesktop.org @@ -60,7 +60,7 @@ Description: RO. Energy input of device in microjoules. Only supported for particular Intel xe graphics platforms. -What: /sys/devices/.../hwmon/hwmon<i>/power1_max_interval +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_max_interval Date: October 2023 KernelVersion: 6.6 Contact: intel-xe@lists.freedesktop.org diff --git a/Documentation/ABI/testing/sysfs-nvmem-cells b/Documentation/ABI/testing/sysfs-nvmem-cells index 7af70adf3690..c7c9444f92a8 100644 --- a/Documentation/ABI/testing/sysfs-nvmem-cells +++ b/Documentation/ABI/testing/sysfs-nvmem-cells @@ -4,18 +4,18 @@ KernelVersion: 6.5 Contact: Miquel Raynal <miquel.raynal@bootlin.com> Description: The "cells" folder contains one file per cell exposed by the - NVMEM device. The name of the file is: <name>@<where>, with - <name> being the cell name and <where> its location in the NVMEM - device, in hexadecimal (without the '0x' prefix, to mimic device - tree node names). The length of the file is the size of the cell - (when known). The content of the file is the binary content of - the cell (may sometimes be ASCII, likely without trailing - character). + NVMEM device. The name of the file is: "<name>@<byte>,<bit>", + with <name> being the cell name and <where> its location in + the NVMEM device, in hexadecimal bytes and bits (without the + '0x' prefix, to mimic device tree node names). The length of + the file is the size of the cell (when known). The content of + the file is the binary content of the cell (may sometimes be + ASCII, likely without trailing character). Note: This file is only present if CONFIG_NVMEM_SYSFS is enabled. Example:: - hexdump -C /sys/bus/nvmem/devices/1-00563/cells/product-name@d + hexdump -C /sys/bus/nvmem/devices/1-00563/cells/product-name@d,0 00000000 54 4e 34 38 4d 2d 50 2d 44 4e |TN48M-P-DN| 0000000a diff --git a/Documentation/ABI/testing/sysfs-platform-silicom b/Documentation/ABI/testing/sysfs-platform-silicom index 2288b3665d16..4d1cc5bdbcc5 100644 --- a/Documentation/ABI/testing/sysfs-platform-silicom +++ b/Documentation/ABI/testing/sysfs-platform-silicom @@ -10,6 +10,7 @@ What: /sys/devices/platform/silicom-platform/power_cycle Date: November 2023 KernelVersion: 6.7 Contact: Henry Shi <henrys@silicom-usa.com> +Description: This file allow user to power cycle the platform. Default value is 0; when set to 1, it powers down the platform, waits 5 seconds, then powers on the diff --git a/Documentation/Makefile b/Documentation/Makefile index 3885bbe260eb..b68f8c816897 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -111,7 +111,9 @@ $(YNL_INDEX): $(YNL_RST_FILES) $(YNL_RST_DIR)/%.rst: $(YNL_YAML_DIR)/%.yaml $(YNL_TOOL) $(Q)$(YNL_TOOL) -i $< -o $@ -htmldocs: $(YNL_INDEX) +htmldocs texinfodocs latexdocs epubdocs xmldocs: $(YNL_INDEX) + +htmldocs: @$(srctree)/scripts/sphinx-pre-install --version-check @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var))) @@ -176,6 +178,7 @@ refcheckdocs: $(Q)cd $(srctree);scripts/documentation-file-ref-check cleandocs: + $(Q)rm -f $(YNL_INDEX) $(YNL_RST_FILES) $(Q)rm -rf $(BUILDDIR) $(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/userspace-api/media clean diff --git a/Documentation/RCU/checklist.rst b/Documentation/RCU/checklist.rst index 2d42998a89a6..3e6407de231c 100644 --- a/Documentation/RCU/checklist.rst +++ b/Documentation/RCU/checklist.rst @@ -68,7 +68,8 @@ over a rather long period of time, but improvements are always welcome! rcu_read_lock_sched(), or by the appropriate update-side lock. Explicit disabling of preemption (preempt_disable(), for example) can serve as rcu_read_lock_sched(), but is less readable and - prevents lockdep from detecting locking issues. + prevents lockdep from detecting locking issues. Acquiring a + spinlock also enters an RCU read-side critical section. Please note that you *cannot* rely on code known to be built only in non-preemptible kernels. Such code can and will break, @@ -382,16 +383,17 @@ over a rather long period of time, but improvements are always welcome! must use whatever locking or other synchronization is required to safely access and/or modify that data structure. - Do not assume that RCU callbacks will be executed on the same - CPU that executed the corresponding call_rcu() or call_srcu(). - For example, if a given CPU goes offline while having an RCU - callback pending, then that RCU callback will execute on some - surviving CPU. (If this was not the case, a self-spawning RCU - callback would prevent the victim CPU from ever going offline.) - Furthermore, CPUs designated by rcu_nocbs= might well *always* - have their RCU callbacks executed on some other CPUs, in fact, - for some real-time workloads, this is the whole point of using - the rcu_nocbs= kernel boot parameter. + Do not assume that RCU callbacks will be executed on + the same CPU that executed the corresponding call_rcu(), + call_srcu(), call_rcu_tasks(), call_rcu_tasks_rude(), or + call_rcu_tasks_trace(). For example, if a given CPU goes offline + while having an RCU callback pending, then that RCU callback + will execute on some surviving CPU. (If this was not the case, + a self-spawning RCU callback would prevent the victim CPU from + ever going offline.) Furthermore, CPUs designated by rcu_nocbs= + might well *always* have their RCU callbacks executed on some + other CPUs, in fact, for some real-time workloads, this is the + whole point of using the rcu_nocbs= kernel boot parameter. In addition, do not assume that callbacks queued in a given order will be invoked in that order, even if they all are queued on the @@ -444,7 +446,7 @@ over a rather long period of time, but improvements are always welcome! real-time workloads than is synchronize_rcu_expedited(). It is also permissible to sleep in RCU Tasks Trace read-side - critical, which are delimited by rcu_read_lock_trace() and + critical section, which are delimited by rcu_read_lock_trace() and rcu_read_unlock_trace(). However, this is a specialized flavor of RCU, and you should not use it without first checking with its current users. In most cases, you should instead use SRCU. @@ -490,6 +492,12 @@ over a rather long period of time, but improvements are always welcome! since the last time that you passed that same object to call_rcu() (or friends). + CONFIG_RCU_STRICT_GRACE_PERIOD: + combine with KASAN to check for pointers leaked out + of RCU read-side critical sections. This Kconfig + option is tough on both performance and scalability, + and so is limited to four-CPU systems. + __rcu sparse checks: tag the pointer to the RCU-protected data structure with __rcu, and sparse will warn you if you access that diff --git a/Documentation/RCU/rcu_dereference.rst b/Documentation/RCU/rcu_dereference.rst index 659d5913784d..2524dcdadde2 100644 --- a/Documentation/RCU/rcu_dereference.rst +++ b/Documentation/RCU/rcu_dereference.rst @@ -408,7 +408,10 @@ member of the rcu_dereference() to use in various situations: RCU flavors, an RCU read-side critical section is entered using rcu_read_lock(), anything that disables bottom halves, anything that disables interrupts, or anything that disables - preemption. + preemption. Please note that spinlock critical sections + are also implied RCU read-side critical sections, even when + they are preemptible, as they are in kernels built with + CONFIG_PREEMPT_RT=y. 2. If the access might be within an RCU read-side critical section on the one hand, or protected by (say) my_lock on the other, diff --git a/Documentation/RCU/torture.rst b/Documentation/RCU/torture.rst index 49e7beea6ae1..4b1f99c4181f 100644 --- a/Documentation/RCU/torture.rst +++ b/Documentation/RCU/torture.rst @@ -318,7 +318,7 @@ Suppose that a previous kvm.sh run left its output in this directory:: tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28 -Then this run can be re-run without rebuilding as follow: +Then this run can be re-run without rebuilding as follow:: kvm-again.sh tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28 diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst index 60ce02475142..872ac665223f 100644 --- a/Documentation/RCU/whatisRCU.rst +++ b/Documentation/RCU/whatisRCU.rst @@ -172,14 +172,25 @@ rcu_read_lock() critical section. Reference counts may be used in conjunction with RCU to maintain longer-term references to data structures. + Note that anything that disables bottom halves, preemption, + or interrupts also enters an RCU read-side critical section. + Acquiring a spinlock also enters an RCU read-side critical + sections, even for spinlocks that do not disable preemption, + as is the case in kernels built with CONFIG_PREEMPT_RT=y. + Sleeplocks do *not* enter RCU read-side critical sections. + rcu_read_unlock() ^^^^^^^^^^^^^^^^^ void rcu_read_unlock(void); This temporal primitives is used by a reader to inform the reclaimer that the reader is exiting an RCU read-side critical - section. Note that RCU read-side critical sections may be nested - and/or overlapping. + section. Anything that enables bottom halves, preemption, + or interrupts also exits an RCU read-side critical section. + Releasing a spinlock also exits an RCU read-side critical section. + + Note that RCU read-side critical sections may be nested and/or + overlapping. synchronize_rcu() ^^^^^^^^^^^^^^^^^ @@ -952,8 +963,8 @@ unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be initialized after each and every call to kmem_cache_alloc(), which renders reference-free spinlock acquisition completely unsafe. Therefore, when using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter. -(Those willing to use a kmem_cache constructor may also use locking, -including cache-friendly sequence locking.) +(Those willing to initialize their locks in a kmem_cache constructor +may also use locking, including cache-friendly sequence locking.) With traditional reference counting -- such as that implemented by the kref library in Linux -- there is typically code that runs when the last diff --git a/Documentation/accel/introduction.rst b/Documentation/accel/introduction.rst index 89984dfececf..ae3030136637 100644 --- a/Documentation/accel/introduction.rst +++ b/Documentation/accel/introduction.rst @@ -101,8 +101,8 @@ External References email threads ------------- -* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022) -* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022) +* `Initial discussion on the New subsystem for acceleration devices <https://lore.kernel.org/lkml/CAFCwf11=9qpNAepL7NL+YAV_QO=Wv6pnWPhKHKAepK3fNn+2Dg@mail.gmail.com/>`_ - Oded Gabbay (2022) +* `patch-set to add the new subsystem <https://lore.kernel.org/lkml/20221022214622.18042-1-ogabbay@kernel.org/>`_ - Oded Gabbay (2022) Conference talks ---------------- diff --git a/Documentation/admin-guide/RAS/address-translation.rst b/Documentation/admin-guide/RAS/address-translation.rst new file mode 100644 index 000000000000..f0ca17b43cd3 --- /dev/null +++ b/Documentation/admin-guide/RAS/address-translation.rst @@ -0,0 +1,24 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Address translation +=================== + +x86 AMD +------- + +Zen-based AMD systems include a Data Fabric that manages the layout of +physical memory. Devices attached to the Fabric, like memory controllers, +I/O, etc., may not have a complete view of the system physical memory map. +These devices may provide a "normalized", i.e. device physical, address +when reporting memory errors. Normalized addresses must be translated to +a system physical address for the kernel to action on the memory. + +AMD Address Translation Library (CONFIG_AMD_ATL) provides translation for +this case. + +Glossary of acronyms used in address translation for Zen-based systems + +* CCM = Cache Coherent Moderator +* COD = Cluster-on-Die +* COH_ST = Coherent Station +* DF = Data Fabric diff --git a/Documentation/RAS/ras.rst b/Documentation/admin-guide/RAS/error-decoding.rst index 2556b397cd27..26a72f3fe5de 100644 --- a/Documentation/RAS/ras.rst +++ b/Documentation/admin-guide/RAS/error-decoding.rst @@ -1,15 +1,10 @@ .. SPDX-License-Identifier: GPL-2.0 -Reliability, Availability and Serviceability features -===================================================== - -This documents different aspects of the RAS functionality present in the -kernel. - Error decoding ---------------- +============== -* x86 +x86 +--- Error decoding on AMD systems should be done using the rasdaemon tool: https://github.com/mchehab/rasdaemon/ diff --git a/Documentation/admin-guide/RAS/index.rst b/Documentation/admin-guide/RAS/index.rst new file mode 100644 index 000000000000..f4087040a7c0 --- /dev/null +++ b/Documentation/admin-guide/RAS/index.rst @@ -0,0 +1,7 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. toctree:: + :maxdepth: 2 + + main + error-decoding + address-translation diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/RAS/main.rst index 8e03751d126d..7ac1d4ccc509 100644 --- a/Documentation/admin-guide/ras.rst +++ b/Documentation/admin-guide/RAS/main.rst @@ -1,8 +1,12 @@ +.. SPDX-License-Identifier: GPL-2.0 .. include:: <isonum.txt> -============================================ -Reliability, Availability and Serviceability -============================================ +================================================== +Reliability, Availability and Serviceability (RAS) +================================================== + +This documents different aspects of the RAS functionality present in the +kernel. RAS concepts ************ diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 9a969c0157f1..f2bebff6a733 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -262,9 +262,11 @@ Compiling the kernel - Make sure you have at least gcc 5.1 available. For more information, refer to :ref:`Documentation/process/changes.rst <changes>`. - - Do a ``make`` to create a compressed kernel image. It is also - possible to do ``make install`` if you have lilo installed to suit the - kernel makefiles, but you may want to check your particular lilo setup first. + - Do a ``make`` to create a compressed kernel image. It is also possible to do + ``make install`` if you have lilo installed or if your distribution has an + install script recognised by the kernel's installer. Most popular + distributions will have a recognized install script. You may want to + check your distribution's setup first. To do the actual install, you have to be root, but none of the normal build should require that. Don't take the name of root in vain. @@ -301,32 +303,51 @@ Compiling the kernel image (e.g. .../linux/arch/x86/boot/bzImage after compilation) to the place where your regular bootable kernel is found. - - Booting a kernel directly from a floppy without the assistance of a - bootloader such as LILO, is no longer supported. - - If you boot Linux from the hard drive, chances are you use LILO, which - uses the kernel image as specified in the file /etc/lilo.conf. The - kernel image file is usually /vmlinuz, /boot/vmlinuz, /bzImage or - /boot/bzImage. To use the new kernel, save a copy of the old image - and copy the new image over the old one. Then, you MUST RERUN LILO - to update the loading map! If you don't, you won't be able to boot - the new kernel image. - - Reinstalling LILO is usually a matter of running /sbin/lilo. - You may wish to edit /etc/lilo.conf to specify an entry for your - old kernel image (say, /vmlinux.old) in case the new one does not - work. See the LILO docs for more information. - - After reinstalling LILO, you should be all set. Shutdown the system, + - Booting a kernel directly from a storage device without the assistance + of a bootloader such as LILO or GRUB, is no longer supported in BIOS + (non-EFI systems). On UEFI/EFI systems, however, you can use EFISTUB + which allows the motherboard to boot directly to the kernel. + On modern workstations and desktops, it's generally recommended to use a + bootloader as difficulties can arise with multiple kernels and secure boot. + For more details on EFISTUB, + see "Documentation/admin-guide/efi-stub.rst". + + - It's important to note that as of 2016 LILO (LInux LOader) is no longer in + active development, though as it was extremely popular, it often comes up + in documentation. Popular alternatives include GRUB2, rEFInd, Syslinux, + systemd-boot, or EFISTUB. For various reasons, it's not recommended to use + software that's no longer in active development. + + - Chances are your distribution includes an install script and running + ``make install`` will be all that's needed. Should that not be the case + you'll have to identify your bootloader and reference its documentation or + configure your EFI. + +Legacy LILO Instructions +------------------------ + + + - If you use LILO the kernel images are specified in the file /etc/lilo.conf. + The kernel image file is usually /vmlinuz, /boot/vmlinuz, /bzImage or + /boot/bzImage. To use the new kernel, save a copy of the old image and copy + the new image over the old one. Then, you MUST RERUN LILO to update the + loading map! If you don't, you won't be able to boot the new kernel image. + + - Reinstalling LILO is usually a matter of running /sbin/lilo. You may wish + to edit /etc/lilo.conf to specify an entry for your old kernel image + (say, /vmlinux.old) in case the new one does not work. See the LILO docs + for more information. + + - After reinstalling LILO, you should be all set. Shutdown the system, reboot, and enjoy! - If you ever need to change the default root device, video mode, - etc. in the kernel image, use your bootloader's boot options - where appropriate. No need to recompile the kernel to change - these parameters. + - If you ever need to change the default root device, video mode, etc. in the + kernel image, use your bootloader's boot options where appropriate. No need + to recompile the kernel to change these parameters. - Reboot with the new kernel and enjoy. + If something goes wrong ----------------------- diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst index ae646d621a8a..7d3415eea05d 100644 --- a/Documentation/admin-guide/cgroup-v1/cpusets.rst +++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst @@ -179,7 +179,7 @@ files describing that cpuset: - cpuset.mem_hardwall flag: is memory allocation hardwalled - cpuset.memory_pressure: measure of how much paging pressure in cpuset - cpuset.memory_spread_page flag: if set, spread page cache evenly on allowed nodes - - cpuset.memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes + - cpuset.memory_spread_slab flag: OBSOLETE. Doesn't have any function. - cpuset.sched_load_balance flag: if set, load balance within CPUs on that cpuset - cpuset.sched_relax_domain_level: the searching range when migrating tasks diff --git a/Documentation/admin-guide/cgroup-v1/hugetlb.rst b/Documentation/admin-guide/cgroup-v1/hugetlb.rst index 0fa724d82abb..493a8e386700 100644 --- a/Documentation/admin-guide/cgroup-v1/hugetlb.rst +++ b/Documentation/admin-guide/cgroup-v1/hugetlb.rst @@ -65,10 +65,12 @@ files include:: 1. Page fault accounting -hugetlb.<hugepagesize>.limit_in_bytes -hugetlb.<hugepagesize>.max_usage_in_bytes -hugetlb.<hugepagesize>.usage_in_bytes -hugetlb.<hugepagesize>.failcnt +:: + + hugetlb.<hugepagesize>.limit_in_bytes + hugetlb.<hugepagesize>.max_usage_in_bytes + hugetlb.<hugepagesize>.usage_in_bytes + hugetlb.<hugepagesize>.failcnt The HugeTLB controller allows users to limit the HugeTLB usage (page fault) per control group and enforces the limit during page fault. Since HugeTLB @@ -82,10 +84,12 @@ getting SIGBUS. 2. Reservation accounting -hugetlb.<hugepagesize>.rsvd.limit_in_bytes -hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes -hugetlb.<hugepagesize>.rsvd.usage_in_bytes -hugetlb.<hugepagesize>.rsvd.failcnt +:: + + hugetlb.<hugepagesize>.rsvd.limit_in_bytes + hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes + hugetlb.<hugepagesize>.rsvd.usage_in_bytes + hugetlb.<hugepagesize>.rsvd.failcnt The HugeTLB controller allows to limit the HugeTLB reservations per control group and enforces the controller limit at reservation time and at the fault of diff --git a/Documentation/admin-guide/device-mapper/index.rst b/Documentation/admin-guide/device-mapper/index.rst index cde52cc09645..cc5aec861576 100644 --- a/Documentation/admin-guide/device-mapper/index.rst +++ b/Documentation/admin-guide/device-mapper/index.rst @@ -34,6 +34,8 @@ Device Mapper switch thin-provisioning unstriped + vdo-design + vdo verity writecache zero diff --git a/Documentation/admin-guide/device-mapper/vdo-design.rst b/Documentation/admin-guide/device-mapper/vdo-design.rst new file mode 100644 index 000000000000..3cd59decbec0 --- /dev/null +++ b/Documentation/admin-guide/device-mapper/vdo-design.rst @@ -0,0 +1,633 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +================ +Design of dm-vdo +================ + +The dm-vdo (virtual data optimizer) target provides inline deduplication, +compression, zero-block elimination, and thin provisioning. A dm-vdo target +can be backed by up to 256TB of storage, and can present a logical size of +up to 4PB. This target was originally developed at Permabit Technology +Corp. starting in 2009. It was first released in 2013 and has been used in +production environments ever since. It was made open-source in 2017 after +Permabit was acquired by Red Hat. This document describes the design of +dm-vdo. For usage, see vdo.rst in the same directory as this file. + +Because deduplication rates fall drastically as the block size increases, a +vdo target has a maximum block size of 4K. However, it can achieve +deduplication rates of 254:1, i.e. up to 254 copies of a given 4K block can +reference a single 4K of actual storage. It can achieve compression rates +of 14:1. All zero blocks consume no storage at all. + +Theory of Operation +=================== + +The design of dm-vdo is based on the idea that deduplication is a two-part +problem. The first is to recognize duplicate data. The second is to avoid +storing multiple copies of those duplicates. Therefore, dm-vdo has two main +parts: a deduplication index (called UDS) that is used to discover +duplicate data, and a data store with a reference counted block map that +maps from logical block addresses to the actual storage location of the +data. + +Zones and Threading +------------------- + +Due to the complexity of data optimization, the number of metadata +structures involved in a single write operation to a vdo target is larger +than most other targets. Furthermore, because vdo must operate on small +block sizes in order to achieve good deduplication rates, acceptable +performance can only be achieved through parallelism. Therefore, vdo's +design attempts to be lock-free. + +Most of a vdo's main data structures are designed to be easily divided into +"zones" such that any given bio must only access a single zone of any zoned +structure. Safety with minimal locking is achieved by ensuring that during +normal operation, each zone is assigned to a specific thread, and only that +thread will access the portion of the data structure in that zone. +Associated with each thread is a work queue. Each bio is associated with a +request object (the "data_vio") which will be added to a work queue when +the next phase of its operation requires access to the structures in the +zone associated with that queue. + +Another way of thinking about this arrangement is that the work queue for +each zone has an implicit lock on the structures it manages for all its +operations, because vdo guarantees that no other thread will alter those +structures. + +Although each structure is divided into zones, this division is not +reflected in the on-disk representation of each data structure. Therefore, +the number of zones for each structure, and hence the number of threads, +can be reconfigured each time a vdo target is started. + +The Deduplication Index +----------------------- + +In order to identify duplicate data efficiently, vdo was designed to +leverage some common characteristics of duplicate data. From empirical +observations, we gathered two key insights. The first is that in most data +sets with significant amounts of duplicate data, the duplicates tend to +have temporal locality. When a duplicate appears, it is more likely that +other duplicates will be detected, and that those duplicates will have been +written at about the same time. This is why the index keeps records in +temporal order. The second insight is that new data is more likely to +duplicate recent data than it is to duplicate older data and in general, +there are diminishing returns to looking further back in time. Therefore, +when the index is full, it should cull its oldest records to make space for +new ones. Another important idea behind the design of the index is that the +ultimate goal of deduplication is to reduce storage costs. Since there is a +trade-off between the storage saved and the resources expended to achieve +those savings, vdo does not attempt to find every last duplicate block. It +is sufficient to find and eliminate most of the redundancy. + +Each block of data is hashed to produce a 16-byte block name. An index +record consists of this block name paired with the presumed location of +that data on the underlying storage. However, it is not possible to +guarantee that the index is accurate. In the most common case, this occurs +because it is too costly to update the index when a block is over-written +or discarded. Doing so would require either storing the block name along +with the blocks, which is difficult to do efficiently in block-based +storage, or reading and rehashing each block before overwriting it. +Inaccuracy can also result from a hash collision where two different blocks +have the same name. In practice, this is extremely unlikely, but because +vdo does not use a cryptographic hash, a malicious workload could be +constructed. Because of these inaccuracies, vdo treats the locations in the +index as hints, and reads each indicated block to verify that it is indeed +a duplicate before sharing the existing block with a new one. + +Records are collected into groups called chapters. New records are added to +the newest chapter, called the open chapter. This chapter is stored in a +format optimized for adding and modifying records, and the content of the +open chapter is not finalized until it runs out of space for new records. +When the open chapter fills up, it is closed and a new open chapter is +created to collect new records. + +Closing a chapter converts it to a different format which is optimized for +reading. The records are written to a series of record pages based on the +order in which they were received. This means that records with temporal +locality should be on a small number of pages, reducing the I/O required to +retrieve them. The chapter also compiles an index that indicates which +record page contains any given name. This index means that a request for a +name can determine exactly which record page may contain that record, +without having to load the entire chapter from storage. This index uses +only a subset of the block name as its key, so it cannot guarantee that an +index entry refers to the desired block name. It can only guarantee that if +there is a record for this name, it will be on the indicated page. Closed +chapters are read-only structures and their contents are never altered in +any way. + +Once enough records have been written to fill up all the available index +space, the oldest chapter is removed to make space for new chapters. Any +time a request finds a matching record in the index, that record is copied +into the open chapter. This ensures that useful block names remain available +in the index, while unreferenced block names are forgotten over time. + +In order to find records in older chapters, the index also maintains a +higher level structure called the volume index, which contains entries +mapping each block name to the chapter containing its newest record. This +mapping is updated as records for the block name are copied or updated, +ensuring that only the newest record for a given block name can be found. +An older record for a block name will no longer be found even though it has +not been deleted from its chapter. Like the chapter index, the volume index +uses only a subset of the block name as its key and can not definitively +say that a record exists for a name. It can only say which chapter would +contain the record if a record exists. The volume index is stored entirely +in memory and is saved to storage only when the vdo target is shut down. + +From the viewpoint of a request for a particular block name, it will first +look up the name in the volume index. This search will either indicate that +the name is new, or which chapter to search. If it returns a chapter, the +request looks up its name in the chapter index. This will indicate either +that the name is new, or which record page to search. Finally, if it is not +new, the request will look for its name in the indicated record page. +This process may require up to two page reads per request (one for the +chapter index page and one for the request page). However, recently +accessed pages are cached so that these page reads can be amortized across +many block name requests. + +The volume index and the chapter indexes are implemented using a +memory-efficient structure called a delta index. Instead of storing the +entire block name (the key) for each entry, the entries are sorted by name +and only the difference between adjacent keys (the delta) is stored. +Because we expect the hashes to be randomly distributed, the size of the +deltas follows an exponential distribution. Because of this distribution, +the deltas are expressed using a Huffman code to take up even less space. +The entire sorted list of keys is called a delta list. This structure +allows the index to use many fewer bytes per entry than a traditional hash +table, but it is slightly more expensive to look up entries, because a +request must read every entry in a delta list to add up the deltas in order +to find the record it needs. The delta index reduces this lookup cost by +splitting its key space into many sub-lists, each starting at a fixed key +value, so that each individual list is short. + +The default index size can hold 64 million records, corresponding to about +256GB of data. This means that the index can identify duplicate data if the +original data was written within the last 256GB of writes. This range is +called the deduplication window. If new writes duplicate data that is older +than that, the index will not be able to find it because the records of the +older data have been removed. This means that if an application writes a +200 GB file to a vdo target and then immediately writes it again, the two +copies will deduplicate perfectly. Doing the same with a 500 GB file will +result in no deduplication, because the beginning of the file will no +longer be in the index by the time the second write begins (assuming there +is no duplication within the file itself). + +If an application anticipates a data workload that will see useful +deduplication beyond the 256GB threshold, vdo can be configured to use a +larger index with a correspondingly larger deduplication window. (This +configuration can only be set when the target is created, not altered +later. It is important to consider the expected workload for a vdo target +before configuring it.) There are two ways to do this. + +One way is to increase the memory size of the index, which also increases +the amount of backing storage required. Doubling the size of the index will +double the length of the deduplication window at the expense of doubling +the storage size and the memory requirements. + +The other option is to enable sparse indexing. Sparse indexing increases +the deduplication window by a factor of 10, at the expense of also +increasing the storage size by a factor of 10. However with sparse +indexing, the memory requirements do not increase. The trade-off is +slightly more computation per request and a slight decrease in the amount +of deduplication detected. For most workloads with significant amounts of +duplicate data, sparse indexing will detect 97-99% of the deduplication +that a standard index will detect. + +The vio and data_vio Structures +------------------------------- + +A vio (short for Vdo I/O) is conceptually similar to a bio, with additional +fields and data to track vdo-specific information. A struct vio maintains a +pointer to a bio but also tracks other fields specific to the operation of +vdo. The vio is kept separate from its related bio because there are many +circumstances where vdo completes the bio but must continue to do work +related to deduplication or compression. + +Metadata reads and writes, and other writes that originate within vdo, use +a struct vio directly. Application reads and writes use a larger structure +called a data_vio to track information about their progress. A struct +data_vio contain a struct vio and also includes several other fields +related to deduplication and other vdo features. The data_vio is the +primary unit of application work in vdo. Each data_vio proceeds through a +set of steps to handle the application data, after which it is reset and +returned to a pool of data_vios for reuse. + +There is a fixed pool of 2048 data_vios. This number was chosen to bound +the amount of work that is required to recover from a crash. In addition, +benchmarks have indicated that increasing the size of the pool does not +significantly improve performance. + +The Data Store +-------------- + +The data store is implemented by three main data structures, all of which +work in concert to reduce or amortize metadata updates across as many data +writes as possible. + +*The Slab Depot* + +Most of the vdo volume belongs to the slab depot. The depot contains a +collection of slabs. The slabs can be up to 32GB, and are divided into +three sections. Most of a slab consists of a linear sequence of 4K blocks. +These blocks are used either to store data, or to hold portions of the +block map (see below). In addition to the data blocks, each slab has a set +of reference counters, using 1 byte for each data block. Finally each slab +has a journal. + +Reference updates are written to the slab journal. Slab journal blocks are +written out either when they are full, or when the recovery journal +requests they do so in order to allow the main recovery journal (see below) +to free up space. The slab journal is used both to ensure that the main +recovery journal can regularly free up space, and also to amortize the cost +of updating individual reference blocks. The reference counters are kept in +memory and are written out, a block at a time in oldest-dirtied-order, only +when there is a need to reclaim slab journal space. The write operations +are performed in the background as needed so they do not add latency to +particular I/O operations. + +Each slab is independent of every other. They are assigned to "physical +zones" in round-robin fashion. If there are P physical zones, then slab n +is assigned to zone n mod P. + +The slab depot maintains an additional small data structure, the "slab +summary," which is used to reduce the amount of work needed to come back +online after a crash. The slab summary maintains an entry for each slab +indicating whether or not the slab has ever been used, whether all of its +reference count updates have been persisted to storage, and approximately +how full it is. During recovery, each physical zone will attempt to recover +at least one slab, stopping whenever it has recovered a slab which has some +free blocks. Once each zone has some space, or has determined that none is +available, the target can resume normal operation in a degraded mode. Read +and write requests can be serviced, perhaps with degraded performance, +while the remainder of the dirty slabs are recovered. + +*The Block Map* + +The block map contains the logical to physical mapping. It can be thought +of as an array with one entry per logical address. Each entry is 5 bytes, +36 bits of which contain the physical block number which holds the data for +the given logical address. The other 4 bits are used to indicate the nature +of the mapping. Of the 16 possible states, one represents a logical address +which is unmapped (i.e. it has never been written, or has been discarded), +one represents an uncompressed block, and the other 14 states are used to +indicate that the mapped data is compressed, and which of the compression +slots in the compressed block contains the data for this logical address. + +In practice, the array of mapping entries is divided into "block map +pages," each of which fits in a single 4K block. Each block map page +consists of a header and 812 mapping entries. Each mapping page is actually +a leaf of a radix tree which consists of block map pages at each level. +There are 60 radix trees which are assigned to "logical zones" in round +robin fashion. (If there are L logical zones, tree n will belong to zone n +mod L.) At each level, the trees are interleaved, so logical addresses +0-811 belong to tree 0, logical addresses 812-1623 belong to tree 1, and so +on. The interleaving is maintained all the way up to the 60 root nodes. +Choosing 60 trees results in an evenly distributed number of trees per zone +for a large number of possible logical zone counts. The storage for the 60 +tree roots is allocated at format time. All other block map pages are +allocated out of the slabs as needed. This flexible allocation avoids the +need to pre-allocate space for the entire set of logical mappings and also +makes growing the logical size of a vdo relatively easy. + +In operation, the block map maintains two caches. It is prohibitive to keep +the entire leaf level of the trees in memory, so each logical zone +maintains its own cache of leaf pages. The size of this cache is +configurable at target start time. The second cache is allocated at start +time, and is large enough to hold all the non-leaf pages of the entire +block map. This cache is populated as pages are needed. + +*The Recovery Journal* + +The recovery journal is used to amortize updates across the block map and +slab depot. Each write request causes an entry to be made in the journal. +Entries are either "data remappings" or "block map remappings." For a data +remapping, the journal records the logical address affected and its old and +new physical mappings. For a block map remapping, the journal records the +block map page number and the physical block allocated for it. Block map +pages are never reclaimed or repurposed, so the old mapping is always 0. + +Each journal entry is an intent record summarizing the metadata updates +that are required for a data_vio. The recovery journal issues a flush +before each journal block write to ensure that the physical data for the +new block mappings in that block are stable on storage, and journal block +writes are all issued with the FUA bit set to ensure the recovery journal +entries themselves are stable. The journal entry and the data write it +represents must be stable on disk before the other metadata structures may +be updated to reflect the operation. These entries allow the vdo device to +reconstruct the logical to physical mappings after an unexpected +interruption such as a loss of power. + +*Write Path* + +All write I/O to vdo is asynchronous. Each bio will be acknowledged as soon +as vdo has done enough work to guarantee that it can complete the write +eventually. Generally, the data for acknowledged but unflushed write I/O +can be treated as though it is cached in memory. If an application +requires data to be stable on storage, it must issue a flush or write the +data with the FUA bit set like any other asynchronous I/O. Shutting down +the vdo target will also flush any remaining I/O. + +Application write bios follow the steps outlined below. + +1. A data_vio is obtained from the data_vio pool and associated with the + application bio. If there are no data_vios available, the incoming bio + will block until a data_vio is available. This provides back pressure + to the application. The data_vio pool is protected by a spin lock. + + The newly acquired data_vio is reset and the bio's data is copied into + the data_vio if it is a write and the data is not all zeroes. The data + must be copied because the application bio can be acknowledged before + the data_vio processing is complete, which means later processing steps + will no longer have access to the application bio. The application bio + may also be smaller than 4K, in which case the data_vio will have + already read the underlying block and the data is instead copied over + the relevant portion of the larger block. + +2. The data_vio places a claim (the "logical lock") on the logical address + of the bio. It is vital to prevent simultaneous modifications of the + same logical address, because deduplication involves sharing blocks. + This claim is implemented as an entry in a hashtable where the key is + the logical address and the value is a pointer to the data_vio + currently handling that address. + + If a data_vio looks in the hashtable and finds that another data_vio is + already operating on that logical address, it waits until the previous + operation finishes. It also sends a message to inform the current + lock holder that it is waiting. Most notably, a new data_vio waiting + for a logical lock will flush the previous lock holder out of the + compression packer (step 8d) rather than allowing it to continue + waiting to be packed. + + This stage requires the data_vio to get an implicit lock on the + appropriate logical zone to prevent concurrent modifications of the + hashtable. This implicit locking is handled by the zone divisions + described above. + +3. The data_vio traverses the block map tree to ensure that all the + necessary internal tree nodes have been allocated, by trying to find + the leaf page for its logical address. If any interior tree page is + missing, it is allocated at this time out of the same physical storage + pool used to store application data. + + a. If any page-node in the tree has not yet been allocated, it must be + allocated before the write can continue. This step requires the + data_vio to lock the page-node that needs to be allocated. This + lock, like the logical block lock in step 2, is a hashtable entry + that causes other data_vios to wait for the allocation process to + complete. + + The implicit logical zone lock is released while the allocation is + happening, in order to allow other operations in the same logical + zone to proceed. The details of allocation are the same as in + step 4. Once a new node has been allocated, that node is added to + the tree using a similar process to adding a new data block mapping. + The data_vio journals the intent to add the new node to the block + map tree (step 10), updates the reference count of the new block + (step 11), and reacquires the implicit logical zone lock to add the + new mapping to the parent tree node (step 12). Once the tree is + updated, the data_vio proceeds down the tree. Any other data_vios + waiting on this allocation also proceed. + + b. In the steady-state case, the block map tree nodes will already be + allocated, so the data_vio just traverses the tree until it finds + the required leaf node. The location of the mapping (the "block map + slot") is recorded in the data_vio so that later steps do not need + to traverse the tree again. The data_vio then releases the implicit + logical zone lock. + +4. If the block is a zero block, skip to step 9. Otherwise, an attempt is + made to allocate a free data block. This allocation ensures that the + data_vio can write its data somewhere even if deduplication and + compression are not possible. This stage gets an implicit lock on a + physical zone to search for free space within that zone. + + The data_vio will search each slab in a zone until it finds a free + block or decides there are none. If the first zone has no free space, + it will proceed to search the next physical zone by taking the implicit + lock for that zone and releasing the previous one until it finds a + free block or runs out of zones to search. The data_vio will acquire a + struct pbn_lock (the "physical block lock") on the free block. The + struct pbn_lock also has several fields to record the various kinds of + claims that data_vios can have on physical blocks. The pbn_lock is + added to a hashtable like the logical block locks in step 2. This + hashtable is also covered by the implicit physical zone lock. The + reference count of the free block is updated to prevent any other + data_vio from considering it free. The reference counters are a + sub-component of the slab and are thus also covered by the implicit + physical zone lock. + +5. If an allocation was obtained, the data_vio has all the resources it + needs to complete the write. The application bio can safely be + acknowledged at this point. The acknowledgment happens on a separate + thread to prevent the application callback from blocking other data_vio + operations. + + If an allocation could not be obtained, the data_vio continues to + attempt to deduplicate or compress the data, but the bio is not + acknowledged because the vdo device may be out of space. + +6. At this point vdo must determine where to store the application data. + The data_vio's data is hashed and the hash (the "record name") is + recorded in the data_vio. + +7. The data_vio reserves or joins a struct hash_lock, which manages all of + the data_vios currently writing the same data. Active hash locks are + tracked in a hashtable similar to the way logical block locks are + tracked in step 2. This hashtable is covered by the implicit lock on + the hash zone. + + If there is no existing hash lock for this data_vio's record_name, the + data_vio obtains a hash lock from the pool, adds it to the hashtable, + and sets itself as the new hash lock's "agent." The hash_lock pool is + also covered by the implicit hash zone lock. The hash lock agent will + do all the work to decide where the application data will be + written. If a hash lock for the data_vio's record_name already exists, + and the data_vio's data is the same as the agent's data, the new + data_vio will wait for the agent to complete its work and then share + its result. + + In the rare case that a hash lock exists for the data_vio's hash but + the data does not match the hash lock's agent, the data_vio skips to + step 8h and attempts to write its data directly. This can happen if two + different data blocks produce the same hash, for example. + +8. The hash lock agent attempts to deduplicate or compress its data with + the following steps. + + a. The agent initializes and sends its embedded deduplication request + (struct uds_request) to the deduplication index. This does not + require the data_vio to get any locks because the index components + manage their own locking. The data_vio waits until it either gets a + response from the index or times out. + + b. If the deduplication index returns advice, the data_vio attempts to + obtain a physical block lock on the indicated physical address, in + order to read the data and verify that it is the same as the + data_vio's data, and that it can accept more references. If the + physical address is already locked by another data_vio, the data at + that address may soon be overwritten so it is not safe to use the + address for deduplication. + + c. If the data matches and the physical block can add references, the + agent and any other data_vios waiting on it will record this + physical block as their new physical address and proceed to step 9 + to record their new mapping. If there are more data_vios in the hash + lock than there are references available, one of the remaining + data_vios becomes the new agent and continues to step 8d as if no + valid advice was returned. + + d. If no usable duplicate block was found, the agent first checks that + it has an allocated physical block (from step 3) that it can write + to. If the agent does not have an allocation, some other data_vio in + the hash lock that does have an allocation takes over as agent. If + none of the data_vios have an allocated physical block, these writes + are out of space, so they proceed to step 13 for cleanup. + + e. The agent attempts to compress its data. If the data does not + compress, the data_vio will continue to step 8h to write its data + directly. + + If the compressed size is small enough, the agent will release the + implicit hash zone lock and go to the packer (struct packer) where + it will be placed in a bin (struct packer_bin) along with other + data_vios. All compression operations require the implicit lock on + the packer zone. + + The packer can combine up to 14 compressed blocks in a single 4k + data block. Compression is only helpful if vdo can pack at least 2 + data_vios into a single data block. This means that a data_vio may + wait in the packer for an arbitrarily long time for other data_vios + to fill out the compressed block. There is a mechanism for vdo to + evict waiting data_vios when continuing to wait would cause + problems. Circumstances causing an eviction include an application + flush, device shutdown, or a subsequent data_vio trying to overwrite + the same logical block address. A data_vio may also be evicted from + the packer if it cannot be paired with any other compressed block + before more compressible blocks need to use its bin. An evicted + data_vio will proceed to step 8h to write its data directly. + + f. If the agent fills a packer bin, either because all 14 of its slots + are used or because it has no remaining space, it is written out + using the allocated physical block from one of its data_vios. Step + 8d has already ensured that an allocation is available. + + g. Each data_vio sets the compressed block as its new physical address. + The data_vio obtains an implicit lock on the physical zone and + acquires the struct pbn_lock for the compressed block, which is + modified to be a shared lock. Then it releases the implicit physical + zone lock and proceeds to step 8i. + + h. Any data_vio evicted from the packer will have an allocation from + step 3. It will write its data to that allocated physical block. + + i. After the data is written, if the data_vio is the agent of a hash + lock, it will reacquire the implicit hash zone lock and share its + physical address with as many other data_vios in the hash lock as + possible. Each data_vio will then proceed to step 9 to record its + new mapping. + + j. If the agent actually wrote new data (whether compressed or not), + the deduplication index is updated to reflect the location of the + new data. The agent then releases the implicit hash zone lock. + +9. The data_vio determines the previous mapping of the logical address. + There is a cache for block map leaf pages (the "block map cache"), + because there are usually too many block map leaf nodes to store + entirely in memory. If the desired leaf page is not in the cache, the + data_vio will reserve a slot in the cache and load the desired page + into it, possibly evicting an older cached page. The data_vio then + finds the current physical address for this logical address (the "old + physical mapping"), if any, and records it. This step requires a lock + on the block map cache structures, covered by the implicit logical zone + lock. + +10. The data_vio makes an entry in the recovery journal containing the + logical block address, the old physical mapping, and the new physical + mapping. Making this journal entry requires holding the implicit + recovery journal lock. The data_vio will wait in the journal until all + recovery blocks up to the one containing its entry have been written + and flushed to ensure the transaction is stable on storage. + +11. Once the recovery journal entry is stable, the data_vio makes two slab + journal entries: an increment entry for the new mapping, and a + decrement entry for the old mapping. These two operations each require + holding a lock on the affected physical slab, covered by its implicit + physical zone lock. For correctness during recovery, the slab journal + entries in any given slab journal must be in the same order as the + corresponding recovery journal entries. Therefore, if the two entries + are in different zones, they are made concurrently, and if they are in + the same zone, the increment is always made before the decrement in + order to avoid underflow. After each slab journal entry is made in + memory, the associated reference count is also updated in memory. + +12. Once both of the reference count updates are done, the data_vio + acquires the implicit logical zone lock and updates the + logical-to-physical mapping in the block map to point to the new + physical block. At this point the write operation is complete. + +13. If the data_vio has a hash lock, it acquires the implicit hash zone + lock and releases its hash lock to the pool. + + The data_vio then acquires the implicit physical zone lock and releases + the struct pbn_lock it holds for its allocated block. If it had an + allocation that it did not use, it also sets the reference count for + that block back to zero to free it for use by subsequent data_vios. + + The data_vio then acquires the implicit logical zone lock and releases + the logical block lock acquired in step 2. + + The application bio is then acknowledged if it has not previously been + acknowledged, and the data_vio is returned to the pool. + +*Read Path* + +An application read bio follows a much simpler set of steps. It does steps +1 and 2 in the write path to obtain a data_vio and lock its logical +address. If there is already a write data_vio in progress for that logical +address that is guaranteed to complete, the read data_vio will copy the +data from the write data_vio and return it. Otherwise, it will look up the +logical-to-physical mapping by traversing the block map tree as in step 3, +and then read and possibly decompress the indicated data at the indicated +physical block address. A read data_vio will not allocate block map tree +nodes if they are missing. If the interior block map nodes do not exist +yet, the logical block map address must still be unmapped and the read +data_vio will return all zeroes. A read data_vio handles cleanup and +acknowledgment as in step 13, although it only needs to release the logical +lock and return itself to the pool. + +*Small Writes* + +All storage within vdo is managed as 4KB blocks, but it can accept writes +as small as 512 bytes. Processing a write that is smaller than 4K requires +a read-modify-write operation that reads the relevant 4K block, copies the +new data over the approriate sectors of the block, and then launches a +write operation for the modified data block. The read and write stages of +this operation are nearly identical to the normal read and write +operations, and a single data_vio is used throughout this operation. + +*Recovery* + +When a vdo is restarted after a crash, it will attempt to recover from the +recovery journal. During the pre-resume phase of the next start, the +recovery journal is read. The increment portion of valid entries are played +into the block map. Next, valid entries are played, in order as required, +into the slab journals. Finally, each physical zone attempts to replay at +least one slab journal to reconstruct the reference counts of one slab. +Once each zone has some free space (or has determined that it has none), +the vdo comes back online, while the remainder of the slab journals are +used to reconstruct the rest of the reference counts in the background. + +*Read-only Rebuild* + +If a vdo encounters an unrecoverable error, it will enter read-only mode. +This mode indicates that some previously acknowledged data may have been +lost. The vdo may be instructed to rebuild as best it can in order to +return to a writable state. However, this is never done automatically due +to the possibility that data has been lost. During a read-only rebuild, the +block map is recovered from the recovery journal as before. However, the +reference counts are not rebuilt from the slab journals. Instead, the +reference counts are zeroed, the entire block map is traversed, and the +reference counts are updated from the block mappings. While this may lose +some data, it ensures that the block map and reference counts are +consistent with each other. This allows vdo to resume normal operation and +accept further writes. diff --git a/Documentation/admin-guide/device-mapper/vdo.rst b/Documentation/admin-guide/device-mapper/vdo.rst new file mode 100644 index 000000000000..7e1ecafdf91e --- /dev/null +++ b/Documentation/admin-guide/device-mapper/vdo.rst @@ -0,0 +1,406 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +dm-vdo +====== + +The dm-vdo (virtual data optimizer) device mapper target provides +block-level deduplication, compression, and thin provisioning. As a device +mapper target, it can add these features to the storage stack, compatible +with any file system. The vdo target does not protect against data +corruption, relying instead on integrity protection of the storage below +it. It is strongly recommended that lvm be used to manage vdo volumes. See +lvmvdo(7). + +Userspace component +=================== + +Formatting a vdo volume requires the use of the 'vdoformat' tool, available +at: + +https://github.com/dm-vdo/vdo/ + +In most cases, a vdo target will recover from a crash automatically the +next time it is started. In cases where it encountered an unrecoverable +error (either during normal operation or crash recovery) the target will +enter or come up in read-only mode. Because read-only mode is indicative of +data-loss, a positive action must be taken to bring vdo out of read-only +mode. The 'vdoforcerebuild' tool, available from the same repo, is used to +prepare a read-only vdo to exit read-only mode. After running this tool, +the vdo target will rebuild its metadata the next time it is +started. Although some data may be lost, the rebuilt vdo's metadata will be +internally consistent and the target will be writable again. + +The repo also contains additional userspace tools which can be used to +inspect a vdo target's on-disk metadata. Fortunately, these tools are +rarely needed except by dm-vdo developers. + +Metadata requirements +===================== + +Each vdo volume reserves 3GB of space for metadata, or more depending on +its configuration. It is helpful to check that the space saved by +deduplication and compression is not cancelled out by the metadata +requirements. An estimation of the space saved for a specific dataset can +be computed with the vdo estimator tool, which is available at: + +https://github.com/dm-vdo/vdoestimator/ + +Target interface +================ + +Table line +---------- + +:: + + <offset> <logical device size> vdo V4 <storage device> + <storage device size> <minimum I/O size> <block map cache size> + <block map era length> [optional arguments] + + +Required parameters: + + offset: + The offset, in sectors, at which the vdo volume's logical + space begins. + + logical device size: + The size of the device which the vdo volume will service, + in sectors. Must match the current logical size of the vdo + volume. + + storage device: + The device holding the vdo volume's data and metadata. + + storage device size: + The size of the device holding the vdo volume, as a number + of 4096-byte blocks. Must match the current size of the vdo + volume. + + minimum I/O size: + The minimum I/O size for this vdo volume to accept, in + bytes. Valid values are 512 or 4096. The recommended value + is 4096. + + block map cache size: + The size of the block map cache, as a number of 4096-byte + blocks. The minimum and recommended value is 32768 blocks. + If the logical thread count is non-zero, the cache size + must be at least 4096 blocks per logical thread. + + block map era length: + The speed with which the block map cache writes out + modified block map pages. A smaller era length is likely to + reduce the amount of time spent rebuilding, at the cost of + increased block map writes during normal operation. The + maximum and recommended value is 16380; the minimum value + is 1. + +Optional parameters: +-------------------- +Some or all of these parameters may be specified as <key> <value> pairs. + +Thread related parameters: + +Different categories of work are assigned to separate thread groups, and +the number of threads in each group can be configured separately. + +If <hash>, <logical>, and <physical> are all set to 0, the work handled by +all three thread types will be handled by a single thread. If any of these +values are non-zero, all of them must be non-zero. + + ack: + The number of threads used to complete bios. Since + completing a bio calls an arbitrary completion function + outside the vdo volume, threads of this type allow the vdo + volume to continue processing requests even when bio + completion is slow. The default is 1. + + bio: + The number of threads used to issue bios to the underlying + storage. Threads of this type allow the vdo volume to + continue processing requests even when bio submission is + slow. The default is 4. + + bioRotationInterval: + The number of bios to enqueue on each bio thread before + switching to the next thread. The value must be greater + than 0 and not more than 1024; the default is 64. + + cpu: + The number of threads used to do CPU-intensive work, such + as hashing and compression. The default is 1. + + hash: + The number of threads used to manage data comparisons for + deduplication based on the hash value of data blocks. The + default is 0. + + logical: + The number of threads used to manage caching and locking + based on the logical address of incoming bios. The default + is 0; the maximum is 60. + + physical: + The number of threads used to manage administration of the + underlying storage device. At format time, a slab size for + the vdo is chosen; the vdo storage device must be large + enough to have at least 1 slab per physical thread. The + default is 0; the maximum is 16. + +Miscellaneous parameters: + + maxDiscard: + The maximum size of discard bio accepted, in 4096-byte + blocks. I/O requests to a vdo volume are normally split + into 4096-byte blocks, and processed up to 2048 at a time. + However, discard requests to a vdo volume can be + automatically split to a larger size, up to <maxDiscard> + 4096-byte blocks in a single bio, and are limited to 1500 + at a time. Increasing this value may provide better overall + performance, at the cost of increased latency for the + individual discard requests. The default and minimum is 1; + the maximum is UINT_MAX / 4096. + + deduplication: + Whether deduplication is enabled. The default is 'on'; the + acceptable values are 'on' and 'off'. + + compression: + Whether compression is enabled. The default is 'off'; the + acceptable values are 'on' and 'off'. + +Device modification +------------------- + +A modified table may be loaded into a running, non-suspended vdo volume. +The modifications will take effect when the device is next resumed. The +modifiable parameters are <logical device size>, <physical device size>, +<maxDiscard>, <compression>, and <deduplication>. + +If the logical device size or physical device size are changed, upon +successful resume vdo will store the new values and require them on future +startups. These two parameters may not be decreased. The logical device +size may not exceed 4 PB. The physical device size must increase by at +least 32832 4096-byte blocks if at all, and must not exceed the size of the +underlying storage device. Additionally, when formatting the vdo device, a +slab size is chosen: the physical device size may never increase above the +size which provides 8192 slabs, and each increase must be large enough to +add at least one new slab. + +Examples: + +Start a previously-formatted vdo volume with 1 GB logical space and 1 GB +physical space, storing to /dev/dm-1 which has more than 1 GB of space. + +:: + + dmsetup create vdo0 --table \ + "0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380" + +Grow the logical size to 4 GB. + +:: + + dmsetup reload vdo0 --table \ + "0 8388608 vdo V4 /dev/dm-1 262144 4096 32768 16380" + dmsetup resume vdo0 + +Grow the physical size to 2 GB. + +:: + + dmsetup reload vdo0 --table \ + "0 8388608 vdo V4 /dev/dm-1 524288 4096 32768 16380" + dmsetup resume vdo0 + +Grow the physical size by 1 GB more and increase max discard sectors. + +:: + + dmsetup reload vdo0 --table \ + "0 10485760 vdo V4 /dev/dm-1 786432 4096 32768 16380 maxDiscard 8" + dmsetup resume vdo0 + +Stop the vdo volume. + +:: + + dmsetup remove vdo0 + +Start the vdo volume again. Note that the logical and physical device sizes +must still match, but other parameters can change. + +:: + + dmsetup create vdo1 --table \ + "0 10485760 vdo V4 /dev/dm-1 786432 512 65550 5000 hash 1 logical 3 physical 2" + +Messages +-------- +All vdo devices accept messages in the form: + +:: + dmsetup message <target-name> 0 <message-name> <message-parameters> + +The messages are: + + stats: + Outputs the current view of the vdo statistics. Mostly used + by the vdostats userspace program to interpret the output + buffer. + + dump: + Dumps many internal structures to the system log. This is + not always safe to run, so it should only be used to debug + a hung vdo. Optional parameters to specify structures to + dump are: + + viopool: The pool of I/O requests incoming bios + pools: A synonym of 'viopool' + vdo: Most of the structures managing on-disk data + queues: Basic information about each vdo thread + threads: A synonym of 'queues' + default: Equivalent to 'queues vdo' + all: All of the above. + + dump-on-shutdown: + Perform a default dump next time vdo shuts down. + + +Status +------ + +:: + + <device> <operating mode> <in recovery> <index state> + <compression state> <physical blocks used> <total physical blocks> + + device: + The name of the vdo volume. + + operating mode: + The current operating mode of the vdo volume; values may be + 'normal', 'recovering' (the volume has detected an issue + with its metadata and is attempting to repair itself), and + 'read-only' (an error has occurred that forces the vdo + volume to only support read operations and not writes). + + in recovery: + Whether the vdo volume is currently in recovery mode; + values may be 'recovering' or '-' which indicates not + recovering. + + index state: + The current state of the deduplication index in the vdo + volume; values may be 'closed', 'closing', 'error', + 'offline', 'online', 'opening', and 'unknown'. + + compression state: + The current state of compression in the vdo volume; values + may be 'offline' and 'online'. + + used physical blocks: + The number of physical blocks in use by the vdo volume. + + total physical blocks: + The total number of physical blocks the vdo volume may use; + the difference between this value and the + <used physical blocks> is the number of blocks the vdo + volume has left before being full. + +Memory Requirements +=================== + +A vdo target requires a fixed 38 MB of RAM along with the following amounts +that scale with the target: + +- 1.15 MB of RAM for each 1 MB of configured block map cache size. The + block map cache requires a minimum of 150 MB. +- 1.6 MB of RAM for each 1 TB of logical space. +- 268 MB of RAM for each 1 TB of physical storage managed by the volume. + +The deduplication index requires additional memory which scales with the +size of the deduplication window. For dense indexes, the index requires 1 +GB of RAM per 1 TB of window. For sparse indexes, the index requires 1 GB +of RAM per 10 TB of window. The index configuration is set when the target +is formatted and may not be modified. + +Module Parameters +================= + +The vdo driver has a numeric parameter 'log_level' which controls the +verbosity of logging from the driver. The default setting is 6 +(LOGLEVEL_INFO and more severe messages). + +Run-time Usage +============== + +When using dm-vdo, it is important to be aware of the ways in which its +behavior differs from other storage targets. + +- There is no guarantee that over-writes of existing blocks will succeed. + Because the underlying storage may be multiply referenced, over-writing + an existing block generally requires a vdo to have a free block + available. + +- When blocks are no longer in use, sending a discard request for those + blocks lets the vdo release references for those blocks. If the vdo is + thinly provisioned, discarding unused blocks is essential to prevent the + target from running out of space. However, due to the sharing of + duplicate blocks, no discard request for any given logical block is + guaranteed to reclaim space. + +- Assuming the underlying storage properly implements flush requests, vdo + is resilient against crashes, however, unflushed writes may or may not + persist after a crash. + +- Each write to a vdo target entails a significant amount of processing. + However, much of the work is paralellizable. Therefore, vdo targets + achieve better throughput at higher I/O depths, and can support up 2048 + requests in parallel. + +Tuning +====== + +The vdo device has many options, and it can be difficult to make optimal +choices without perfect knowledge of the workload. Additionally, most +configuration options must be set when a vdo target is started, and cannot +be changed without shutting it down completely; the configuration cannot be +changed while the target is active. Ideally, tuning with simulated +workloads should be performed before deploying vdo in production +environments. + +The most important value to adjust is the block map cache size. In order to +service a request for any logical address, a vdo must load the portion of +the block map which holds the relevant mapping. These mappings are cached. +Performance will suffer when the working set does not fit in the cache. By +default, a vdo allocates 128 MB of metadata cache in RAM to support +efficient access to 100 GB of logical space at a time. It should be scaled +up proportionally for larger working sets. + +The logical and physical thread counts should also be adjusted. A logical +thread controls a disjoint section of the block map, so additional logical +threads increase parallelism and can increase throughput. Physical threads +control a disjoint section of the data blocks, so additional physical +threads can also increase throughput. However, excess threads can waste +resources and increase contention. + +Bio submission threads control the parallelism involved in sending I/O to +the underlying storage; fewer threads mean there is more opportunity to +reorder I/O requests for performance benefit, but also that each I/O +request has to wait longer before being submitted. + +Bio acknowledgment threads are used for finishing I/O requests. This is +done on dedicated threads since the amount of work required to execute a +bio's callback can not be controlled by the vdo itself. Usually one thread +is sufficient but additional threads may be beneficial, particularly when +bios have CPU-heavy callbacks. + +CPU threads are used for hashing and for compression; in workloads with +compression enabled, more threads may result in higher throughput. + +Hash threads are used to sort active requests by hash and determine whether +they should deduplicate; the most CPU intensive actions done by these +threads are comparison of 4096-byte data blocks. In most cases, a single +hash thread is sufficient. diff --git a/Documentation/admin-guide/edid.rst b/Documentation/admin-guide/edid.rst index 80deeb21a265..1a9b965aa486 100644 --- a/Documentation/admin-guide/edid.rst +++ b/Documentation/admin-guide/edid.rst @@ -24,37 +24,4 @@ restrictions later on. As a remedy for such situations, the kernel configuration item CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows to provide an individually prepared or corrected EDID data set in the /lib/firmware -directory from where it is loaded via the firmware interface. The code -(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for -commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200, -1680x1050, 1920x1080) as binary blobs, but the kernel source tree does -not contain code to create these data. In order to elucidate the origin -of the built-in binary EDID blobs and to facilitate the creation of -individual data for a specific misbehaving monitor, commented sources -and a Makefile environment are given here. - -To create binary EDID and C source code files from the existing data -material, simply type "make" in tools/edid/. - -If you want to create your own EDID file, copy the file 1024x768.S, -replace the settings with your own data and add a new target to the -Makefile. Please note that the EDID data structure expects the timing -values in a different way as compared to the standard X11 format. - -X11: - HTimings: - hdisp hsyncstart hsyncend htotal - VTimings: - vdisp vsyncstart vsyncend vtotal - -EDID:: - - #define XPIX hdisp - #define XBLANK htotal-hdisp - #define XOFFSET hsyncstart-hdisp - #define XPULSE hsyncend-hsyncstart - - #define YPIX vdisp - #define YBLANK vtotal-vdisp - #define YOFFSET vsyncstart-vdisp - #define YPULSE vsyncend-vsyncstart +directory from where it is loaded via the firmware interface. diff --git a/Documentation/admin-guide/gpio/gpio-mockup.rst b/Documentation/admin-guide/gpio/gpio-mockup.rst index 493071da1738..d6e7438a7550 100644 --- a/Documentation/admin-guide/gpio/gpio-mockup.rst +++ b/Documentation/admin-guide/gpio/gpio-mockup.rst @@ -3,6 +3,14 @@ GPIO Testing Driver =================== +.. note:: + + This module has been obsoleted by the more flexible gpio-sim.rst. + New developments should use that API and existing developments are + encouraged to migrate as soon as possible. + This module will continue to be maintained but no new features will be + added. + The GPIO Testing Driver (gpio-mockup) provides a way to create simulated GPIO chips for testing purposes. The lines exposed by these chips can be accessed using the standard GPIO character device interface as well as manipulated diff --git a/Documentation/admin-guide/gpio/index.rst b/Documentation/admin-guide/gpio/index.rst index f6861ca16ffe..460afd29617e 100644 --- a/Documentation/admin-guide/gpio/index.rst +++ b/Documentation/admin-guide/gpio/index.rst @@ -1,16 +1,16 @@ .. SPDX-License-Identifier: GPL-2.0 ==== -gpio +GPIO ==== .. toctree:: :maxdepth: 1 + Character Device Userspace API <../../userspace-api/gpio/chardev> gpio-aggregator - sysfs - gpio-mockup gpio-sim + Obsolete APIs <obsolete> .. only:: subproject and html diff --git a/Documentation/admin-guide/gpio/obsolete.rst b/Documentation/admin-guide/gpio/obsolete.rst new file mode 100644 index 000000000000..5adbff02d61f --- /dev/null +++ b/Documentation/admin-guide/gpio/obsolete.rst @@ -0,0 +1,13 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================== +Obsolete GPIO APIs +================== + +.. toctree:: + :maxdepth: 1 + + Character Device Userspace API (v1) <../../userspace-api/gpio/chardev_v1> + Sysfs Interface <../../userspace-api/gpio/sysfs> + Mockup Testing Module <gpio-mockup> + diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst index de99caabf65a..ff0b440ef2dc 100644 --- a/Documentation/admin-guide/hw-vuln/index.rst +++ b/Documentation/admin-guide/hw-vuln/index.rst @@ -21,3 +21,4 @@ are configurable at compile, boot or run time. cross-thread-rsb srso gather_data_sampling + reg-file-data-sampling diff --git a/Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst b/Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst new file mode 100644 index 000000000000..0585d02b9a6c --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst @@ -0,0 +1,104 @@ +================================== +Register File Data Sampling (RFDS) +================================== + +Register File Data Sampling (RFDS) is a microarchitectural vulnerability that +only affects Intel Atom parts(also branded as E-cores). RFDS may allow +a malicious actor to infer data values previously used in floating point +registers, vector registers, or integer registers. RFDS does not provide the +ability to choose which data is inferred. CVE-2023-28746 is assigned to RFDS. + +Affected Processors +=================== +Below is the list of affected Intel processors [#f1]_: + + =================== ============ + Common name Family_Model + =================== ============ + ATOM_GOLDMONT 06_5CH + ATOM_GOLDMONT_D 06_5FH + ATOM_GOLDMONT_PLUS 06_7AH + ATOM_TREMONT_D 06_86H + ATOM_TREMONT 06_96H + ALDERLAKE 06_97H + ALDERLAKE_L 06_9AH + ATOM_TREMONT_L 06_9CH + RAPTORLAKE 06_B7H + RAPTORLAKE_P 06_BAH + ATOM_GRACEMONT 06_BEH + RAPTORLAKE_S 06_BFH + =================== ============ + +As an exception to this table, Intel Xeon E family parts ALDERLAKE(06_97H) and +RAPTORLAKE(06_B7H) codenamed Catlow are not affected. They are reported as +vulnerable in Linux because they share the same family/model with an affected +part. Unlike their affected counterparts, they do not enumerate RFDS_CLEAR or +CPUID.HYBRID. This information could be used to distinguish between the +affected and unaffected parts, but it is deemed not worth adding complexity as +the reporting is fixed automatically when these parts enumerate RFDS_NO. + +Mitigation +========== +Intel released a microcode update that enables software to clear sensitive +information using the VERW instruction. Like MDS, RFDS deploys the same +mitigation strategy to force the CPU to clear the affected buffers before an +attacker can extract the secrets. This is achieved by using the otherwise +unused and obsolete VERW instruction in combination with a microcode update. +The microcode clears the affected CPU buffers when the VERW instruction is +executed. + +Mitigation points +----------------- +VERW is executed by the kernel before returning to user space, and by KVM +before VMentry. None of the affected cores support SMT, so VERW is not required +at C-state transitions. + +New bits in IA32_ARCH_CAPABILITIES +---------------------------------- +Newer processors and microcode update on existing affected processors added new +bits to IA32_ARCH_CAPABILITIES MSR. These bits can be used to enumerate +vulnerability and mitigation capability: + +- Bit 27 - RFDS_NO - When set, processor is not affected by RFDS. +- Bit 28 - RFDS_CLEAR - When set, processor is affected by RFDS, and has the + microcode that clears the affected buffers on VERW execution. + +Mitigation control on the kernel command line +--------------------------------------------- +The kernel command line allows to control RFDS mitigation at boot time with the +parameter "reg_file_data_sampling=". The valid arguments are: + + ========== ================================================================= + on If the CPU is vulnerable, enable mitigation; CPU buffer clearing + on exit to userspace and before entering a VM. + off Disables mitigation. + ========== ================================================================= + +Mitigation default is selected by CONFIG_MITIGATION_RFDS. + +Mitigation status information +----------------------------- +The Linux kernel provides a sysfs interface to enumerate the current +vulnerability status of the system: whether the system is vulnerable, and +which mitigations are active. The relevant sysfs file is: + + /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling + +The possible values in this file are: + + .. list-table:: + + * - 'Not affected' + - The processor is not vulnerable + * - 'Vulnerable' + - The processor is vulnerable, but no mitigation enabled + * - 'Vulnerable: No microcode' + - The processor is vulnerable but microcode is not updated. + * - 'Mitigation: Clear Register File' + - The processor is vulnerable and the CPU buffer clearing mitigation is + enabled. + +References +---------- +.. [#f1] Affected Processors + https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst index 32a8893e5617..cce768afec6b 100644 --- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -473,8 +473,8 @@ Spectre variant 2 -mindirect-branch=thunk-extern -mindirect-branch-register options. If the kernel is compiled with a Clang compiler, the compiler needs to support -mretpoline-external-thunk option. The kernel config - CONFIG_RETPOLINE needs to be turned on, and the CPU needs to run with - the latest updated microcode. + CONFIG_MITIGATION_RETPOLINE needs to be turned on, and the CPU needs + to run with the latest updated microcode. On Intel Skylake-era systems the mitigation covers most, but not all, cases. See :ref:`[3] <spec_ref3>` for more details. @@ -609,8 +609,8 @@ kernel command line. Selecting 'on' will, and 'auto' may, choose a mitigation method at run time according to the CPU, the available microcode, the setting of the - CONFIG_RETPOLINE configuration option, and the - compiler with which the kernel was built. + CONFIG_MITIGATION_RETPOLINE configuration option, + and the compiler with which the kernel was built. Selecting 'on' will also enable the mitigation against user space to user space task attacks. diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index fb40a1f6f79e..32ea52f1d150 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -1,3 +1,4 @@ +================================================= The Linux kernel user's and administrator's guide ================================================= @@ -37,6 +38,7 @@ problems and bugs in particular. reporting-issues reporting-regressions quickly-build-trimmed-linux + verify-bugs-and-bisect-regressions bug-hunting bug-bisect tainted-kernels @@ -122,7 +124,7 @@ configure specific aspects of kernel behavior to your liking. pmf pnp rapidio - ras + RAS/index rtc serial-console svga diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index 5762e7477a0c..0302a93b1d40 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -191,9 +191,7 @@ Dump-capture kernel config options (Arch Dependent, i386 and x86_64) CPU is enough for kdump kernel to dump vmcore on most of systems. However, you can also specify nr_cpus=X to enable multiple processors - in kdump kernel. In this case, "disable_cpu_apicid=" is needed to - tell kdump kernel which cpu is 1st kernel's BSP. Please refer to - admin-guide/kernel-parameters.txt for more details. + in kdump kernel. With CONFIG_SMP=n, the above things are not related. @@ -454,8 +452,7 @@ Notes on loading the dump-capture kernel: to use multi-thread programs with it, such as parallel dump feature of makedumpfile. Otherwise, the multi-thread program may have a great performance degradation. To enable multi-cpu support, you should bring up an - SMP dump-capture kernel and specify maxcpus/nr_cpus, disable_cpu_apicid=[X] - options while loading it. + SMP dump-capture kernel and specify maxcpus/nr_cpus options while loading it. * For s390x there are two kdump modes: If a ELF header is specified with the elfcorehdr= kernel parameter, it is used by the kdump kernel as it diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index 102937bc8443..e8bdf5e86a9b 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -108,6 +108,7 @@ is applicable:: CMA Contiguous Memory Area support is enabled. DRM Direct Rendering Management support is enabled. DYNAMIC_DEBUG Build in debug messages and enable them at runtime + EARLY Parameter processed too early to be embedded in initrd. EDD BIOS Enhanced Disk Drive Services (EDD) is enabled EFI EFI Partitioning (GPT) is enabled EVM Extended Verification Module @@ -218,8 +219,3 @@ bytes respectively. Such letter suffixes can also be entirely omitted: .. include:: kernel-parameters.txt :literal: - -Todo ----- - - Add more DRM drivers. diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 31b3a25680d0..479ff1737c2f 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -9,7 +9,7 @@ accept_memory=eager can be used to accept all memory at once during boot. - acpi= [HW,ACPI,X86,ARM64,RISCV64] + acpi= [HW,ACPI,X86,ARM64,RISCV64,EARLY] Advanced Configuration and Power Interface Format: { force | on | off | strict | noirq | rsdt | copy_dsdt } @@ -26,7 +26,7 @@ See also Documentation/power/runtime_pm.rst, pci=noacpi - acpi_apic_instance= [ACPI, IOAPIC] + acpi_apic_instance= [ACPI,IOAPIC,EARLY] Format: <int> 2: use 2nd APIC table, if available 1,0: use 1st APIC table @@ -41,7 +41,7 @@ If set to native, use the device's native backlight mode. If set to none, disable the ACPI backlight interface. - acpi_force_32bit_fadt_addr + acpi_force_32bit_fadt_addr [ACPI,EARLY] force FADT to use 32 bit addresses rather than the 64 bit X_* addresses. Some firmware have broken 64 bit addresses for force ACPI ignore these and use @@ -97,7 +97,7 @@ no: ACPI OperationRegions are not marked as reserved, no further checks are performed. - acpi_force_table_verification [HW,ACPI] + acpi_force_table_verification [HW,ACPI,EARLY] Enable table checksum verification during early stage. By default, this is disabled due to x86 early mapping size limitation. @@ -137,7 +137,7 @@ acpi_no_memhotplug [ACPI] Disable memory hotplug. Useful for kdump kernels. - acpi_no_static_ssdt [HW,ACPI] + acpi_no_static_ssdt [HW,ACPI,EARLY] Disable installation of static SSDTs at early boot time By default, SSDTs contained in the RSDT/XSDT will be installed automatically and they will appear under @@ -151,7 +151,7 @@ Ignore the ACPI-based watchdog interface (WDAT) and let a native driver control the watchdog device instead. - acpi_rsdp= [ACPI,EFI,KEXEC] + acpi_rsdp= [ACPI,EFI,KEXEC,EARLY] Pass the RSDP address to the kernel, mostly used on machines running EFI runtime service to boot the second kernel for kdump. @@ -228,10 +228,10 @@ to assume that this machine's pmtimer latches its value and always returns good values. - acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode + acpi_sci= [HW,ACPI,EARLY] ACPI System Control Interrupt trigger mode Format: { level | edge | high | low } - acpi_skip_timer_override [HW,ACPI] + acpi_skip_timer_override [HW,ACPI,EARLY] Recognize and ignore IRQ0/pin2 Interrupt Override. For broken nForce2 BIOS resulting in XT-PIC timer. @@ -266,11 +266,11 @@ behave incorrectly in some ways with respect to system suspend and resume to be ignored (use wisely). - acpi_use_timer_override [HW,ACPI] + acpi_use_timer_override [HW,ACPI,EARLY] Use timer override. For some broken Nvidia NF5 boards that require a timer override, but don't have HPET - add_efi_memmap [EFI; X86] Include EFI memory map in + add_efi_memmap [EFI,X86,EARLY] Include EFI memory map in kernel's map of available physical RAM. agp= [AGP] @@ -307,7 +307,7 @@ do not want to use tracing_snapshot_alloc() as it needs to be done where GFP_KERNEL allocations are allowed. - allow_mismatched_32bit_el0 [ARM64] + allow_mismatched_32bit_el0 [ARM64,EARLY] Allow execve() of 32-bit applications and setting of the PER_LINUX32 personality on systems where only a strict subset of the CPUs support 32-bit EL0. When this @@ -351,7 +351,7 @@ This mode requires kvm-amd.avic=1. (Default when IOMMU HW support is present.) - amd_pstate= [X86] + amd_pstate= [X86,EARLY] disable Do not enable amd_pstate as the default scaling driver for the supported processors @@ -374,6 +374,11 @@ selects a performance level in this range and appropriate to the current workload. + amd_prefcore= + [X86] + disable + Disable amd-pstate preferred core. + amijoy.map= [HW,JOY] Amiga joystick support Map of devices attached to JOY0DAT and JOY1DAT Format: <a>,<b> @@ -391,7 +396,7 @@ not play well with APC CPU idle - disable it if you have APC and your system crashes randomly. - apic= [APIC,X86] Advanced Programmable Interrupt Controller + apic= [APIC,X86,EARLY] Advanced Programmable Interrupt Controller Change the output verbosity while booting Format: { quiet (default) | verbose | debug } Change the amount of debugging information output @@ -401,7 +406,7 @@ Format: apic=driver_name Examples: apic=bigsmp - apic_extnmi= [APIC,X86] External NMI delivery setting + apic_extnmi= [APIC,X86,EARLY] External NMI delivery setting Format: { bsp (default) | all | none } bsp: External NMI is delivered only to CPU 0 all: External NMIs are broadcast to all CPUs as a @@ -508,21 +513,22 @@ bert_disable [ACPI] Disable BERT OS support on buggy BIOSes. - bgrt_disable [ACPI][X86] + bgrt_disable [ACPI,X86,EARLY] Disable BGRT to avoid flickering OEM logo. blkdevparts= Manual partition parsing of block device(s) for embedded devices based on command line input. See Documentation/block/cmdline-partition.rst - boot_delay= Milliseconds to delay each printk during boot. + boot_delay= [KNL,EARLY] + Milliseconds to delay each printk during boot. Only works if CONFIG_BOOT_PRINTK_DELAY is enabled, and you may also have to specify "lpj=". Boot_delay values larger than 10 seconds (10000) are assumed erroneous and ignored. Format: integer - bootconfig [KNL] + bootconfig [KNL,EARLY] Extended command line options can be added to an initrd and this will cause the kernel to look for it. @@ -557,7 +563,7 @@ trust validation. format: { id:<keyid> | builtin } - cca= [MIPS] Override the kernel pages' cache coherency + cca= [MIPS,EARLY] Override the kernel pages' cache coherency algorithm. Accepted values range from 0 to 7 inclusive. See arch/mips/include/asm/pgtable-bits.h for platform specific values (SB1, Loongson3 and @@ -672,19 +678,13 @@ [X86-64] hpet,tsc clocksource.arm_arch_timer.evtstrm= - [ARM,ARM64] + [ARM,ARM64,EARLY] Format: <bool> Enable/disable the eventstream feature of the ARM architected timer so that code using WFE-based polling loops can be debugged more effectively on production systems. - clocksource.max_cswd_read_retries= [KNL] - Number of clocksource_watchdog() retries due to - external delays before the clock will be marked - unstable. Defaults to two retries, that is, - three attempts to read the clock under test. - clocksource.verify_n_cpus= [KNL] Limit the number of CPUs checked for clocksources marked with CLOCK_SOURCE_VERIFY_PERCPU that @@ -702,7 +702,7 @@ 10 seconds when built into the kernel. cma=nn[MG]@[start[MG][-end[MG]]] - [KNL,CMA] + [KNL,CMA,EARLY] Sets the size of kernel global memory area for contiguous memory allocations and optionally the placement constraint by the physical address range of @@ -711,7 +711,7 @@ kernel/dma/contiguous.c cma_pernuma=nn[MG] - [KNL,CMA] + [KNL,CMA,EARLY] Sets the size of kernel per-numa memory area for contiguous memory allocations. A value of 0 disables per-numa CMA altogether. And If this option is not @@ -722,7 +722,7 @@ they will fallback to the global default memory area. numa_cma=<node>:nn[MG][,<node>:nn[MG]] - [KNL,CMA] + [KNL,CMA,EARLY] Sets the size of kernel numa memory area for contiguous memory allocations. It will reserve CMA area for the specified node. @@ -739,7 +739,7 @@ a hypervisor. Default: yes - coherent_pool=nn[KMG] [ARM,KNL] + coherent_pool=nn[KMG] [ARM,KNL,EARLY] Sets the size of memory pool for coherent, atomic dma allocations, by default set to 256K. @@ -757,7 +757,7 @@ condev= [HW,S390] console device conmode= - con3215_drop= [S390] 3215 console drop mode. + con3215_drop= [S390,EARLY] 3215 console drop mode. Format: y|n|Y|N|1|0 When set to true, drop data on the 3215 console when the console buffer is full. In this case the @@ -863,7 +863,7 @@ kernel before the cpufreq driver probes. cpu_init_udelay=N - [X86] Delay for N microsec between assert and de-assert + [X86,EARLY] Delay for N microsec between assert and de-assert of APIC INIT to start processors. This delay occurs on every CPU online, such as boot, and resume from suspend. Default: 10000 @@ -883,7 +883,7 @@ kernel more unstable. crashkernel=size[KMG][@offset[KMG]] - [KNL] Using kexec, Linux can switch to a 'crash kernel' + [KNL,EARLY] Using kexec, Linux can switch to a 'crash kernel' upon panic. This parameter reserves the physical memory region [offset, offset + size] for that kernel image. If '@offset' is omitted, then a suitable offset @@ -954,10 +954,10 @@ Format: <port#>,<type> See also Documentation/input/devices/joystick-parport.rst - debug [KNL] Enable kernel debugging (events log level). + debug [KNL,EARLY] Enable kernel debugging (events log level). debug_boot_weak_hash - [KNL] Enable printing [hashed] pointers early in the + [KNL,EARLY] Enable printing [hashed] pointers early in the boot sequence. If enabled, we use a weak hash instead of siphash to hash pointers. Use this option if you are seeing instances of '(___ptrval___)') and need to see a @@ -974,10 +974,10 @@ will print _a_lot_ more information - normally only useful to lockdep developers. - debug_objects [KNL] Enable object debugging + debug_objects [KNL,EARLY] Enable object debugging debug_guardpage_minorder= - [KNL] When CONFIG_DEBUG_PAGEALLOC is set, this + [KNL,EARLY] When CONFIG_DEBUG_PAGEALLOC is set, this parameter allows control of the order of pages that will be intentionally kept free (and hence protected) by the buddy allocator. Bigger value increase the probability @@ -996,7 +996,7 @@ help tracking down these problems. debug_pagealloc= - [KNL] When CONFIG_DEBUG_PAGEALLOC is set, this parameter + [KNL,EARLY] When CONFIG_DEBUG_PAGEALLOC is set, this parameter enables the feature at boot time. By default, it is disabled and the system will work mostly the same as a kernel built without CONFIG_DEBUG_PAGEALLOC. @@ -1004,8 +1004,8 @@ useful to also enable the page_owner functionality. on: enable the feature - debugfs= [KNL] This parameter enables what is exposed to userspace - and debugfs internal clients. + debugfs= [KNL,EARLY] This parameter enables what is exposed to + userspace and debugfs internal clients. Format: { on, no-mount, off } on: All functions are enabled. no-mount: @@ -1084,7 +1084,7 @@ dhash_entries= [KNL] Set number of hash buckets for dentry cache. - disable_1tb_segments [PPC] + disable_1tb_segments [PPC,EARLY] Disables the use of 1TB hash page table segments. This causes the kernel to fall back to 256MB segments which can be useful when debugging issues that require an SLB @@ -1093,41 +1093,32 @@ disable= [IPV6] See Documentation/networking/ipv6.rst. - disable_radix [PPC] + disable_radix [PPC,EARLY] Disable RADIX MMU mode on POWER9 disable_tlbie [PPC] Disable TLBIE instruction. Currently does not work with KVM, with HASH MMU, or with coherent accelerators. - disable_cpu_apicid= [X86,APIC,SMP] - Format: <int> - The number of initial APIC ID for the - corresponding CPU to be disabled at boot, - mostly used for the kdump 2nd kernel to - disable BSP to wake up multiple CPUs without - causing system reset or hang due to sending - INIT from AP to BSP. - - disable_ddw [PPC/PSERIES] + disable_ddw [PPC/PSERIES,EARLY] Disable Dynamic DMA Window support. Use this to workaround buggy firmware. disable_ipv6= [IPV6] See Documentation/networking/ipv6.rst. - disable_mtrr_cleanup [X86] + disable_mtrr_cleanup [X86,EARLY] The kernel tries to adjust MTRR layout from continuous to discrete, to make X server driver able to add WB entry later. This parameter disables that. - disable_mtrr_trim [X86, Intel and AMD only] + disable_mtrr_trim [X86, Intel and AMD only,EARLY] By default the kernel will trim any uncacheable memory out of your available memory pool based on MTRR settings. This parameter disables that behavior, possibly causing your machine to run very slowly. - disable_timer_pin_1 [X86] + disable_timer_pin_1 [X86,EARLY] Disable PIN 1 of APIC timer Can be useful to work around chipset bugs. @@ -1150,6 +1141,26 @@ The filter can be disabled or changed to another driver later using sysfs. + reg_file_data_sampling= + [X86] Controls mitigation for Register File Data + Sampling (RFDS) vulnerability. RFDS is a CPU + vulnerability which may allow userspace to infer + kernel data values previously stored in floating point + registers, vector registers, or integer registers. + RFDS only affects Intel Atom processors. + + on: Turns ON the mitigation. + off: Turns OFF the mitigation. + + This parameter overrides the compile time default set + by CONFIG_MITIGATION_RFDS. Mitigation cannot be + disabled when other VERW based mitigations (like MDS) + are enabled. In order to disable RFDS mitigation all + VERW based mitigations need to be disabled. + + For details see: + Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst + driver_async_probe= [KNL] List of driver names to be probed asynchronously. * matches with all driver names. If * is specified, the @@ -1162,22 +1173,16 @@ panels may send no or incorrect EDID data sets. This parameter allows to specify an EDID data sets in the /lib/firmware directory that are used instead. - Generic built-in EDID data sets are used, if one of - edid/1024x768.bin, edid/1280x1024.bin, - edid/1680x1050.bin, or edid/1920x1080.bin is given - and no file with the same name exists. Details and - instructions how to build your own EDID data are - available in Documentation/admin-guide/edid.rst. An EDID - data set will only be used for a particular connector, - if its name and a colon are prepended to the EDID - name. Each connector may use a unique EDID data - set by separating the files with a comma. An EDID + An EDID data set will only be used for a particular + connector, if its name and a colon are prepended to + the EDID name. Each connector may use a unique EDID + data set by separating the files with a comma. An EDID data set with no connector name will be used for any connectors not explicitly specified. dscc4.setup= [NET] - dt_cpu_ftrs= [PPC] + dt_cpu_ftrs= [PPC,EARLY] Format: {"off" | "known"} Control how the dt_cpu_ftrs device-tree binding is used for CPU feature discovery and setup (if it @@ -1197,12 +1202,12 @@ Documentation/admin-guide/dynamic-debug-howto.rst for details. - early_ioremap_debug [KNL] + early_ioremap_debug [KNL,EARLY] Enable debug messages in early_ioremap support. This is useful for tracking down temporary early mappings which are not unmapped. - earlycon= [KNL] Output early console device and options. + earlycon= [KNL,EARLY] Output early console device and options. When used with no options, the early console is determined by stdout-path property in device tree's @@ -1338,7 +1343,7 @@ address must be provided, and the serial port must already be setup and configured. - earlyprintk= [X86,SH,ARM,M68k,S390] + earlyprintk= [X86,SH,ARM,M68k,S390,UM,EARLY] earlyprintk=vga earlyprintk=sclp earlyprintk=xen @@ -1396,7 +1401,7 @@ edd= [EDD] Format: {"off" | "on" | "skip[mbr]"} - efi= [EFI] + efi= [EFI,EARLY] Format: { "debug", "disable_early_pci_dma", "nochunk", "noruntime", "nosoftreserve", "novamap", "no_disable_early_pci_dma" } @@ -1417,13 +1422,13 @@ no_disable_early_pci_dma: Leave the busmaster bit set on all PCI bridges while in the EFI boot stub - efi_no_storage_paranoia [EFI; X86] + efi_no_storage_paranoia [EFI,X86,EARLY] Using this parameter you can use more than 50% of your efi variable storage. Use this parameter only if you are really sure that your UEFI does sane gc and fulfills the spec otherwise your board may brick. - efi_fake_mem= nn[KMG]@ss[KMG]:aa[,nn[KMG]@ss[KMG]:aa,..] [EFI; X86] + efi_fake_mem= nn[KMG]@ss[KMG]:aa[,nn[KMG]@ss[KMG]:aa,..] [EFI,X86,EARLY] Add arbitrary attribute to specific memory range by updating original EFI memory map. Region of memory which aa attribute is added to is @@ -1454,7 +1459,7 @@ eisa_irq_edge= [PARISC,HW] See header of drivers/parisc/eisa.c. - ekgdboc= [X86,KGDB] Allow early kernel console debugging + ekgdboc= [X86,KGDB,EARLY] Allow early kernel console debugging Format: ekgdboc=kbd This is designed to be used in conjunction with @@ -1469,13 +1474,13 @@ See comment before function elanfreq_setup() in arch/x86/kernel/cpu/cpufreq/elanfreq.c. - elfcorehdr=[size[KMG]@]offset[KMG] [PPC,SH,X86,S390] + elfcorehdr=[size[KMG]@]offset[KMG] [PPC,SH,X86,S390,EARLY] Specifies physical address of start of kernel core image elf header and optionally the size. Generally kexec loader will pass this option to capture kernel. See Documentation/admin-guide/kdump/kdump.rst for details. - enable_mtrr_cleanup [X86] + enable_mtrr_cleanup [X86,EARLY] The kernel tries to adjust MTRR layout from continuous to discrete, to make X server driver able to add WB entry later. This parameter enables that. @@ -1508,7 +1513,7 @@ Permit 'security.evm' to be updated regardless of current integrity status. - early_page_ext [KNL] Enforces page_ext initialization to earlier + early_page_ext [KNL,EARLY] Enforces page_ext initialization to earlier stages so cover more early boot allocations. Please note that as side effect some optimizations might be disabled to achieve that (e.g. parallelized @@ -1539,6 +1544,12 @@ Warning: use of this parameter will taint the kernel and may cause unknown problems. + fred= [X86-64] + Enable/disable Flexible Return and Event Delivery. + Format: { on | off } + on: enable FRED when it's present. + off: disable FRED, the default setting. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early @@ -1600,7 +1611,7 @@ can be changed at run time by the max_graph_depth file in the tracefs tracing directory. default: 0 (no limit) - fw_devlink= [KNL] Create device links between consumer and supplier + fw_devlink= [KNL,EARLY] Create device links between consumer and supplier devices by scanning the firmware to infer the consumer/supplier relationships. This feature is especially useful when drivers are loaded as modules as @@ -1619,12 +1630,12 @@ rpm -- Like "on", but also use to order runtime PM. fw_devlink.strict=<bool> - [KNL] Treat all inferred dependencies as mandatory + [KNL,EARLY] Treat all inferred dependencies as mandatory dependencies. This only applies for fw_devlink=on|rpm. Format: <bool> fw_devlink.sync_state = - [KNL] When all devices that could probe have finished + [KNL,EARLY] When all devices that could probe have finished probing, this parameter controls what to do with devices that haven't yet received their sync_state() calls. @@ -1645,12 +1656,12 @@ gamma= [HW,DRM] - gart_fix_e820= [X86-64] disable the fix e820 for K8 GART + gart_fix_e820= [X86-64,EARLY] disable the fix e820 for K8 GART Format: off | on default: on gather_data_sampling= - [X86,INTEL] Control the Gather Data Sampling (GDS) + [X86,INTEL,EARLY] Control the Gather Data Sampling (GDS) mitigation. Gather Data Sampling is a hardware vulnerability which @@ -1748,7 +1759,18 @@ (that will set all pages holding image data during restoration read-only). - highmem=nn[KMG] [KNL,BOOT] forces the highmem zone to have an exact + hibernate.compressor= [HIBERNATION] Compression algorithm to be + used with hibernation. + Format: { lzo | lz4 } + Default: lzo + + lzo: Select LZO compression algorithm to + compress/decompress hibernation image. + + lz4: Select LZ4 compression algorithm to + compress/decompress hibernation image. + + highmem=nn[KMG] [KNL,BOOT,EARLY] forces the highmem zone to have an exact size of <nn>. This works even on boxes that have no highmem otherwise. This also works to reduce highmem size on bigger boxes. @@ -1759,7 +1781,7 @@ hlt [BUGS=ARM,SH] - hostname= [KNL] Set the hostname (aka UTS nodename). + hostname= [KNL,EARLY] Set the hostname (aka UTS nodename). Format: <string> This allows setting the system's hostname during early startup. This sets the name returned by gethostname. @@ -1804,7 +1826,7 @@ Documentation/admin-guide/mm/hugetlbpage.rst. Format: size[KMG] - hugetlb_cma= [HW,CMA] The size of a CMA area used for allocation + hugetlb_cma= [HW,CMA,EARLY] The size of a CMA area used for allocation of gigantic hugepages. Or using node format, the size of a CMA area per node can be specified. Format: nn[KMGTPE] or (node format) @@ -1850,9 +1872,10 @@ If specified, z/VM IUCV HVC accepts connections from listed z/VM user IDs only. - hv_nopvspin [X86,HYPER_V] Disables the paravirt spinlock optimizations - which allow the hypervisor to 'idle' the - guest on lock contention. + hv_nopvspin [X86,HYPER_V,EARLY] + Disables the paravirt spinlock optimizations + which allow the hypervisor to 'idle' the guest + on lock contention. i2c_bus= [HW] Override the default board specific I2C bus speed or register an additional I2C bus that is not @@ -1917,7 +1940,7 @@ Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]] - idle= [X86] + idle= [X86,EARLY] Format: idle=poll, idle=halt, idle=nomwait Poll forces a polling idle loop that can slightly improve the performance of waking up a idle CPU, but @@ -1973,7 +1996,7 @@ mode generally follows that for the NaN encoding, except where unsupported by hardware. - ignore_loglevel [KNL] + ignore_loglevel [KNL,EARLY] Ignore loglevel setting - this will print /all/ kernel messages to the console. Useful for debugging. We also add it as printk module parameter, so users @@ -2091,21 +2114,21 @@ unpacking being completed before device_ and late_ initcalls. - initrd= [BOOT] Specify the location of the initial ramdisk + initrd= [BOOT,EARLY] Specify the location of the initial ramdisk - initrdmem= [KNL] Specify a physical address and size from which to + initrdmem= [KNL,EARLY] Specify a physical address and size from which to load the initrd. If an initrd is compiled in or specified in the bootparams, it takes priority over this setting. Format: ss[KMG],nn[KMG] Default is 0, 0 - init_on_alloc= [MM] Fill newly allocated pages and heap objects with + init_on_alloc= [MM,EARLY] Fill newly allocated pages and heap objects with zeroes. Format: 0 | 1 Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON. - init_on_free= [MM] Fill freed pages and heap objects with zeroes. + init_on_free= [MM,EARLY] Fill freed pages and heap objects with zeroes. Format: 0 | 1 Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON. @@ -2161,7 +2184,7 @@ 0 disables intel_idle and fall back on acpi_idle. 1 to 9 specify maximum depth of C-state. - intel_pstate= [X86] + intel_pstate= [X86,EARLY] disable Do not enable intel_pstate as the default scaling driver for the supported processors @@ -2205,7 +2228,7 @@ Allow per-logical-CPU P-State performance control limits using cpufreq sysfs interface - intremap= [X86-64, Intel-IOMMU] + intremap= [X86-64,Intel-IOMMU,EARLY] on enable Interrupt Remapping (default) off disable Interrupt Remapping nosid disable Source ID checking @@ -2217,7 +2240,7 @@ strict regions from userspace. relaxed - iommu= [X86] + iommu= [X86,EARLY] off force noforce @@ -2232,7 +2255,7 @@ nobypass [PPC/POWERNV] Disable IOMMU bypass, using IOMMU for PCI devices. - iommu.forcedac= [ARM64, X86] Control IOVA allocation for PCI devices. + iommu.forcedac= [ARM64,X86,EARLY] Control IOVA allocation for PCI devices. Format: { "0" | "1" } 0 - Try to allocate a 32-bit DMA address first, before falling back to the full range if needed. @@ -2240,7 +2263,7 @@ forcing Dual Address Cycle for PCI cards supporting greater than 32-bit addressing. - iommu.strict= [ARM64, X86, S390] Configure TLB invalidation behaviour + iommu.strict= [ARM64,X86,S390,EARLY] Configure TLB invalidation behaviour Format: { "0" | "1" } 0 - Lazy mode. Request that DMA unmap operations use deferred @@ -2256,7 +2279,7 @@ legacy driver-specific options takes precedence. iommu.passthrough= - [ARM64, X86] Configure DMA to bypass the IOMMU by default. + [ARM64,X86,EARLY] Configure DMA to bypass the IOMMU by default. Format: { "0" | "1" } 0 - Use IOMMU translation for DMA. 1 - Bypass the IOMMU for DMA. @@ -2266,7 +2289,7 @@ See comment before marvel_specify_io7 in arch/alpha/kernel/core_marvel.c. - io_delay= [X86] I/O delay method + io_delay= [X86,EARLY] I/O delay method 0x80 Standard port 0x80 based delay 0xed @@ -2279,28 +2302,28 @@ ip= [IP_PNP] See Documentation/admin-guide/nfs/nfsroot.rst. - ipcmni_extend [KNL] Extend the maximum number of unique System V + ipcmni_extend [KNL,EARLY] Extend the maximum number of unique System V IPC identifiers from 32,768 to 16,777,216. irqaffinity= [SMP] Set the default irq affinity mask The argument is a cpu list, as described above. irqchip.gicv2_force_probe= - [ARM, ARM64] + [ARM,ARM64,EARLY] Format: <bool> Force the kernel to look for the second 4kB page of a GICv2 controller even if the memory range exposed by the device tree is too small. irqchip.gicv3_nolpi= - [ARM, ARM64] + [ARM,ARM64,EARLY] Force the kernel to ignore the availability of LPIs (and by consequence ITSs). Intended for system that use the kernel as a bootloader, and thus want to let secondary kernels in charge of setting up LPIs. - irqchip.gicv3_pseudo_nmi= [ARM64] + irqchip.gicv3_pseudo_nmi= [ARM64,EARLY] Enables support for pseudo-NMIs in the kernel. This requires the kernel to be built with CONFIG_ARM64_PSEUDO_NMI. @@ -2445,7 +2468,7 @@ parameter KASAN will print report only for the first invalid access. - keep_bootcon [KNL] + keep_bootcon [KNL,EARLY] Do not unregister boot console at start. This is only useful for debugging when something happens in the window between unregistering the boot console and initializing @@ -2453,7 +2476,7 @@ keepinitrd [HW,ARM] See retain_initrd. - kernelcore= [KNL,X86,IA-64,PPC] + kernelcore= [KNL,X86,IA-64,PPC,EARLY] Format: nn[KMGTPE] | nn% | "mirror" This parameter specifies the amount of memory usable by the kernel for non-movable allocations. The requested @@ -2478,7 +2501,7 @@ for Movable pages. "nn[KMGTPE]", "nn%", and "mirror" are exclusive, so you cannot specify multiple forms. - kgdbdbgp= [KGDB,HW] kgdb over EHCI usb debug port. + kgdbdbgp= [KGDB,HW,EARLY] kgdb over EHCI usb debug port. Format: <Controller#>[,poll interval] The controller # is the number of the ehci usb debug port as it is probed via PCI. The poll interval is @@ -2499,7 +2522,7 @@ kms, kbd format: kms,kbd kms, kbd and serial format: kms,kbd,<ser_dev>[,baud] - kgdboc_earlycon= [KGDB,HW] + kgdboc_earlycon= [KGDB,HW,EARLY] If the boot console provides the ability to read characters and can work in polling mode, you can use this parameter to tell kgdb to use it as a backend @@ -2514,14 +2537,14 @@ blank and the first boot console that implements read() will be picked. - kgdbwait [KGDB] Stop kernel execution and enter the + kgdbwait [KGDB,EARLY] Stop kernel execution and enter the kernel debugger at the earliest opportunity. kmac= [MIPS] Korina ethernet MAC address. Configure the RouterBoard 532 series on-chip Ethernet adapter MAC address. - kmemleak= [KNL] Boot-time kmemleak enable/disable + kmemleak= [KNL,EARLY] Boot-time kmemleak enable/disable Valid arguments: on, off Default: on Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, @@ -2540,8 +2563,8 @@ See also Documentation/trace/kprobetrace.rst "Kernel Boot Parameter" section. - kpti= [ARM64] Control page table isolation of user - and kernel address spaces. + kpti= [ARM64,EARLY] Control page table isolation of + user and kernel address spaces. Default: enabled on cores which need mitigation. 0: force disabled 1: force enabled @@ -2618,7 +2641,8 @@ for NPT. kvm-arm.mode= - [KVM,ARM] Select one of KVM/arm64's modes of operation. + [KVM,ARM,EARLY] Select one of KVM/arm64's modes of + operation. none: Forcefully disable KVM. @@ -2638,22 +2662,22 @@ used with extreme caution. kvm-arm.vgic_v3_group0_trap= - [KVM,ARM] Trap guest accesses to GICv3 group-0 + [KVM,ARM,EARLY] Trap guest accesses to GICv3 group-0 system registers kvm-arm.vgic_v3_group1_trap= - [KVM,ARM] Trap guest accesses to GICv3 group-1 + [KVM,ARM,EARLY] Trap guest accesses to GICv3 group-1 system registers kvm-arm.vgic_v3_common_trap= - [KVM,ARM] Trap guest accesses to GICv3 common + [KVM,ARM,EARLY] Trap guest accesses to GICv3 common system registers kvm-arm.vgic_v4_enable= - [KVM,ARM] Allow use of GICv4 for direct injection of - LPIs. + [KVM,ARM,EARLY] Allow use of GICv4 for direct + injection of LPIs. - kvm_cma_resv_ratio=n [PPC] + kvm_cma_resv_ratio=n [PPC,EARLY] Reserves given percentage from system memory area for contiguous memory allocation for KVM hash pagetable allocation. @@ -2706,7 +2730,7 @@ (enabled). Disable by KVM if hardware lacks support for it. - l1d_flush= [X86,INTEL] + l1d_flush= [X86,INTEL,EARLY] Control mitigation for L1D based snooping vulnerability. Certain CPUs are vulnerable to an exploit against CPU @@ -2723,7 +2747,7 @@ on - enable the interface for the mitigation - l1tf= [X86] Control mitigation of the L1TF vulnerability on + l1tf= [X86,EARLY] Control mitigation of the L1TF vulnerability on affected CPUs The kernel PTE inversion protection is unconditionally @@ -2792,7 +2816,7 @@ l3cr= [PPC] - lapic [X86-32,APIC] Enable the local APIC even if BIOS + lapic [X86-32,APIC,EARLY] Enable the local APIC even if BIOS disabled it. lapic= [X86,APIC] Do not use TSC deadline @@ -2800,7 +2824,7 @@ back to the programmable timer unit in the LAPIC. Format: notscdeadline - lapic_timer_c2_ok [X86,APIC] trust the local apic timer + lapic_timer_c2_ok [X86,APIC,EARLY] trust the local apic timer in C2 power state. libata.dma= [LIBATA] DMA control @@ -2924,7 +2948,7 @@ lockd.nlm_udpport=M [NFS] Assign UDP port. Format: <integer> - lockdown= [SECURITY] + lockdown= [SECURITY,EARLY] { integrity | confidentiality } Enable the kernel lockdown feature. If set to integrity, kernel features that allow userland to @@ -3031,7 +3055,8 @@ logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver Format: <irq> - loglevel= All Kernel Messages with a loglevel smaller than the + loglevel= [KNL,EARLY] + All Kernel Messages with a loglevel smaller than the console loglevel will be printed to the console. It can also be changed with klogd or other programs. The loglevels are defined as follows: @@ -3045,13 +3070,15 @@ 6 (KERN_INFO) informational 7 (KERN_DEBUG) debug-level messages - log_buf_len=n[KMG] Sets the size of the printk ring buffer, - in bytes. n must be a power of two and greater - than the minimal size. The minimal size is defined - by LOG_BUF_SHIFT kernel config parameter. There is - also CONFIG_LOG_CPU_MAX_BUF_SHIFT config parameter - that allows to increase the default size depending on - the number of CPUs. See init/Kconfig for more details. + log_buf_len=n[KMG] [KNL,EARLY] + Sets the size of the printk ring buffer, in bytes. + n must be a power of two and greater than the + minimal size. The minimal size is defined by + LOG_BUF_SHIFT kernel config parameter. There + is also CONFIG_LOG_CPU_MAX_BUF_SHIFT config + parameter that allows to increase the default size + depending on the number of CPUs. See init/Kconfig + for more details. logo.nologo [FB] Disables display of the built-in Linux logo. This may be used to provide more screen space for @@ -3109,7 +3136,7 @@ max_addr=nn[KMG] [KNL,BOOT,IA-64] All physical memory greater than or equal to this physical address is ignored. - maxcpus= [SMP] Maximum number of processors that an SMP kernel + maxcpus= [SMP,EARLY] Maximum number of processors that an SMP kernel will bring up during bootup. maxcpus=n : n >= 0 limits the kernel to bring up 'n' processors. Surely after bootup you can bring up the other plugged cpu by executing @@ -3136,7 +3163,7 @@ Format: <first>,<last> Specifies range of consoles to be captured by the MDA. - mds= [X86,INTEL] + mds= [X86,INTEL,EARLY] Control mitigation for the Micro-architectural Data Sampling (MDS) vulnerability. @@ -3168,11 +3195,12 @@ For details see: Documentation/admin-guide/hw-vuln/mds.rst - mem=nn[KMG] [HEXAGON] Set the memory size. + mem=nn[KMG] [HEXAGON,EARLY] Set the memory size. Must be specified, otherwise memory size will be 0. - mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory - Amount of memory to be used in cases as follows: + mem=nn[KMG] [KNL,BOOT,EARLY] Force usage of a specific amount + of memory Amount of memory to be used in cases + as follows: 1 for test; 2 when the kernel is not able to see the whole system memory; @@ -3196,8 +3224,8 @@ if system memory of hypervisor is not sufficient. mem=nn[KMG]@ss[KMG] - [ARM,MIPS] - override the memory layout reported by - firmware. + [ARM,MIPS,EARLY] - override the memory layout + reported by firmware. Define a memory region of size nn[KMG] starting at ss[KMG]. Multiple different regions can be specified with @@ -3206,7 +3234,7 @@ mem=nopentium [BUGS=X86-32] Disable usage of 4MB pages for kernel memory. - memblock=debug [KNL] Enable memblock debug messages. + memblock=debug [KNL,EARLY] Enable memblock debug messages. memchunk=nn[KMG] [KNL,SH] Allow user to override the default size for @@ -3220,14 +3248,14 @@ option. See Documentation/admin-guide/mm/memory-hotplug.rst. - memmap=exactmap [KNL,X86] Enable setting of an exact + memmap=exactmap [KNL,X86,EARLY] Enable setting of an exact E820 memory map, as specified by the user. Such memmap=exactmap lines can be constructed based on BIOS output or other requirements. See the memmap=nn@ss option description. memmap=nn[KMG]@ss[KMG] - [KNL, X86, MIPS, XTENSA] Force usage of a specific region of memory. + [KNL, X86,MIPS,XTENSA,EARLY] Force usage of a specific region of memory. Region of memory to be used is from ss to ss+nn. If @ss[KMG] is omitted, it is equivalent to mem=nn[KMG], which limits max address to nn[KMG]. @@ -3237,11 +3265,11 @@ memmap=100M@2G,100M#3G,1G!1024G memmap=nn[KMG]#ss[KMG] - [KNL,ACPI] Mark specific memory as ACPI data. + [KNL,ACPI,EARLY] Mark specific memory as ACPI data. Region of memory to be marked is from ss to ss+nn. memmap=nn[KMG]$ss[KMG] - [KNL,ACPI] Mark specific memory as reserved. + [KNL,ACPI,EARLY] Mark specific memory as reserved. Region of memory to be reserved is from ss to ss+nn. Example: Exclude memory from 0x18690000-0x1869ffff memmap=64K$0x18690000 @@ -3251,14 +3279,14 @@ like Grub2, otherwise '$' and the following number will be eaten. - memmap=nn[KMG]!ss[KMG] + memmap=nn[KMG]!ss[KMG,EARLY] [KNL,X86] Mark specific memory as protected. Region of memory to be used, from ss to ss+nn. The memory region may be marked as e820 type 12 (0xc) and is NVDIMM or ADR memory. memmap=<size>%<offset>-<oldtype>+<newtype> - [KNL,ACPI] Convert memory within the specified region + [KNL,ACPI,EARLY] Convert memory within the specified region from <oldtype> to <newtype>. If "-<oldtype>" is left out, the whole region will be marked as <newtype>, even if previously unavailable. If "+<newtype>" is left @@ -3266,7 +3294,7 @@ specified as e820 types, e.g., 1 = RAM, 2 = reserved, 3 = ACPI, 12 = PRAM. - memory_corruption_check=0/1 [X86] + memory_corruption_check=0/1 [X86,EARLY] Some BIOSes seem to corrupt the first 64k of memory when doing things like suspend/resume. Setting this option will scan the memory @@ -3278,13 +3306,13 @@ affects the same memory, you can use memmap= to prevent the kernel from using that memory. - memory_corruption_check_size=size [X86] + memory_corruption_check_size=size [X86,EARLY] By default it checks for corruption in the low 64k, making this memory unavailable for normal use. Use this parameter to scan for corruption in more or less memory. - memory_corruption_check_period=seconds [X86] + memory_corruption_check_period=seconds [X86,EARLY] By default it checks for corruption every 60 seconds. Use this parameter to check at some other rate. 0 disables periodic checking. @@ -3308,7 +3336,7 @@ Note that even when enabled, there are a few cases where the feature is not effective. - memtest= [KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest + memtest= [KNL,X86,ARM,M68K,PPC,RISCV,EARLY] Enable memtest Format: <integer> default : 0 <disable> Specifies the number of memtest passes to be @@ -3320,9 +3348,7 @@ mem_encrypt= [X86-64] AMD Secure Memory Encryption (SME) control Valid arguments: on, off - Default (depends on kernel configuration option): - on (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y) - off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n) + Default: off mem_encrypt=on: Activate SME mem_encrypt=off: Do not activate SME @@ -3376,7 +3402,7 @@ https://repo.or.cz/w/linux-2.6/mini2440.git mitigations= - [X86,PPC,S390,ARM64] Control optional mitigations for + [X86,PPC,S390,ARM64,EARLY] Control optional mitigations for CPU vulnerabilities. This is a set of curated, arch-independent options, each of which is an aggregation of existing arch-specific options. @@ -3398,7 +3424,9 @@ nospectre_bhb [ARM64] nospectre_v1 [X86,PPC] nospectre_v2 [X86,PPC,S390,ARM64] + reg_file_data_sampling=off [X86] retbleed=off [X86] + spec_rstack_overflow=off [X86] spec_store_bypass_disable=off [X86,PPC] spectre_v2_user=off [X86] srbds=off [X86,INTEL] @@ -3429,7 +3457,7 @@ retbleed=auto,nosmt [X86] mminit_loglevel= - [KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this + [KNL,EARLY] When CONFIG_DEBUG_MEMORY_INIT is set, this parameter allows control of the logging verbosity for the additional memory initialisation checks. A value of 0 disables mminit logging and a level of 4 will @@ -3437,7 +3465,7 @@ so loglevel=8 may also need to be specified. mmio_stale_data= - [X86,INTEL] Control mitigation for the Processor + [X86,INTEL,EARLY] Control mitigation for the Processor MMIO Stale Data vulnerabilities. Processor MMIO Stale Data is a class of @@ -3512,7 +3540,7 @@ mousedev.yres= [MOUSE] Vertical screen resolution, used for devices reporting absolute coordinates, such as tablets - movablecore= [KNL,X86,IA-64,PPC] + movablecore= [KNL,X86,IA-64,PPC,EARLY] Format: nn[KMGTPE] | nn% This parameter is the complement to kernelcore=, it specifies the amount of memory used for migratable @@ -3523,7 +3551,7 @@ that the amount of memory usable for all allocations is not too small. - movable_node [KNL] Boot-time switch to make hotplugable memory + movable_node [KNL,EARLY] Boot-time switch to make hotplugable memory NUMA nodes to be movable. This means that the memory of such nodes will be usable only for movable allocations which rules out almost all kernel @@ -3547,21 +3575,21 @@ [HW] Make the MicroTouch USB driver use raw coordinates ('y', default) or cooked coordinates ('n') - mtrr=debug [X86] + mtrr=debug [X86,EARLY] Enable printing debug information related to MTRR registers at boot time. - mtrr_chunk_size=nn[KMG] [X86] + mtrr_chunk_size=nn[KMG,X86,EARLY] used for mtrr cleanup. It is largest continuous chunk that could hold holes aka. UC entries. - mtrr_gran_size=nn[KMG] [X86] + mtrr_gran_size=nn[KMG,X86,EARLY] Used for mtrr cleanup. It is granularity of mtrr block. Default is 1. Large value could prevent small alignment from using up MTRRs. - mtrr_spare_reg_nr=n [X86] + mtrr_spare_reg_nr=n [X86,EARLY] Format: <integer> Range: 0,7 : spare reg number Default : 1 @@ -3747,27 +3775,23 @@ emulation library even if a 387 maths coprocessor is present. - no4lvl [RISCV] Disable 4-level and 5-level paging modes. Forces - kernel to use 3-level paging instead. + no4lvl [RISCV,EARLY] Disable 4-level and 5-level paging modes. + Forces kernel to use 3-level paging instead. - no5lvl [X86-64,RISCV] Disable 5-level paging mode. Forces + no5lvl [X86-64,RISCV,EARLY] Disable 5-level paging mode. Forces kernel to use 4-level paging instead. - noaliencache [MM, NUMA, SLAB] Disables the allocation of alien - caches in the slab allocator. Saves per-node memory, - but will impact performance. - noalign [KNL,ARM] - noaltinstr [S390] Disables alternative instructions patching - (CPU alternatives feature). + noaltinstr [S390,EARLY] Disables alternative instructions + patching (CPU alternatives feature). - noapic [SMP,APIC] Tells the kernel to not make use of any + noapic [SMP,APIC,EARLY] Tells the kernel to not make use of any IOAPICs that may be present in the system. noautogroup Disable scheduler automatic task group creation. - nocache [ARM] + nocache [ARM,EARLY] no_console_suspend [HW] Never suspend the console @@ -3785,13 +3809,13 @@ turn on/off it dynamically. no_debug_objects - [KNL] Disable object debugging + [KNL,EARLY] Disable object debugging nodsp [SH] Disable hardware DSP at boot time. - noefi Disable EFI runtime services support. + noefi [EFI,EARLY] Disable EFI runtime services support. - no_entry_flush [PPC] Don't flush the L1-D cache when entering the kernel. + no_entry_flush [PPC,EARLY] Don't flush the L1-D cache when entering the kernel. noexec [IA-64] @@ -3822,6 +3846,7 @@ real-time systems. no_hash_pointers + [KNL,EARLY] Force pointers printed to the console or buffers to be unhashed. By default, when a pointer is printed via %p format string, that pointer is "hashed", i.e. obscured @@ -3846,9 +3871,9 @@ the impact of the sleep instructions. This is also useful when using JTAG debugger. - nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings. + nohugeiomap [KNL,X86,PPC,ARM64,EARLY] Disable kernel huge I/O mappings. - nohugevmalloc [KNL,X86,PPC,ARM64] Disable kernel huge vmalloc mappings. + nohugevmalloc [KNL,X86,PPC,ARM64,EARLY] Disable kernel huge vmalloc mappings. nohz= [KNL] Boottime enable/disable dynamic ticks Valid arguments: on, off @@ -3870,13 +3895,13 @@ noinitrd [RAM] Tells the kernel not to load any configured initial RAM disk. - nointremap [X86-64, Intel-IOMMU] Do not enable interrupt + nointremap [X86-64,Intel-IOMMU,EARLY] Do not enable interrupt remapping. [Deprecated - use intremap=off] nointroute [IA-64] - noinvpcid [X86] Disable the INVPCID cpu feature. + noinvpcid [X86,EARLY] Disable the INVPCID cpu feature. noiotrap [SH] Disables trapped I/O port accesses. @@ -3887,19 +3912,19 @@ nojitter [IA-64] Disables jitter checking for ITC timers. - nokaslr [KNL] + nokaslr [KNL,EARLY] When CONFIG_RANDOMIZE_BASE is set, this disables kernel and module base offset ASLR (Address Space Layout Randomization). - no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page + no-kvmapf [X86,KVM,EARLY] Disable paravirtualized asynchronous page fault handling. - no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver + no-kvmclock [X86,KVM,EARLY] Disable paravirtualized KVM clock driver - nolapic [X86-32,APIC] Do not enable or use the local APIC. + nolapic [X86-32,APIC,EARLY] Do not enable or use the local APIC. - nolapic_timer [X86-32,APIC] Do not use the local APIC timer. + nolapic_timer [X86-32,APIC,EARLY] Do not use the local APIC timer. nomca [IA-64] Disable machine check abort handling @@ -3924,23 +3949,23 @@ shutdown the other cpus. Instead use the REBOOT_VECTOR irq. - nopat [X86] Disable PAT (page attribute table extension of + nopat [X86,EARLY] Disable PAT (page attribute table extension of pagetables) support. - nopcid [X86-64] Disable the PCID cpu feature. + nopcid [X86-64,EARLY] Disable the PCID cpu feature. nopku [X86] Disable Memory Protection Keys CPU feature found in some Intel CPUs. - nopti [X86-64] + nopti [X86-64,EARLY] Equivalent to pti=off - nopv= [X86,XEN,KVM,HYPER_V,VMWARE] + nopv= [X86,XEN,KVM,HYPER_V,VMWARE,EARLY] Disables the PV optimizations forcing the guest to run as generic guest with no PV drivers. Currently support XEN HVM, KVM, HYPER_V and VMWARE guest. - nopvspin [X86,XEN,KVM] + nopvspin [X86,XEN,KVM,EARLY] Disables the qspinlock slow path using PV optimizations which allow the hypervisor to 'idle' the guest on lock contention. @@ -3960,20 +3985,20 @@ This is required for the Braillex ib80-piezo Braille reader made by F.H. Papenmeier (Germany). - nosgx [X86-64,SGX] Disables Intel SGX kernel support. + nosgx [X86-64,SGX,EARLY] Disables Intel SGX kernel support. - nosmap [PPC] + nosmap [PPC,EARLY] Disable SMAP (Supervisor Mode Access Prevention) even if it is supported by processor. - nosmep [PPC64s] + nosmep [PPC64s,EARLY] Disable SMEP (Supervisor Mode Execution Prevention) even if it is supported by processor. - nosmp [SMP] Tells an SMP kernel to act as a UP kernel, + nosmp [SMP,EARLY] Tells an SMP kernel to act as a UP kernel, and disable the IO APIC. legacy for "maxcpus=0". - nosmt [KNL,MIPS,PPC,S390] Disable symmetric multithreading (SMT). + nosmt [KNL,MIPS,PPC,S390,EARLY] Disable symmetric multithreading (SMT). Equivalent to smt=1. [KNL,X86,PPC] Disable symmetric multithreading (SMT). @@ -3983,22 +4008,23 @@ nosoftlockup [KNL] Disable the soft-lockup detector. nospec_store_bypass_disable - [HW] Disable all mitigations for the Speculative Store Bypass vulnerability + [HW,EARLY] Disable all mitigations for the Speculative + Store Bypass vulnerability - nospectre_bhb [ARM64] Disable all mitigations for Spectre-BHB (branch + nospectre_bhb [ARM64,EARLY] Disable all mitigations for Spectre-BHB (branch history injection) vulnerability. System may allow data leaks with this option. - nospectre_v1 [X86,PPC] Disable mitigations for Spectre Variant 1 + nospectre_v1 [X86,PPC,EARLY] Disable mitigations for Spectre Variant 1 (bounds check bypass). With this option data leaks are possible in the system. - nospectre_v2 [X86,PPC_E500,ARM64] Disable all mitigations for - the Spectre variant 2 (indirect branch prediction) - vulnerability. System may allow data leaks with this - option. + nospectre_v2 [X86,PPC_E500,ARM64,EARLY] Disable all mitigations + for the Spectre variant 2 (indirect branch + prediction) vulnerability. System may allow data + leaks with this option. - no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV] Disable + no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV,EARLY] Disable paravirtualized steal time accounting. steal time is computed, but won't influence scheduler behaviour @@ -4008,7 +4034,7 @@ broken timer IRQ sources. no_uaccess_flush - [PPC] Don't flush the L1-D cache after accessing user data. + [PPC,EARLY] Don't flush the L1-D cache after accessing user data. novmcoredd [KNL,KDUMP] Disable device dump. Device dump allows drivers to @@ -4022,15 +4048,15 @@ is set. no-vmw-sched-clock - [X86,PV_OPS] Disable paravirtualized VMware scheduler - clock and use the default one. + [X86,PV_OPS,EARLY] Disable paravirtualized VMware + scheduler clock and use the default one. nowatchdog [KNL] Disable both lockup detectors, i.e. soft-lockup and NMI watchdog (hard-lockup). - nowb [ARM] + nowb [ARM,EARLY] - nox2apic [X86-64,APIC] Do not enable x2APIC mode. + nox2apic [X86-64,APIC,EARLY] Do not enable x2APIC mode. NOTE: this parameter will be ignored on systems with the LEGACY_XAPIC_DISABLED bit set in the @@ -4068,7 +4094,7 @@ purges which is reported from either PAL_VM_SUMMARY or SAL PALO. - nr_cpus= [SMP] Maximum number of processors that an SMP kernel + nr_cpus= [SMP,EARLY] Maximum number of processors that an SMP kernel could support. nr_cpus=n : n >= 1 limits the kernel to support 'n' processors. It could be larger than the number of already plugged CPU during bootup, later in @@ -4079,8 +4105,9 @@ nr_uarts= [SERIAL] maximum number of UARTs to be registered. - numa=off [KNL, ARM64, PPC, RISCV, SPARC, X86] Disable NUMA, Only - set up a single NUMA node spanning all memory. + numa=off [KNL, ARM64, PPC, RISCV, SPARC, X86, EARLY] + Disable NUMA, Only set up a single NUMA node + spanning all memory. numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic NUMA balancing. @@ -4091,7 +4118,7 @@ This can be set from sysctl after boot. See Documentation/admin-guide/sysctl/vm.rst for details. - ohci1394_dma=early [HW] enable debugging via the ohci1394 driver. + ohci1394_dma=early [HW,EARLY] enable debugging via the ohci1394 driver. See Documentation/core-api/debugging-via-ohci1394.rst for more info. @@ -4117,7 +4144,8 @@ Once locked, the boundary cannot be changed. 1 indicates lock status, 0 indicates unlock status. - oops=panic Always panic on oopses. Default is to just kill the + oops=panic [KNL,EARLY] + Always panic on oopses. Default is to just kill the process, but there is a small probability of deadlocking the machine. This will also cause panics on machine check exceptions. @@ -4133,13 +4161,13 @@ can be read from sysfs at: /sys/module/page_alloc/parameters/shuffle. - page_owner= [KNL] Boot-time page_owner enabling option. + page_owner= [KNL,EARLY] Boot-time page_owner enabling option. Storage of the information about who allocated each page is disabled in default. With this switch, we can turn it on. on: enable the feature - page_poison= [KNL] Boot-time parameter changing the state of + page_poison= [KNL,EARLY] Boot-time parameter changing the state of poisoning on the buddy allocator, available with CONFIG_PAGE_POISONING=y. off: turn off poisoning (default) @@ -4157,7 +4185,8 @@ timeout < 0: reboot immediately Format: <timeout> - panic_on_taint= Bitmask for conditionally calling panic() in add_taint() + panic_on_taint= [KNL,EARLY] + Bitmask for conditionally calling panic() in add_taint() Format: <hex>[,nousertaint] Hexadecimal bitmask representing the set of TAINT flags that will cause the kernel to panic when add_taint() is @@ -4313,7 +4342,7 @@ pcbit= [HW,ISDN] - pci=option[,option...] [PCI] various PCI subsystem options. + pci=option[,option...] [PCI,EARLY] various PCI subsystem options. Some options herein operate on a specific device or a set of devices (<pci_dev>). These are @@ -4582,7 +4611,8 @@ Format: { 0 | 1 } See arch/parisc/kernel/pdc_chassis.c - percpu_alloc= Select which percpu first chunk allocator to use. + percpu_alloc= [MM,EARLY] + Select which percpu first chunk allocator to use. Currently supported values are "embed" and "page". Archs may support subset or none of the selections. See comments in mm/percpu.c for details on each @@ -4644,6 +4674,11 @@ may be specified. Format: <port>,<port>.... + possible_cpus= [SMP,S390,X86] + Format: <unsigned int> + Set the number of possible CPUs, overriding the + regular discovery mechanisms (such as ACPI/FW, etc). + powersave=off [PPC] This option disables power saving features. It specifically disables cpuidle and sets the platform machine description specific power_save @@ -4651,12 +4686,12 @@ execution priority. ppc_strict_facility_enable - [PPC] This option catches any kernel floating point, + [PPC,ENABLE] This option catches any kernel floating point, Altivec, VSX and SPE outside of regions specifically allowed (eg kernel_enable_fpu()/kernel_disable_fpu()). There is some performance impact when enabling this. - ppc_tm= [PPC] + ppc_tm= [PPC,EARLY] Format: {"off"} Disable Hardware Transactional Memory @@ -4766,7 +4801,7 @@ [KNL] Number of legacy pty's. Overwrites compiled-in default number. - quiet [KNL] Disable most log messages + quiet [KNL,EARLY] Disable most log messages r128= [HW,DRM] @@ -4783,17 +4818,17 @@ ramdisk_start= [RAM] RAM disk image start address random.trust_cpu=off - [KNL] Disable trusting the use of the CPU's + [KNL,EARLY] Disable trusting the use of the CPU's random number generator (if available) to initialize the kernel's RNG. random.trust_bootloader=off - [KNL] Disable trusting the use of the a seed + [KNL,EARLY] Disable trusting the use of the a seed passed by the bootloader (if available) to initialize the kernel's RNG. randomize_kstack_offset= - [KNL] Enable or disable kernel stack offset + [KNL,EARLY] Enable or disable kernel stack offset randomization, which provides roughly 5 bits of entropy, frustrating memory corruption attacks that depend on stack address determinism or @@ -5034,6 +5069,11 @@ this kernel boot parameter, forcibly setting it to zero. + rcutree.enable_rcu_lazy= [KNL] + To save power, batch RCU callbacks and flush after + delay, memory pressure or callback list growing too + big. + rcuscale.gp_async= [KNL] Measure performance of asynchronous grace-period primitives such as call_rcu(). @@ -5484,7 +5524,7 @@ Run specified binary instead of /init from the ramdisk, used for early userspace startup. See initrd. - rdrand= [X86] + rdrand= [X86,EARLY] force - Override the decision by the kernel to hide the advertisement of RDRAND support (this affects certain AMD processors because of buggy BIOS @@ -5580,7 +5620,7 @@ them. If <base> is less than 0x10000, the region is assumed to be I/O ports; otherwise it is memory. - reservetop= [X86-32] + reservetop= [X86-32,EARLY] Format: nn[KMG] Reserves a hole at the top of the kernel virtual address space. @@ -5665,7 +5705,7 @@ [KNL] Disable ring 3 MONITOR/MWAIT feature on supported CPUs. - riscv_isa_fallback [RISCV] + riscv_isa_fallback [RISCV,EARLY] When CONFIG_RISCV_ISA_FALLBACK is not enabled, permit falling back to detecting extension support by parsing "riscv,isa" property on devicetree systems when the @@ -5674,13 +5714,14 @@ ro [KNL] Mount root device read-only on boot - rodata= [KNL] + rodata= [KNL,EARLY] on Mark read-only kernel memory as read-only (default). off Leave read-only kernel memory writable for debugging. full Mark read-only kernel memory and aliases as read-only [arm64] rockchip.usb_uart + [EARLY] Enable the uart passthrough on the designated usb port on Rockchip SoCs. When active, the signals of the debug-uart get routed to the D+ and D- pins of the usb @@ -5741,7 +5782,7 @@ sa1100ir [NET] See drivers/net/irda/sa1100_ir.c. - sched_verbose [KNL] Enables verbose scheduler debug messages. + sched_verbose [KNL,EARLY] Enables verbose scheduler debug messages. schedstats= [KNL,X86] Enable or disable scheduled statistics. Allowed values are enable and disable. This feature @@ -5856,7 +5897,7 @@ non-zero "wait" parameter. See weight_single and weight_many. - skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate + skew_tick= [KNL,EARLY] Offset the periodic timer tick per cpu to mitigate xtime_lock contention on larger systems, and/or RCU lock contention on all systems with CONFIG_MAXSMP set. Format: { "0" | "1" } @@ -5895,65 +5936,58 @@ simeth= [IA-64] simscsi= - slram= [HW,MTD] - - slab_merge [MM] - Enable merging of slabs with similar size when the - kernel is built without CONFIG_SLAB_MERGE_DEFAULT. - - slab_nomerge [MM] - Disable merging of slabs with similar size. May be - necessary if there is some reason to distinguish - allocs to different slabs, especially in hardened - environments where the risk of heap overflows and - layout control by attackers can usually be - frustrated by disabling merging. This will reduce - most of the exposure of a heap attack to a single - cache (risks via metadata attacks are mostly - unchanged). Debug options disable merging on their - own. - For more information see Documentation/mm/slub.rst. - - slab_max_order= [MM, SLAB] - Determines the maximum allowed order for slabs. - A high setting may cause OOMs due to memory - fragmentation. Defaults to 1 for systems with - more than 32MB of RAM, 0 otherwise. - - slub_debug[=options[,slabs][;[options[,slabs]]...] [MM, SLUB] - Enabling slub_debug allows one to determine the + slab_debug[=options[,slabs][;[options[,slabs]]...] [MM] + Enabling slab_debug allows one to determine the culprit if slab objects become corrupted. Enabling - slub_debug can create guard zones around objects and + slab_debug can create guard zones around objects and may poison objects when not in use. Also tracks the last alloc / free. For more information see Documentation/mm/slub.rst. + (slub_debug legacy name also accepted for now) - slub_max_order= [MM, SLUB] + slab_max_order= [MM] Determines the maximum allowed order for slabs. A high setting may cause OOMs due to memory fragmentation. For more information see Documentation/mm/slub.rst. + (slub_max_order legacy name also accepted for now) + + slab_merge [MM] + Enable merging of slabs with similar size when the + kernel is built without CONFIG_SLAB_MERGE_DEFAULT. + (slub_merge legacy name also accepted for now) - slub_min_objects= [MM, SLUB] + slab_min_objects= [MM] The minimum number of objects per slab. SLUB will - increase the slab order up to slub_max_order to + increase the slab order up to slab_max_order to generate a sufficiently large slab able to contain the number of objects indicated. The higher the number of objects the smaller the overhead of tracking slabs and the less frequently locks need to be acquired. For more information see Documentation/mm/slub.rst. + (slub_min_objects legacy name also accepted for now) - slub_min_order= [MM, SLUB] + slab_min_order= [MM] Determines the minimum page order for slabs. Must be - lower than slub_max_order. - For more information see Documentation/mm/slub.rst. + lower or equal to slab_max_order. For more information see + Documentation/mm/slub.rst. + (slub_min_order legacy name also accepted for now) - slub_merge [MM, SLUB] - Same with slab_merge. + slab_nomerge [MM] + Disable merging of slabs with similar size. May be + necessary if there is some reason to distinguish + allocs to different slabs, especially in hardened + environments where the risk of heap overflows and + layout control by attackers can usually be + frustrated by disabling merging. This will reduce + most of the exposure of a heap attack to a single + cache (risks via metadata attacks are mostly + unchanged). Debug options disable merging on their + own. + For more information see Documentation/mm/slub.rst. + (slub_nomerge legacy name also accepted for now) - slub_nomerge [MM, SLUB] - Same with slab_nomerge. This is supported for legacy. - See slab_nomerge for more information. + slram= [HW,MTD] smart2= [HW] Format: <io1>[,<io2>[,...,<io8>]] @@ -5987,10 +6021,10 @@ 1: Fast pin select (default) 2: ATC IRMode - smt= [KNL,MIPS,S390] Set the maximum number of threads (logical - CPUs) to use per physical CPU on systems capable of - symmetric multithreading (SMT). Will be capped to the - actual hardware limit. + smt= [KNL,MIPS,S390,EARLY] Set the maximum number of threads + (logical CPUs) to use per physical CPU on systems + capable of symmetric multithreading (SMT). Will + be capped to the actual hardware limit. Format: <integer> Default: -1 (no limit) @@ -6012,7 +6046,7 @@ sonypi.*= [HW] Sony Programmable I/O Control Device driver See Documentation/admin-guide/laptops/sonypi.rst - spectre_v2= [X86] Control mitigation of Spectre variant 2 + spectre_v2= [X86,EARLY] Control mitigation of Spectre variant 2 (indirect branch speculation) vulnerability. The default operation protects the kernel from user space attacks. @@ -6027,8 +6061,8 @@ Selecting 'on' will, and 'auto' may, choose a mitigation method at run time according to the CPU, the available microcode, the setting of the - CONFIG_RETPOLINE configuration option, and the - compiler with which the kernel was built. + CONFIG_MITIGATION_RETPOLINE configuration option, + and the compiler with which the kernel was built. Selecting 'on' will also enable the mitigation against user space to user space task attacks. @@ -6092,7 +6126,7 @@ spectre_v2_user=auto. spec_rstack_overflow= - [X86] Control RAS overflow mitigation on AMD Zen CPUs + [X86,EARLY] Control RAS overflow mitigation on AMD Zen CPUs off - Disable mitigation microcode - Enable microcode mitigation only @@ -6103,7 +6137,7 @@ (cloud-specific mitigation) spec_store_bypass_disable= - [HW] Control Speculative Store Bypass (SSB) Disable mitigation + [HW,EARLY] Control Speculative Store Bypass (SSB) Disable mitigation (Speculative Store Bypass vulnerability) Certain CPUs are vulnerable to an exploit against a @@ -6199,7 +6233,7 @@ #DB exception for bus lock is triggered only when CPL > 0. - srbds= [X86,INTEL] + srbds= [X86,INTEL,EARLY] Control the Special Register Buffer Data Sampling (SRBDS) mitigation. @@ -6286,7 +6320,7 @@ srcutree.convert_to_big must have the 0x10 bit set for contention-based conversions to occur. - ssbd= [ARM64,HW] + ssbd= [ARM64,HW,EARLY] Speculative Store Bypass Disable control On CPUs that are vulnerable to the Speculative @@ -6310,7 +6344,7 @@ growing up) the main stack are reserved for no other mapping. Default value is 256 pages. - stack_depot_disable= [KNL] + stack_depot_disable= [KNL,EARLY] Setting this to true through kernel command line will disable the stack depot thereby saving the static memory consumed by the stack hash table. By default this is set @@ -6349,12 +6383,12 @@ be used to filter out binaries which have not yet been made aware of AT_MINSIGSTKSZ. - stress_hpt [PPC] + stress_hpt [PPC,EARLY] Limits the number of kernel HPT entries in the hash page table to increase the rate of hash page table faults on kernel addresses. - stress_slb [PPC] + stress_slb [PPC,EARLY] Limits the number of kernel SLB entries, and flushes them frequently to increase the rate of SLB faults on kernel addresses. @@ -6414,7 +6448,7 @@ This parameter controls use of the Protected Execution Facility on pSeries. - swiotlb= [ARM,IA-64,PPC,MIPS,X86] + swiotlb= [ARM,IA-64,PPC,MIPS,X86,EARLY] Format: { <int> [,<int>] | force | noforce } <int> -- Number of I/O TLB slabs <int> -- Second integer after comma. Number of swiotlb @@ -6424,7 +6458,7 @@ wouldn't be automatically used by the kernel noforce -- Never use bounce buffers (for debugging) - switches= [HW,M68k] + switches= [HW,M68k,EARLY] sysctl.*= [KNL] Set a sysctl parameter, right before loading the init @@ -6483,11 +6517,11 @@ <deci-seconds>: poll all this frequency 0: no polling (default) - threadirqs [KNL] + threadirqs [KNL,EARLY] Force threading of all interrupt handlers except those marked explicitly IRQF_NO_THREAD. - topology= [S390] + topology= [S390,EARLY] Format: {off | on} Specify if the kernel should make use of the cpu topology information if the hardware supports this. @@ -6728,7 +6762,7 @@ can be overridden by a later tsc=nowatchdog. A console message will flag any such suppression or overriding. - tsc_early_khz= [X86] Skip early TSC calibration and use the given + tsc_early_khz= [X86,EARLY] Skip early TSC calibration and use the given value instead. Useful when the early TSC frequency discovery procedure is not reliable, such as on overclocked systems with CPUID.16h support and partial CPUID.15h support. @@ -6763,7 +6797,7 @@ See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst for more details. - tsx_async_abort= [X86,INTEL] Control mitigation for the TSX Async + tsx_async_abort= [X86,INTEL,EARLY] Control mitigation for the TSX Async Abort (TAA) vulnerability. Similar to Micro-architectural Data Sampling (MDS) @@ -6829,7 +6863,7 @@ unknown_nmi_panic [X86] Cause panic on unknown NMI. - unwind_debug [X86-64] + unwind_debug [X86-64,EARLY] Enable unwinder debug output. This can be useful for debugging certain unwinder error conditions, including corrupt stacks and @@ -7019,7 +7053,7 @@ Example: user_debug=31 userpte= - [X86] Flags controlling user PTE allocations. + [X86,EARLY] Flags controlling user PTE allocations. nohigh = do not allocate PTE pages in HIGHMEM regardless of setting @@ -7048,7 +7082,7 @@ vector= [IA-64,SMP] vector=percpu: enable percpu vector domain - video= [FB] Frame buffer configuration + video= [FB,EARLY] Frame buffer configuration See Documentation/fb/modedb.rst. video.brightness_switch_enabled= [ACPI] @@ -7096,13 +7130,13 @@ P Enable page structure init time poisoning - Disable all of the above options - vmalloc=nn[KMG] [KNL,BOOT] Forces the vmalloc area to have an exact - size of <nn>. This can be used to increase the - minimum size (128MB on x86). It can also be used to - decrease the size and leave more room for directly - mapped kernel RAM. + vmalloc=nn[KMG] [KNL,BOOT,EARLY] Forces the vmalloc area to have an + exact size of <nn>. This can be used to increase + the minimum size (128MB on x86). It can also be + used to decrease the size and leave more room + for directly mapped kernel RAM. - vmcp_cma=nn[MG] [KNL,S390] + vmcp_cma=nn[MG] [KNL,S390,EARLY] Sets the memory size reserved for contiguous memory allocations for the vmcp device driver. @@ -7115,7 +7149,7 @@ vmpoff= [KNL,S390] Perform z/VM CP command after power off. Format: <command> - vsyscall= [X86-64] + vsyscall= [X86-64,EARLY] Controls the behavior of vsyscalls (i.e. calls to fixed addresses of 0xffffffffff600x00 from legacy code). Most statically-linked binaries and older @@ -7225,6 +7259,15 @@ threshold repeatedly. They are likely good candidates for using WQ_UNBOUND workqueues instead. + workqueue.cpu_intensive_warning_thresh=<uint> + If CONFIG_WQ_CPU_INTENSIVE_REPORT is set, the kernel + will report the work functions which violate the + intensive_threshold_us repeatedly. In order to prevent + spurious warnings, start printing only after a work + function has violated this threshold number of times. + + The default is 4 times. 0 disables the warning. + workqueue.power_efficient Per-cpu workqueues are generally preferred because they show better performance thanks to cache @@ -7263,13 +7306,13 @@ When enabled, memory and cache locality will be impacted. - writecombine= [LOONGARCH] Control the MAT (Memory Access Type) of - ioremap_wc(). + writecombine= [LOONGARCH,EARLY] Control the MAT (Memory Access + Type) of ioremap_wc(). on - Enable writecombine, use WUC for ioremap_wc() off - Disable writecombine, use SUC for ioremap_wc() - x2apic_phys [X86-64,APIC] Use x2apic physical mode instead of + x2apic_phys [X86-64,APIC,EARLY] Use x2apic physical mode instead of default x2apic cluster mode on platforms supporting x2apic. @@ -7280,7 +7323,7 @@ save/restore/migration must be enabled to handle larger domains. - xen_emul_unplug= [HW,X86,XEN] + xen_emul_unplug= [HW,X86,XEN,EARLY] Unplug Xen emulated devices Format: [unplug0,][unplug1] ide-disks -- unplug primary master IDE devices @@ -7292,17 +7335,17 @@ the unplug protocol never -- do not unplug even if version check succeeds - xen_legacy_crash [X86,XEN] + xen_legacy_crash [X86,XEN,EARLY] Crash from Xen panic notifier, without executing late panic() code such as dumping handler. - xen_msr_safe= [X86,XEN] + xen_msr_safe= [X86,XEN,EARLY] Format: <bool> Select whether to always use non-faulting (safe) MSR access functions when running as Xen PV guest. The default value is controlled by CONFIG_XEN_PV_MSR_SAFE. - xen_nopvspin [X86,XEN] + xen_nopvspin [X86,XEN,EARLY] Disables the qspinlock slowpath using Xen PV optimizations. This parameter is obsoleted by "nopvspin" parameter, which has equivalent effect for XEN platform. @@ -7314,7 +7357,7 @@ has equivalent effect for XEN platform. xen_no_vector_callback - [KNL,X86,XEN] Disable the vector callback for Xen + [KNL,X86,XEN,EARLY] Disable the vector callback for Xen event channel interrupts. xen_scrub_pages= [XEN] @@ -7323,7 +7366,7 @@ with /sys/devices/system/xen_memory/xen_memory0/scrub_pages. Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT. - xen_timer_slop= [X86-64,XEN] + xen_timer_slop= [X86-64,XEN,EARLY] Set the timer slop (in nanoseconds) for the virtual Xen timers (default is 100000). This adjusts the minimum delta of virtualized Xen timers, where lower values @@ -7376,7 +7419,7 @@ host controller quirks. Meaning of each bit can be consulted in header drivers/usb/host/xhci.h. - xmon [PPC] + xmon [PPC,EARLY] Format: { early | on | rw | ro | off } Controls if xmon debugger is enabled. Default is off. Passing only "xmon" is equivalent to "xmon=early". diff --git a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst index 993c2a05f5ee..b6aeae3327ce 100644 --- a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst +++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst @@ -243,13 +243,9 @@ To reduce its OS jitter, do any of the following: 3. Do any of the following needed to avoid jitter that your application cannot tolerate: - a. Build your kernel with CONFIG_SLUB=y rather than - CONFIG_SLAB=y, thus avoiding the slab allocator's periodic - use of each CPU's workqueues to run its cache_reap() - function. - b. Avoid using oprofile, thus avoiding OS jitter from + a. Avoid using oprofile, thus avoiding OS jitter from wq_sync_buffer(). - c. Limit your CPU frequency so that a CPU-frequency + b. Limit your CPU frequency so that a CPU-frequency governor is not required, possibly enlisting the aid of special heatsinks or other cooling technologies. If done correctly, and if you CPU architecture permits, you should @@ -259,7 +255,7 @@ To reduce its OS jitter, do any of the following: WARNING: Please check your CPU specifications to make sure that this is safe on your particular system. - d. As of v3.18, Christoph Lameter's on-demand vmstat workers + c. As of v3.18, Christoph Lameter's on-demand vmstat workers commit prevents OS jitter due to vmstat_update() on CONFIG_SMP=y systems. Before v3.18, is not possible to entirely get rid of the OS jitter, but you can @@ -274,7 +270,7 @@ To reduce its OS jitter, do any of the following: (based on an earlier one from Gilad Ben-Yossef) that reduces or even eliminates vmstat overhead for some workloads at https://lore.kernel.org/r/00000140e9dfd6bd-40db3d4f-c1be-434f-8132-7820f81bb586-000000@email.amazonses.com. - e. If running on high-end powerpc servers, build with + d. If running on high-end powerpc servers, build with CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS daemon from running on each CPU every second or so. (This will require editing Kconfig files and will defeat @@ -282,12 +278,12 @@ To reduce its OS jitter, do any of the following: due to the rtas_event_scan() function. WARNING: Please check your CPU specifications to make sure that this is safe on your particular system. - f. If running on Cell Processor, build your kernel with + e. If running on Cell Processor, build your kernel with CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from spu_gov_work(). WARNING: Please check your CPU specifications to make sure that this is safe on your particular system. - g. If running on PowerMAC, build your kernel with + f. If running on PowerMAC, build your kernel with CONFIG_PMAC_RACKMETER=n to disable the CPU-meter, avoiding OS jitter from rackmeter_do_timer(). diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index 9eb26014d34b..1e0d101b020a 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -300,8 +300,8 @@ platforms. The AMD P-States mechanism is the more performance and energy efficiency frequency management method on AMD processors. -AMD Pstate Driver Operation Modes -================================= +``amd-pstate`` Driver Operation Modes +====================================== ``amd_pstate`` CPPC has 3 operation modes: autonomous (active) mode, non-autonomous (passive) mode and guided autonomous (guided) mode. @@ -353,6 +353,48 @@ is activated. In this mode, driver requests minimum and maximum performance level and the platform autonomously selects a performance level in this range and appropriate to the current workload. +``amd-pstate`` Preferred Core +================================= + +The core frequency is subjected to the process variation in semiconductors. +Not all cores are able to reach the maximum frequency respecting the +infrastructure limits. Consequently, AMD has redefined the concept of +maximum frequency of a part. This means that a fraction of cores can reach +maximum frequency. To find the best process scheduling policy for a given +scenario, OS needs to know the core ordering informed by the platform through +highest performance capability register of the CPPC interface. + +``amd-pstate`` preferred core enables the scheduler to prefer scheduling on +cores that can achieve a higher frequency with lower voltage. The preferred +core rankings can dynamically change based on the workload, platform conditions, +thermals and ageing. + +The priority metric will be initialized by the ``amd-pstate`` driver. The ``amd-pstate`` +driver will also determine whether or not ``amd-pstate`` preferred core is +supported by the platform. + +``amd-pstate`` driver will provide an initial core ordering when the system boots. +The platform uses the CPPC interfaces to communicate the core ranking to the +operating system and scheduler to make sure that OS is choosing the cores +with highest performance firstly for scheduling the process. When ``amd-pstate`` +driver receives a message with the highest performance change, it will +update the core ranking and set the cpu's priority. + +``amd-pstate`` Preferred Core Switch +===================================== +Kernel Parameters +----------------- + +``amd-pstate`` peferred core`` has two states: enable and disable. +Enable/disable states can be chosen by different kernel parameters. +Default enable ``amd-pstate`` preferred core. + +``amd_prefcore=disable`` + +For systems that support ``amd-pstate`` preferred core, the core rankings will +always be advertised by the platform. But OS can choose to ignore that via the +kernel parameter ``amd_prefcore=disable``. + User Space Interface in ``sysfs`` - General =========================================== @@ -385,6 +427,19 @@ control its functionality at the system level. They are located in the to the operation mode represented by that string - or to be unregistered in the "disable" case. +``prefcore`` + Preferred core state of the driver: "enabled" or "disabled". + + "enabled" + Enable the ``amd-pstate`` preferred core. + + "disabled" + Disable the ``amd-pstate`` preferred core + + + This attribute is read-only to check the state of preferred core set + by the kernel parameter. + ``cpupower`` tool support for ``amd-pstate`` =============================================== diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 396091651955..7250c0542828 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -206,6 +206,11 @@ Will increase power usage. Default: 0 (off) +mem_pcpu_rsv +------------ + +Per-cpu reserved forward alloc cache size in page units. Default 1MB per CPU. + rmem_default ------------ diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst index 92a8a07f5c43..f92551539e8a 100644 --- a/Documentation/admin-guide/tainted-kernels.rst +++ b/Documentation/admin-guide/tainted-kernels.rst @@ -34,7 +34,7 @@ name of the command ('Comm:') that triggered the event:: You'll find a 'Not tainted: ' there if the kernel was not tainted at the time of the event; if it was, then it will print 'Tainted: ' and characters -either letters or blanks. In above example it looks like this:: +either letters or blanks. In the example above it looks like this:: Tainted: P W O @@ -52,7 +52,7 @@ At runtime, you can query the tainted state by reading tainted; any other number indicates the reasons why it is. The easiest way to decode that number is the script ``tools/debugging/kernel-chktaint``, which your distribution might ship as part of a package called ``linux-tools`` or -``kernel-tools``; if it doesn't you can download the script from +``kernel-tools``; if it doesn't, you can download the script from `git.kernel.org <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/tools/debugging/kernel-chktaint>`_ and execute it with ``sh kernel-chktaint``, which would print something like this on the machine that had the statements in the logs that were quoted earlier:: diff --git a/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst new file mode 100644 index 000000000000..58211840ac6f --- /dev/null +++ b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst @@ -0,0 +1,1952 @@ +.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0) +.. [see the bottom of this file for redistribution information] + +========================================= +How to verify bugs and bisect regressions +========================================= + +This document describes how to check if some Linux kernel problem occurs in code +currently supported by developers -- to then explain how to locate the change +causing the issue, if it is a regression (e.g. did not happen with earlier +versions). + +The text aims at people running kernels from mainstream Linux distributions on +commodity hardware who want to report a kernel bug to the upstream Linux +developers. Despite this intent, the instructions work just as well for users +who are already familiar with building their own kernels: they help avoid +mistakes occasionally made even by experienced developers. + +.. + Note: if you see this note, you are reading the text's source file. You + might want to switch to a rendered version: it makes it a lot easier to + read and navigate this document -- especially when you want to look something + up in the reference section, then jump back to where you left off. +.. + Find the latest rendered version of this text here: + https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.rst.html + +The essence of the process (aka 'TL;DR') +======================================== + +*[If you are new to building or bisecting Linux, ignore this section and head +over to the* ":ref:`step-by-step guide<introguide_bissbs>`" *below. It utilizes +the same commands as this section while describing them in brief fashion. The +steps are nevertheless easy to follow and together with accompanying entries +in a reference section mention many alternatives, pitfalls, and additional +aspects, all of which might be essential in your present case.]* + +**In case you want to check if a bug is present in code currently supported by +developers**, execute just the *preparations* and *segment 1*; while doing so, +consider the newest Linux kernel you regularly use to be the 'working' kernel. +In the following example that's assumed to be 6.0.13, which is why the sources +of v6.0 will be used to prepare the .config file. + +**In case you face a regression**, follow the steps at least till the end of +*segment 2*. Then you can submit a preliminary report -- or continue with +*segment 3*, which describes how to perform a bisection needed for a +full-fledged regression report. In the following example 6.0.13 is assumed to be +the 'working' kernel and 6.1.5 to be the first 'broken', which is why v6.0 +will be considered the 'good' release and used to prepare the .config file. + +* **Preparations**: set up everything to build your own kernels:: + + # * Remove any software that depends on externally maintained kernel modules + # or builds any automatically during bootup. + # * Ensure Secure Boot permits booting self-compiled Linux kernels. + # * If you are not already running the 'working' kernel, reboot into it. + # * Install compilers and everything else needed for building Linux. + # * Ensure to have 15 Gigabyte free space in your home directory. + git clone -o mainline --no-checkout \ + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/linux/ + cd ~/linux/ + git remote add -t master stable \ + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git + git checkout --detach v6.0 + # * Hint: if you used an existing clone, ensure no stale .config is around. + make olddefconfig + # * Ensure the former command picked the .config of the 'working' kernel. + # * Connect external hardware (USB keys, tokens, ...), start a VM, bring up + # VPNs, mount network shares, and briefly try the feature that is broken. + yes '' | make localmodconfig + ./scripts/config --set-str CONFIG_LOCALVERSION '-local' + ./scripts/config -e CONFIG_LOCALVERSION_AUTO + # * Note, when short on storage space, check the guide for an alternative: + ./scripts/config -d DEBUG_INFO_NONE -e KALLSYMS_ALL -e DEBUG_KERNEL \ + -e DEBUG_INFO -e DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT -e KALLSYMS + # * Hint: at this point you might want to adjust the build configuration; + # you'll have to, if you are running Debian. + make olddefconfig + cp .config ~/kernel-config-working + +* **Segment 1**: build a kernel from the latest mainline codebase. + + This among others checks if the problem was fixed already and which developers + later need to be told about the problem; in case of a regression, this rules + out a .config change as root of the problem. + + a) Checking out latest mainline code:: + + cd ~/linux/ + git checkout --force --detach mainline/master + + b) Build, install, and boot a kernel:: + + cp ~/kernel-config-working .config + make olddefconfig + make -j $(nproc --all) + # * Make sure there is enough disk space to hold another kernel: + df -h /boot/ /lib/modules/ + # * Note: on Arch Linux, its derivatives and a few other distributions + # the following commands will do nothing at all or only part of the + # job. See the step-by-step guide for further details. + command -v installkernel && sudo make modules_install install + # * Check how much space your self-built kernel actually needs, which + # enables you to make better estimates later: + du -ch /boot/*$(make -s kernelrelease)* | tail -n 1 + du -sh /lib/modules/$(make -s kernelrelease)/ + # * Hint: the output of the following command will help you pick the + # right kernel from the boot menu: + make -s kernelrelease | tee -a ~/kernels-built + reboot + # * Once booted, ensure you are running the kernel you just built by + # checking if the output of the next two commands matches: + tail -n 1 ~/kernels-built + uname -r + + c) Check if the problem occurs with this kernel as well. + +* **Segment 2**: ensure the 'good' kernel is also a 'working' kernel. + + This among others verifies the trimmed .config file actually works well, as + bisecting with it otherwise would be a waste of time: + + a) Start by checking out the sources of the 'good' version:: + + cd ~/linux/ + git checkout --force --detach v6.0 + + b) Build, install, and boot a kernel as described earlier in *segment 1, + section b* -- just feel free to skip the 'du' commands, as you have a rough + estimate already. + + c) Ensure the feature that regressed with the 'broken' kernel actually works + with this one. + +* **Segment 3**: perform and validate the bisection. + + a) In case your 'broken' version is a stable/longterm release, add the Git + branch holding it:: + + git remote set-branches --add stable linux-6.1.y + git fetch stable + + b) Initialize the bisection:: + + cd ~/linux/ + git bisect start + git bisect good v6.0 + git bisect bad v6.1.5 + + c) Build, install, and boot a kernel as described earlier in *segment 1, + section b*. + + In case building or booting the kernel fails for unrelated reasons, run + ``git bisect skip``. In all other outcomes, check if the regressed feature + works with the newly built kernel. If it does, tell Git by executing + ``git bisect good``; if it does not, run ``git bisect bad`` instead. + + All three commands will make Git checkout another commit; then re-execute + this step (e.g. build, install, boot, and test a kernel to then tell Git + the outcome). Do so again and again until Git shows which commit broke + things. If you run short of disk space during this process, check the + "Supplementary tasks" section below. + + d) Once your finished the bisection, put a few things away:: + + cd ~/linux/ + git bisect log > ~/bisect-log + cp .config ~/bisection-config-culprit + git bisect reset + + e) Try to verify the bisection result:: + + git checkout --force --detach mainline/master + git revert --no-edit cafec0cacaca0 + + This is optional, as some commits are impossible to revert. But if the + second command worked flawlessly, build, install, and boot one more kernel + kernel, which should not show the regression. + +* **Supplementary tasks**: cleanup during and after the process. + + a) To avoid running out of disk space during a bisection, you might need to + remove some kernels you built earlier. You most likely want to keep those + you built during segment 1 and 2 around for a while, but you will most + likely no longer need kernels tested during the actual bisection + (Segment 3 c). You can list them in build order using:: + + ls -ltr /lib/modules/*-local* + + To then for example erase a kernel that identifies itself as + '6.0-rc1-local-gcafec0cacaca0', use this:: + + sudo rm -rf /lib/modules/6.0-rc1-local-gcafec0cacaca0 + sudo kernel-install -v remove 6.0-rc1-local-gcafec0cacaca0 + # * Note, on some distributions kernel-install is missing + # or does only part of the job. + + b) If you performed a bisection and successfully validated the result, feel + free to remove all kernels built during the actual bisection (Segment 3 c); + the kernels you built earlier and later you might want to keep around for + a week or two. + +.. _introguide_bissbs: + +Step-by-step guide on how to verify bugs and bisect regressions +=============================================================== + +This guide describes how to set up your own Linux kernels for investigating bugs +or regressions you intent to report. How far you want to follow the instructions +depends on your issue: + +Execute all steps till the end of *segment 1* to **verify if your kernel problem +is present in code supported by Linux kernel developers**. If it is, you are all +set to report the bug -- unless it did not happen with earlier kernel versions, +as then your want to at least continue with *segment 2* to **check if the issue +qualifies as regression** which receive priority treatment. Depending on the +outcome you then are ready to report a bug or submit a preliminary regression +report; instead of the latter your could also head straight on and follow +*segment 3* to **perform a bisection** for a full-fledged regression report +developers are obliged to act upon. + + :ref:`Preparations: set up everything to build your own kernels.<introprep_bissbs>` + + :ref:`Segment 1: try to reproduce the problem with the latest codebase.<introlatestcheck_bissbs>` + + :ref:`Segment 2: check if the kernels you build work fine.<introworkingcheck_bissbs>` + + :ref:`Segment 3: perform a bisection and validate the result.<introbisect_bissbs>` + + :ref:`Supplementary tasks: cleanup during and after following this guide.<introclosure_bissbs>` + +The steps in each segment illustrate the important aspects of the process, while +a comprehensive reference section holds additional details. The latter sometimes +also outlines alternative approaches, pitfalls, as well as problems that might +occur at the particular step -- and how to get things rolling again. + +For further details on how to report Linux kernel issues or regressions check +out Documentation/admin-guide/reporting-issues.rst, which works in conjunction +with this document. It among others explains why you need to verify bugs with +the latest 'mainline' kernel, even if you face a problem with a kernel from a +'stable/longterm' series; for users facing a regression it also explains that +sending a preliminary report after finishing segment 2 might be wise, as the +regression and its culprit might be known already. For further details on +what actually qualifies as a regression check out +Documentation/admin-guide/reporting-regressions.rst. + +.. _introprep_bissbs: + +Preparations: set up everything to build your own kernels +--------------------------------------------------------- + +.. _backup_bissbs: + +* Create a fresh backup and put system repair and restore tools at hand, just + to be prepared for the unlikely case of something going sideways. + + [:ref:`details<backup_bisref>`] + +.. _vanilla_bissbs: + +* Remove all software that depends on externally developed kernel drivers or + builds them automatically. That includes but is not limited to DKMS, openZFS, + VirtualBox, and Nvidia's graphics drivers (including the GPLed kernel module). + + [:ref:`details<vanilla_bisref>`] + +.. _secureboot_bissbs: + +* On platforms with 'Secure Boot' or similar solutions, prepare everything to + ensure the system will permit your self-compiled kernel to boot. The + quickest and easiest way to achieve this on commodity x86 systems is to + disable such techniques in the BIOS setup utility; alternatively, remove + their restrictions through a process initiated by + ``mokutil --disable-validation``. + + [:ref:`details<secureboot_bisref>`] + +.. _rangecheck_bissbs: + +* Determine the kernel versions considered 'good' and 'bad' throughout this + guide. + + Do you follow this guide to verify if a bug is present in the code developers + care for? Then consider the mainline release your 'working' kernel (the newest + one you regularly use) is based on to be the 'good' version; if your 'working' + kernel for example is '6.0.11', then your 'good' kernel is 'v6.0'. + + In case you face a regression, it depends on the version range where the + regression was introduced: + + * Something which used to work in Linux 6.0 broke when switching to Linux + 6.1-rc1? Then henceforth regard 'v6.0' as the last known 'good' version + and 'v6.1-rc1' as the first 'bad' one. + + * Some function stopped working when updating from 6.0.11 to 6.1.4? Then for + the time being consider 'v6.0' as the last 'good' version and 'v6.1.4' as + the 'bad' one. Note, at this point it is merely assumed that 6.0 is fine; + this assumption will be checked in segment 2. + + * A feature you used in 6.0.11 does not work at all or worse in 6.1.13? In + that case you want to bisect within a stable/longterm series: consider + 'v6.0.11' as the last known 'good' version and 'v6.0.13' as the first 'bad' + one. Note, in this case you still want to compile and test a mainline kernel + as explained in segment 1: the outcome will determine if you need to report + your issue to the regular developers or the stable team. + + *Note, do not confuse 'good' version with 'working' kernel; the latter term + throughout this guide will refer to the last kernel that has been working + fine.* + + [:ref:`details<rangecheck_bisref>`] + +.. _bootworking_bissbs: + +* Boot into the 'working' kernel and briefly use the apparently broken feature. + + [:ref:`details<bootworking_bisref>`] + +.. _diskspace_bissbs: + +* Ensure to have enough free space for building Linux. 15 Gigabyte in your home + directory should typically suffice. If you have less available, be sure to pay + attention to later steps about retrieving the Linux sources and handling of + debug symbols: both explain approaches reducing the amount of space, which + should allow you to master these tasks with about 4 Gigabytes free space. + + [:ref:`details<diskspace_bisref>`] + +.. _buildrequires_bissbs: + +* Install all software required to build a Linux kernel. Often you will need: + 'bc', 'binutils' ('ld' et al.), 'bison', 'flex', 'gcc', 'git', 'openssl', + 'pahole', 'perl', and the development headers for 'libelf' and 'openssl'. The + reference section shows how to quickly install those on various popular Linux + distributions. + + [:ref:`details<buildrequires_bisref>`] + +.. _sources_bissbs: + +* Retrieve the mainline Linux sources; then change into the directory holding + them, as all further commands in this guide are meant to be executed from + there. + + *Note, the following describe how to retrieve the sources using a full + mainline clone, which downloads about 2,75 GByte as of early 2024. The* + :ref:`reference section describes two alternatives <sources_bisref>` *: + one downloads less than 500 MByte, the other works better with unreliable + internet connections.* + + Execute the following command to retrieve a fresh mainline codebase while + preparing things to add stable/longterm branches later:: + + git clone -o mainline --no-checkout \ + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/linux/ + cd ~/linux/ + git remote add -t master stable \ + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git + + [:ref:`details<sources_bisref>`] + +.. _oldconfig_bissbs: + +* Start preparing a kernel build configuration (the '.config' file). + + Before doing so, ensure you are still running the 'working' kernel an earlier + step told you to boot; if you are unsure, check the current kernel release + identifier using ``uname -r``. + + Afterwards check out the source code for the version earlier established as + 'good' (in this example this is assumed to be 6.0) and create a .config file:: + + git checkout --detach v6.0 + make olddefconfig + + The second command will try to locate the build configuration file for the + running kernel and then adjust it for the needs of the kernel sources you + checked out. While doing so, it will print a few lines you need to check. + + Look out for a line starting with '# using defaults found in'. It should be + followed by a path to a file in '/boot/' that contains the release identifier + of your currently working kernel. If the line instead continues with something + like 'arch/x86/configs/x86_64_defconfig', then the build infra failed to find + the .config file for your running kernel -- in which case you have to put one + there manually, as explained in the reference section. + + In case you can not find such a line, look for one containing '# configuration + written to .config'. If that's the case you have a stale build configuration + lying around. Unless you intend to use it, delete it; afterwards run + 'make olddefconfig' again and check if it now picked up the right config file + as base. + + [:ref:`details<oldconfig_bisref>`] + +.. _localmodconfig_bissbs: + +* Disable any kernel modules apparently superfluous for your setup. This is + optional, but especially wise for bisections, as it speeds up the build + process enormously -- at least unless the .config file picked up in the + previous step was already tailored to your and your hardware needs, in which + case you should skip this step. + + To prepare the trimming, connect external hardware you occasionally use (USB + keys, tokens, ...), quickly start a VM, and bring up VPNs. And if you rebooted + since you started that guide, ensure that you tried using the feature causing + trouble since you started the system. Only then trim your .config:: + + yes '' | make localmodconfig + + There is a catch to this, as the 'apparently' in initial sentence of this step + and the preparation instructions already hinted at: + + The 'localmodconfig' target easily disables kernel modules for features only + used occasionally -- like modules for external peripherals not yet connected + since booting, virtualization software not yet utilized, VPN tunnels, and a + few other things. That's because some tasks rely on kernel modules Linux only + loads when you execute tasks like the aforementioned ones for the first time. + + This drawback of localmodconfig is nothing you should lose sleep over, but + something to keep in mind: if something is misbehaving with the kernels built + during this guide, this is most likely the reason. You can reduce or nearly + eliminate the risk with tricks outlined in the reference section; but when + building a kernel just for quick testing purposes this is usually not worth + spending much effort on, as long as it boots and allows to properly test the + feature that causes trouble. + + [:ref:`details<localmodconfig_bisref>`] + +.. _tagging_bissbs: + +* Ensure all the kernels you will build are clearly identifiable using a special + tag and a unique version number:: + + ./scripts/config --set-str CONFIG_LOCALVERSION '-local' + ./scripts/config -e CONFIG_LOCALVERSION_AUTO + + [:ref:`details<tagging_bisref>`] + +.. _debugsymbols_bissbs: + +* Decide how to handle debug symbols. + + In the context of this document it is often wise to enable them, as there is a + decent chance you will need to decode a stack trace from a 'panic', 'Oops', + 'warning', or 'BUG':: + + ./scripts/config -d DEBUG_INFO_NONE -e KALLSYMS_ALL -e DEBUG_KERNEL \ + -e DEBUG_INFO -e DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT -e KALLSYMS + + But if you are extremely short on storage space, you might want to disable + debug symbols instead:: + + ./scripts/config -d DEBUG_INFO -d DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT \ + -d DEBUG_INFO_DWARF4 -d DEBUG_INFO_DWARF5 -e CONFIG_DEBUG_INFO_NONE + + [:ref:`details<debugsymbols_bisref>`] + +.. _configmods_bissbs: + +* Check if you may want or need to adjust some other kernel configuration + options: + + * Are you running Debian? Then you want to avoid known problems by performing + additional adjustments explained in the reference section. + + [:ref:`details<configmods_distros_bisref>`]. + + * If you want to influence other aspects of the configuration, do so now using + your preferred tool. Note, to use make targets like 'menuconfig' or + 'nconfig', you will need to install the development files of ncurses; for + 'xconfig' you likewise need the Qt5 or Qt6 headers. + + [:ref:`details<configmods_individual_bisref>`]. + +.. _saveconfig_bissbs: + +* Reprocess the .config after the latest adjustments and store it in a safe + place:: + + make olddefconfig + cp .config ~/kernel-config-working + + [:ref:`details<saveconfig_bisref>`] + +.. _introlatestcheck_bissbs: + +Segment 1: try to reproduce the problem with the latest codebase +---------------------------------------------------------------- + +The following steps verify if the problem occurs with the code currently +supported by developers. In case you face a regression, it also checks that the +problem is not caused by some .config change, as reporting the issue then would +be a waste of time. [:ref:`details<introlatestcheck_bisref>`] + +.. _checkoutmaster_bissbs: + +* Check out the latest Linux codebase:: + + cd ~/linux/ + git checkout --force --detach mainline/master + + [:ref:`details<checkoutmaster_bisref>`] + +.. _build_bissbs: + +* Build the image and the modules of your first kernel using the config file you + prepared:: + + cp ~/kernel-config-working .config + make olddefconfig + make -j $(nproc --all) + + If you want your kernel packaged up as deb, rpm, or tar file, see the + reference section for alternatives, which obviously will require other + steps to install as well. + + [:ref:`details<build_bisref>`] + +.. _install_bissbs: + +* Install your newly built kernel. + + Before doing so, consider checking if there is still enough room for it:: + + df -h /boot/ /lib/modules/ + + 150 MByte in /boot/ and 200 in /lib/modules/ usually suffice. Those are rough + estimates assuming the worst case. How much your kernels actually require will + be determined later. + + Now install the kernel, which will be saved in parallel to the kernels from + your Linux distribution:: + + command -v installkernel && sudo make modules_install install + + On many commodity Linux distributions this will take care of everything + required to boot your kernel. You might want to ensure that's the case by + checking if your boot loader's configuration was updated; furthermore ensure + an initramfs (also known as initrd) exists, which on many distributions can be + achieved by running ``ls -l /boot/init*$(make -s kernelrelease)*``. Those + steps are recommended, as there are quite a few Linux distribution where above + command is insufficient: + + * On Arch Linux, its derivatives, many immutable Linux distributions, and a + few others the above command does nothing at, as they lack 'installkernel' + executable. + + * Some distributions install the kernel, but don't add an entry for your + kernel in your boot loader's configuration -- the kernel thus won't show up + in the boot menu. + + * Some distributions add a boot loader menu entry, but don't create an + initramfs on installation -- in that case your kernel most likely will be + unable to mount the root partition during bootup. + + If any of that applies to you, see the reference section for further guidance. + Once you figured out what to do, consider writing down the necessary + installation steps: if you will build more kernels as described in + segment 2 and 3, you will have to execute these commands every time that + ``command -v installkernel [...]`` comes up again. + + [:ref:`details<install_bisref>`] + +.. _storagespace_bissbs: + +* In case you plan to follow this guide further, check how much storage space + the kernel, its modules, and other related files like the initramfs consume:: + + du -ch /boot/*$(make -s kernelrelease)* | tail -n 1 + du -sh /lib/modules/$(make -s kernelrelease)/ + + Write down or remember those two values for later: they enable you to prevent + running out of disk space accidentally during a bisection. + + [:ref:`details<storagespace_bisref>`] + +.. _kernelrelease_bissbs: + +* Show and store the kernelrelease identifier of the kernel you just built:: + + make -s kernelrelease | tee -a ~/kernels-built + + Remember the identifier momentarily, as it will help you pick the right kernel + from the boot menu upon restarting. + +.. _recheckbroken_bissbs: + +* Reboot into the kernel you just built and check if the feature that is + expected to be broken really is. + + Start by making sure the kernel you booted is the one you just built. When + unsure, check if the output of these commands show the exact same release + identifier:: + + tail -n 1 ~/kernels-built + uname -r + + Now verify if the feature that causes trouble works with your newly built + kernel. If things work while investigating a regression, check the reference + section for further details. + + [:ref:`details<recheckbroken_bisref>`] + +.. _recheckstablebroken_bissbs: + +* Are you facing a problem within a stable/longterm release, but failed to + reproduce it with the mainline kernel you just built? Then check if the latest + codebase for the particular series might already fix the problem. To do so, + add the stable series Git branch for your 'good' kernel (again, this here is + assumed to be 6.0) and check out the latest version:: + + cd ~/linux/ + git remote set-branches --add stable linux-6.0.y + git fetch stable + git checkout --force --detach linux-6.0.y + + Now use the checked out code to build and install another kernel using the + commands the earlier steps already described in more detail:: + + cp ~/kernel-config-working .config + make olddefconfig + make -j $(nproc --all) + # * Check if the free space suffices holding another kernel: + df -h /boot/ /lib/modules/ + command -v installkernel && sudo make modules_install install + make -s kernelrelease | tee -a ~/kernels-built + reboot + + Now verify if you booted the kernel you intended to start, to then check if + everything works fine with this kernel:: + + tail -n 1 ~/kernels-built + uname -r + + [:ref:`details<recheckstablebroken_bisref>`] + +Do you follow this guide to verify if a problem is present in the code +currently supported by Linux kernel developers? Then you are done at this +point. If you later want to remove the kernel you just built, check out +:ref:`Supplementary tasks: cleanup during and after following this guide.<introclosure_bissbs>`. + +In case you face a regression, move on and execute at least the next segment +as well. + +.. _introworkingcheck_bissbs: + +Segment 2: check if the kernels you build work fine +--------------------------------------------------- + +In case of a regression, you now want to ensure the trimmed configuration file +you created earlier works as expected; a bisection with the .config file +otherwise would be a waste of time. [:ref:`details<introworkingcheck_bisref>`] + +.. _recheckworking_bissbs: + +* Build your own variant of the 'working' kernel and check if the feature that + regressed works as expected with it. + + Start by checking out the sources for the version earlier established as + 'good' (once again assumed to be 6.0 here):: + + cd ~/linux/ + git checkout --detach v6.0 + + Now use the checked out code to configure, build, and install another kernel + using the commands the previous subsection explained in more detail:: + + cp ~/kernel-config-working .config + make olddefconfig + make -j $(nproc --all) + # * Check if the free space suffices holding another kernel: + df -h /boot/ /lib/modules/ + command -v installkernel && sudo make modules_install install + make -s kernelrelease | tee -a ~/kernels-built + reboot + + When the system booted, you may want to verify once again that the + kernel you started is the one you just built: + + tail -n 1 ~/kernels-built + uname -r + + Now check if this kernel works as expected; if not, consult the reference + section for further instructions. + + [:ref:`details<recheckworking_bisref>`] + +.. _introbisect_bissbs: + +Segment 3: perform the bisection and validate the result +-------------------------------------------------------- + +With all the preparations and precaution builds taken care of, you are now ready +to begin the bisection. This will make you build quite a few kernels -- usually +about 15 in case you encountered a regression when updating to a newer series +(say from 6.0.11 to 6.1.3). But do not worry, due to the trimmed build +configuration created earlier this works a lot faster than many people assume: +overall on average it will often just take about 10 to 15 minutes to compile +each kernel on commodity x86 machines. + +* In case your 'bad' version is a stable/longterm release (say v6.1.5), add its + stable branch, unless you already did so earlier:: + + cd ~/linux/ + git remote set-branches --add stable linux-6.1.y + git fetch stable + +.. _bisectstart_bissbs: + +* Start the bisection and tell Git about the versions earlier established as + 'good' (6.0 in the following example command) and 'bad' (6.1.5):: + + cd ~/linux/ + git bisect start + git bisect good v6.0 + git bisect bad v6.1.5 + + [:ref:`details<bisectstart_bisref>`] + +.. _bisectbuild_bissbs: + +* Now use the code Git checked out to build, install, and boot a kernel using + the commands introduced earlier:: + + cp ~/kernel-config-working .config + make olddefconfig + make -j $(nproc --all) + # * Check if the free space suffices holding another kernel: + df -h /boot/ /lib/modules/ + command -v installkernel && sudo make modules_install install + make -s kernelrelease | tee -a ~/kernels-built + reboot + + If compilation fails for some reason, run ``git bisect skip`` and restart + executing the stack of commands from the beginning. + + In case you skipped the "test latest codebase" step in the guide, check its + description as for why the 'df [...]' and 'make -s kernelrelease [...]' + commands are here. + + Important note: the latter command from this point on will print release + identifiers that might look odd or wrong to you -- which they are not, as it's + totally normal to see release identifiers like '6.0-rc1-local-gcafec0cacaca0' + if you bisect between versions 6.1 and 6.2 for example. + + [:ref:`details<bisectbuild_bisref>`] + +.. _bisecttest_bissbs: + +* Now check if the feature that regressed works in the kernel you just built. + + You again might want to start by making sure the kernel you booted is the one + you just built:: + + cd ~/linux/ + tail -n 1 ~/kernels-built + uname -r + + Now verify if the feature that regressed works at this kernel bisection point. + If it does, run this:: + + git bisect good + + If it does not, run this:: + + git bisect bad + + Be sure about what you tell Git, as getting this wrong just once will send the + rest of the bisection totally off course. + + While the bisection is ongoing, Git will use the information you provided to + find and check out another bisection point for you to test. While doing so, it + will print something like 'Bisecting: 675 revisions left to test after this + (roughly 10 steps)' to indicate how many further changes it expects to be + tested. Now build and install another kernel using the instructions from the + previous step; afterwards follow the instructions in this step again. + + Repeat this again and again until you finish the bisection -- that's the case + when Git after tagging a change as 'good' or 'bad' prints something like + 'cafecaca0c0dacafecaca0c0dacafecaca0c0da is the first bad commit'; right + afterwards it will show some details about the culprit including the patch + description of the change. The latter might fill your terminal screen, so you + might need to scroll up to see the message mentioning the culprit; + alternatively, run ``git bisect log > ~/bisection-log``. + + [:ref:`details<bisecttest_bisref>`] + +.. _bisectlog_bissbs: + +* Store Git's bisection log and the current .config file in a safe place before + telling Git to reset the sources to the state before the bisection:: + + cd ~/linux/ + git bisect log > ~/bisection-log + cp .config ~/bisection-config-culprit + git bisect reset + + [:ref:`details<bisectlog_bisref>`] + +.. _revert_bissbs: + +* Try reverting the culprit on top of latest mainline to see if this fixes your + regression. + + This is optional, as it might be impossible or hard to realize. The former is + the case, if the bisection determined a merge commit as the culprit; the + latter happens if other changes depend on the culprit. But if the revert + succeeds, it is worth building another kernel, as it validates the result of + a bisection, which can easily deroute; it furthermore will let kernel + developers know, if they can resolve the regression with a quick revert. + + Begin by checking out the latest codebase depending on the range you bisected: + + * Did you face a regression within a stable/longterm series (say between + 6.0.11 and 6.0.13) that does not happen in mainline? Then check out the + latest codebase for the affected series like this:: + + git fetch stable + git checkout --force --detach linux-6.0.y + + * In all other cases check out latest mainline:: + + git fetch mainline + git checkout --force --detach mainline/master + + If you bisected a regression within a stable/longterm series that also + happens in mainline, there is one more thing to do: look up the mainline + commit-id. To do so, use a command like ``git show abcdcafecabcd`` to + view the patch description of the culprit. There will be a line near + the top which looks like 'commit cafec0cacaca0 upstream.' or + 'Upstream commit cafec0cacaca0'; use that commit-id in the next command + and not the one the bisection blamed. + + Now try reverting the culprit by specifying its commit id:: + + git revert --no-edit cafec0cacaca0 + + If that fails, give up trying and move on to the next step. But if it works, + build a kernel again using the familiar command sequence:: + + cp ~/kernel-config-working .config + make olddefconfig && + make -j $(nproc --all) && + # * Check if the free space suffices holding another kernel: + df -h /boot/ /lib/modules/ + command -v installkernel && sudo make modules_install install + Make -s kernelrelease | tee -a ~/kernels-built + reboot + + Now check one last time if the feature that made you perform a bisection work + with that kernel. + + [:ref:`details<revert_bisref>`] + +.. _introclosure_bissbs: + +Supplementary tasks: cleanup during and after the bisection +----------------------------------------------------------- + +During and after following this guide you might want or need to remove some of +the kernels you installed: the boot menu otherwise will become confusing or +space might run out. + +.. _makeroom_bissbs: + +* To remove one of the kernels you installed, look up its 'kernelrelease' + identifier. This guide stores them in '~/kernels-built', but the following + command will print them as well:: + + ls -ltr /lib/modules/*-local* + + You in most situations want to remove the oldest kernels built during the + actual bisection (e.g. segment 3 of this guide). The two ones you created + beforehand (e.g. to test the latest codebase and the version considered + 'good') might become handy to verify something later -- thus better keep them + around, unless you are really short on storage space. + + To remove the modules of a kernel with the kernelrelease identifier + '*6.0-rc1-local-gcafec0cacaca0*', start by removing the directory holding its + modules:: + + sudo rm -rf /lib/modules/6.0-rc1-local-gcafec0cacaca0 + + Afterwards try the following command:: + + sudo kernel-install -v remove 6.0-rc1-local-gcafec0cacaca0 + + On quite a few distributions this will delete all other kernel files installed + while also removing the kernel's entry from the boot menu. But on some + distributions kernel-install does not exist or leaves boot-loader entries or + kernel image and related files behind; in that case remove them as described + in the reference section. + + [:ref:`details<makeroom_bisref>`] + +.. _finishingtouch_bissbs: + +* Once you have finished the bisection, do not immediately remove anything you + set up, as you might need a few things again. What is safe to remove depends + on the outcome of the bisection: + + * Could you initially reproduce the regression with the latest codebase and + after the bisection were able to fix the problem by reverting the culprit on + top of the latest codebase? Then you want to keep those two kernels around + for a while, but safely remove all others with a '-local' in the release + identifier. + + * Did the bisection end on a merge-commit or seems questionable for other + reasons? Then you want to keep as many kernels as possible around for a few + days: it's pretty likely that you will be asked to recheck something. + + * In other cases it likely is a good idea to keep the following kernels around + for some time: the one built from the latest codebase, the one created from + the version considered 'good', and the last three or four you compiled + during the actual bisection process. + + [:ref:`details<finishingtouch_bisref>`] + +.. _submit_improvements: + +This concludes the step-by-step guide. + +Did you run into trouble following any of the above steps not cleared up by the +reference section below? Did you spot errors? Or do you have ideas how to +improve the guide? Then please take a moment and let the maintainer of this +document know by email (Thorsten Leemhuis <linux@leemhuis.info>), ideally while +CCing the Linux docs mailing list (linux-doc@vger.kernel.org). Such feedback is +vital to improve this document further, which is in everybody's interest, as it +will enable more people to master the task described here -- and hopefully also +improve similar guides inspired by this one. + + +Reference section for the step-by-step guide +============================================ + +This section holds additional information for almost all the items in the above +step-by-step guide. + +.. _backup_bisref: + +Prepare for emergencies +----------------------- + + *Create a fresh backup and put system repair and restore tools at hand.* + [:ref:`... <backup_bissbs>`] + +Remember, you are dealing with computers, which sometimes do unexpected things +-- especially if you fiddle with crucial parts like the kernel of an operating +system. That's what you are about to do in this process. Hence, better prepare +for something going sideways, even if that should not happen. + +[:ref:`back to step-by-step guide <backup_bissbs>`] + +.. _vanilla_bisref: + +Remove anything related to externally maintained kernel modules +--------------------------------------------------------------- + + *Remove all software that depends on externally developed kernel drivers or + builds them automatically.* [:ref:`...<vanilla_bissbs>`] + +Externally developed kernel modules can easily cause trouble during a bisection. + +But there is a more important reason why this guide contains this step: most +kernel developers will not care about reports about regressions occurring with +kernels that utilize such modules. That's because such kernels are not +considered 'vanilla' anymore, as Documentation/admin-guide/reporting-issues.rst +explains in more detail. + +[:ref:`back to step-by-step guide <vanilla_bissbs>`] + +.. _secureboot_bisref: + +Deal with techniques like Secure Boot +------------------------------------- + + *On platforms with 'Secure Boot' or similar techniques, prepare everything to + ensure the system will permit your self-compiled kernel to boot later.* + [:ref:`... <secureboot_bissbs>`] + +Many modern systems allow only certain operating systems to start; that's why +they reject booting self-compiled kernels by default. + +You ideally deal with this by making your platform trust your self-built kernels +with the help of a certificate. How to do that is not described +here, as it requires various steps that would take the text too far away from +its purpose; 'Documentation/admin-guide/module-signing.rst' and various web +sides already explain everything needed in more detail. + +Temporarily disabling solutions like Secure Boot is another way to make your own +Linux boot. On commodity x86 systems it is possible to do this in the BIOS Setup +utility; the required steps vary a lot between machines and therefore cannot be +described here. + +On mainstream x86 Linux distributions there is a third and universal option: +disable all Secure Boot restrictions for your Linux environment. You can +initiate this process by running ``mokutil --disable-validation``; this will +tell you to create a one-time password, which is safe to write down. Now +restart; right after your BIOS performed all self-tests the bootloader Shim will +show a blue box with a message 'Press any key to perform MOK management'. Hit +some key before the countdown exposes, which will open a menu. Choose 'Change +Secure Boot state'. Shim's 'MokManager' will now ask you to enter three +randomly chosen characters from the one-time password specified earlier. Once +you provided them, confirm you really want to disable the validation. +Afterwards, permit MokManager to reboot the machine. + +[:ref:`back to step-by-step guide <secureboot_bissbs>`] + +.. _bootworking_bisref: + +Boot the last kernel that was working +------------------------------------- + + *Boot into the last working kernel and briefly recheck if the feature that + regressed really works.* [:ref:`...<bootworking_bissbs>`] + +This will make later steps that cover creating and trimming the configuration do +the right thing. + +[:ref:`back to step-by-step guide <bootworking_bissbs>`] + +.. _diskspace_bisref: + +Space requirements +------------------ + + *Ensure to have enough free space for building Linux.* + [:ref:`... <diskspace_bissbs>`] + +The numbers mentioned are rough estimates with a big extra charge to be on the +safe side, so often you will need less. + +If you have space constraints, be sure to hay attention to the :ref:`step about +debug symbols' <debugsymbols_bissbs>` and its :ref:`accompanying reference +section' <debugsymbols_bisref>`, as disabling then will reduce the consumed disk +space by quite a few gigabytes. + +[:ref:`back to step-by-step guide <diskspace_bissbs>`] + +.. _rangecheck_bisref: + +Bisection range +--------------- + + *Determine the kernel versions considered 'good' and 'bad' throughout this + guide.* [:ref:`...<rangecheck_bissbs>`] + +Establishing the range of commits to be checked is mostly straightforward, +except when a regression occurred when switching from a release of one stable +series to a release of a later series (e.g. from 6.0.11 to 6.1.4). In that case +Git will need some hand holding, as there is no straight line of descent. + +That's because with the release of 6.0 mainline carried on to 6.1 while the +stable series 6.0.y branched to the side. It's therefore theoretically possible +that the issue you face with 6.1.4 only worked in 6.0.11, as it was fixed by a +commit that went into one of the 6.0.y releases, but never hit mainline or the +6.1.y series. Thankfully that normally should not happen due to the way the +stable/longterm maintainers maintain the code. It's thus pretty safe to assume +6.0 as a 'good' kernel. That assumption will be tested anyway, as that kernel +will be built and tested in the segment '2' of this guide; Git would force you +to do this as well, if you tried bisecting between 6.0.11 and 6.1.13. + +[:ref:`back to step-by-step guide <rangecheck_bissbs>`] + +.. _buildrequires_bisref: + +Install build requirements +-------------------------- + + *Install all software required to build a Linux kernel.* + [:ref:`...<buildrequires_bissbs>`] + +The kernel is pretty stand-alone, but besides tools like the compiler you will +sometimes need a few libraries to build one. How to install everything needed +depends on your Linux distribution and the configuration of the kernel you are +about to build. + +Here are a few examples what you typically need on some mainstream +distributions: + +* Arch Linux and derivatives:: + + sudo pacman --needed -S bc binutils bison flex gcc git kmod libelf openssl \ + pahole perl zlib ncurses qt6-base + +* Debian, Ubuntu, and derivatives:: + + sudo apt install bc binutils bison dwarves flex gcc git kmod libelf-dev \ + libssl-dev make openssl pahole perl-base pkg-config zlib1g-dev \ + libncurses-dev qt6-base-dev g++ + +* Fedora and derivatives:: + + sudo dnf install binutils \ + /usr/bin/{bc,bison,flex,gcc,git,openssl,make,perl,pahole,rpmbuild} \ + /usr/include/{libelf.h,openssl/pkcs7.h,zlib.h,ncurses.h,qt6/QtGui/QAction} + +* openSUSE and derivatives:: + + sudo zypper install bc binutils bison dwarves flex gcc git \ + kernel-install-tools libelf-devel make modutils openssl openssl-devel \ + perl-base zlib-devel rpm-build ncurses-devel qt6-base-devel + +These commands install a few packages that are often, but not always needed. You +for example might want to skip installing the development headers for ncurses, +which you will only need in case you later might want to adjust the kernel build +configuration using make the targets 'menuconfig' or 'nconfig'; likewise omit +the headers of Qt6 is you do not plan to adjust the .config using 'xconfig'. + +You furthermore might need additional libraries and their development headers +for tasks not covered in this guide -- for example when building utilities from +the kernel's tools/ directory. + +[:ref:`back to step-by-step guide <buildrequires_bissbs>`] + +.. _sources_bisref: + +Download the sources using Git +------------------------------ + + *Retrieve the Linux mainline sources.* + [:ref:`...<sources_bissbs>`] + +The step-by-step guide outlines how to download the Linux sources using a full +Git clone of Linus' mainline repository. There is nothing more to say about +that -- but there are two alternatives ways to retrieve the sources that might +work better for you: + + * If you have an unreliable internet connection, consider + :ref:`using a 'Git bundle'<sources_bundle_bisref>`. + + * If downloading the complete repository would take too long or requires too + much storage space, consider :ref:`using a 'shallow + clone'<sources_shallow_bisref>`. + +.. _sources_bundle_bisref: + +Downloading Linux mainline sources using a bundle +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Use the following commands to retrieve the Linux mainline sources using a +bundle:: + + wget -c \ + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/clone.bundle + git clone --no-checkout clone.bundle ~/linux/ + cd ~/linux/ + git remote remove origin + git remote add mainline \ + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git + git fetch mainline + git remote add -t master stable \ + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git + +In case the 'wget' command fails, just re-execute it, it will pick up where +it left off. + +[:ref:`back to step-by-step guide <sources_bissbs>`] +[:ref:`back to section intro <sources_bisref>`] + +.. _sources_shallow_bisref: + +Downloading Linux mainline sources using a shallow clone +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +First, execute the following command to retrieve the latest mainline codebase:: + + git clone -o mainline --no-checkout --depth 1 -b master \ + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/linux/ + cd ~/linux/ + git remote add -t master stable \ + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git + +Now deepen your clone's history to the second predecessor of the mainline +release of your 'good' version. In case the latter are 6.0 or 6.0.11, 5.19 would +be the first predecessor and 5.18 the second -- hence deepen the history up to +that version:: + + git fetch --shallow-exclude=v5.18 mainline + +Afterwards add the stable Git repository as remote and all required stable +branches as explained in the step-by-step guide. + +Note, shallow clones have a few peculiar characteristics: + + * For bisections the history needs to be deepened a few mainline versions + farther than it seems necessary, as explained above already. That's because + Git otherwise will be unable to revert or describe most of the commits within + a range (say v6.1..v6.2), as they are internally based on earlier kernels + releases (like v6.0-rc2 or 5.19-rc3). + + * This document in most places uses ``git fetch`` with ``--shallow-exclude=`` + to specify the earliest version you care about (or to be precise: its git + tag). You alternatively can use the parameter ``--shallow-since=`` to specify + an absolute (say ``'2023-07-15'``) or relative (``'12 months'``) date to + define the depth of the history you want to download. When using them while + bisecting mainline, ensure to deepen the history to at least 7 months before + the release of the mainline release your 'good' kernel is based on. + + * Be warned, when deepening your clone you might encounter an error like + 'fatal: error in object: unshallow cafecaca0c0dacafecaca0c0dacafecaca0c0da'. + In that case run ``git repack -d`` and try again. + +[:ref:`back to step-by-step guide <sources_bissbs>`] +[:ref:`back to section intro <sources_bisref>`] + +.. _oldconfig_bisref: + +Start defining the build configuration for your kernel +------------------------------------------------------ + + *Start preparing a kernel build configuration (the '.config' file).* + [:ref:`... <oldconfig_bissbs>`] + +*Note, this is the first of multiple steps in this guide that create or modify +build artifacts. The commands used in this guide store them right in the source +tree to keep things simple. In case you prefer storing the build artifacts +separately, create a directory like '~/linux-builddir/' and add the parameter +``O=~/linux-builddir/`` to all make calls used throughout this guide. You will +have to point other commands there as well -- among them the ``./scripts/config +[...]`` commands, which will require ``--file ~/linux-builddir/.config`` to +locate the right build configuration.* + +Two things can easily go wrong when creating a .config file as advised: + + * The oldconfig target will use a .config file from your build directory, if + one is already present there (e.g. '~/linux/.config'). That's totally fine if + that's what you intend (see next step), but in all other cases you want to + delete it. This for example is important in case you followed this guide + further, but due to problems come back here to redo the configuration from + scratch. + + * Sometimes olddefconfig is unable to locate the .config file for your running + kernel and will use defaults, as briefly outlined in the guide. In that case + check if your distribution ships the configuration somewhere and manually put + it in the right place (e.g. '~/linux/.config') if it does. On distributions + where /proc/config.gz exists this can be achieved using this command:: + + zcat /proc/config.gz > .config + + Once you put it there, run ``make olddefconfig`` again to adjust it to the + needs of the kernel about to be built. + +Note, the olddefconfig target will set any undefined build options to their +default value. If you prefer to set such configuration options manually, use +``make oldconfig`` instead. Then for each undefined configuration option you +will be asked how to proceed; in case you are unsure what to answer, simply hit +'enter' to apply the default value. Note though that for bisections you normally +want to go with the defaults, as you otherwise might enable a new feature that +causes a problem looking like regressions (for example due to security +restrictions). + +Occasionally odd things happen when trying to use a config file prepared for one +kernel (say 6.1) on an older mainline release -- especially if it is much older +(say v5.15). That's one of the reasons why the previous step in the guide told +you to boot the kernel where everything works. If you manually add a .config +file you thus want to ensure it's from the working kernel and not from a one +that shows the regression. + +In case you want to build kernels for another machine, locate its kernel build +configuration; usually ``ls /boot/config-$(uname -r)`` will print its name. Copy +that file to the build machine and store it as ~/linux/.config; afterwards run +``make olddefconfig`` to adjust it. + +[:ref:`back to step-by-step guide <oldconfig_bissbs>`] + +.. _localmodconfig_bisref: + +Trim the build configuration for your kernel +-------------------------------------------- + + *Disable any kernel modules apparently superfluous for your setup.* + [:ref:`... <localmodconfig_bissbs>`] + +As explained briefly in the step-by-step guide already: with localmodconfig it +can easily happen that your self-built kernels will lack modules for tasks you +did not perform at least once before utilizing this make target. That happens +when a task requires kernel modules which are only autoloaded when you execute +it for the first time. So when you never performed that task since starting your +kernel the modules will not have been loaded -- and from localmodonfig's point +of view look superfluous, which thus disables them to reduce the amount of code +to be compiled. + +You can try to avoid this by performing typical tasks that often will autoload +additional kernel modules: start a VM, establish VPN connections, loop-mount a +CD/DVD ISO, mount network shares (CIFS, NFS, ...), and connect all external +devices (2FA keys, headsets, webcams, ...) as well as storage devices with file +systems you otherwise do not utilize (btrfs, ext4, FAT, NTFS, XFS, ...). But it +is hard to think of everything that might be needed -- even kernel developers +often forget one thing or another at this point. + +Do not let that risk bother you, especially when compiling a kernel only for +testing purposes: everything typically crucial will be there. And if you forget +something important you can turn on a missing feature manually later and quickly +run the commands again to compile and install a kernel that has everything you +need. + +But if you plan to build and use self-built kernels regularly, you might want to +reduce the risk by recording which modules your system loads over the course of +a few weeks. You can automate this with `modprobed-db +<https://github.com/graysky2/modprobed-db>`_. Afterwards use ``LSMOD=<path>`` to +point localmodconfig to the list of modules modprobed-db noticed being used:: + + yes '' | make LSMOD='${HOME}'/.config/modprobed.db localmodconfig + +That parameter also allows you to build trimmed kernels for another machine in +case you copied a suitable .config over to use as base (see previous step). Just +run ``lsmod > lsmod_foo-machine`` on that system and copy the generated file to +your build's host home directory. Then run these commands instead of the one the +step-by-step guide mentions:: + + yes '' | make LSMOD=~/lsmod_foo-machine localmodconfig + +[:ref:`back to step-by-step guide <localmodconfig_bissbs>`] + +.. _tagging_bisref: + +Tag the kernels about to be build +--------------------------------- + + *Ensure all the kernels you will build are clearly identifiable using a + special tag and a unique version identifier.* [:ref:`... <tagging_bissbs>`] + +This allows you to differentiate your distribution's kernels from those created +during this process, as the file or directories for the latter will contain +'-local' in the name; it also helps picking the right entry in the boot menu and +not lose track of you kernels, as their version numbers will look slightly +confusing during the bisection. + +[:ref:`back to step-by-step guide <tagging_bissbs>`] + +.. _debugsymbols_bisref: + +Decide to enable or disable debug symbols +----------------------------------------- + + *Decide how to handle debug symbols.* [:ref:`... <debugsymbols_bissbs>`] + +Having debug symbols available can be important when your kernel throws a +'panic', 'Oops', 'warning', or 'BUG' later when running, as then you will be +able to find the exact place where the problem occurred in the code. But +collecting and embedding the needed debug information takes time and consumes +quite a bit of space: in late 2022 the build artifacts for a typical x86 kernel +trimmed with localmodconfig consumed around 5 Gigabyte of space with debug +symbols, but less than 1 when they were disabled. The resulting kernel image and +modules are bigger as well, which increases storage requirements for /boot/ and +load times. + +In case you want a small kernel and are unlikely to decode a stack trace later, +you thus might want to disable debug symbols to avoid those downsides. If it +later turns out that you need them, just enable them as shown and rebuild the +kernel. + +You on the other hand definitely want to enable them for this process, if there +is a decent chance that you need to decode a stack trace later. The section +'Decode failure messages' in Documentation/admin-guide/reporting-issues.rst +explains this process in more detail. + +[:ref:`back to step-by-step guide <debugsymbols_bissbs>`] + +.. _configmods_bisref: + +Adjust build configuration +-------------------------- + + *Check if you may want or need to adjust some other kernel configuration + options:* + +Depending on your needs you at this point might want or have to adjust some +kernel configuration options. + +.. _configmods_distros_bisref: + +Distro specific adjustments +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + *Are you running* [:ref:`... <configmods_bissbs>`] + +The following sections help you to avoid build problems that are known to occur +when following this guide on a few commodity distributions. + +**Debian:** + + * Remove a stale reference to a certificate file that would cause your build to + fail:: + + ./scripts/config --set-str SYSTEM_TRUSTED_KEYS '' + + Alternatively, download the needed certificate and make that configuration + option point to it, as `the Debian handbook explains in more detail + <https://debian-handbook.info/browse/stable/sect.kernel-compilation.html>`_ + -- or generate your own, as explained in + Documentation/admin-guide/module-signing.rst. + +[:ref:`back to step-by-step guide <configmods_bissbs>`] + +.. _configmods_individual_bisref: + +Individual adjustments +~~~~~~~~~~~~~~~~~~~~~~ + + *If you want to influence the other aspects of the configuration, do so + now.* [:ref:`... <configmods_bissbs>`] + +You at this point can use a command like ``make menuconfig`` to enable or +disable certain features using a text-based user interface; to use a graphical +configuration utility, call the make target ``xconfig`` or ``gconfig`` instead. +All of them require development libraries from toolkits they are based on +(ncurses, Qt5, Gtk2); an error message will tell you if something required is +missing. + +[:ref:`back to step-by-step guide <configmods_bissbs>`] + +.. _saveconfig_bisref: + +Put the .config file aside +-------------------------- + + *Reprocess the .config after the latest changes and store it in a safe place.* + [:ref:`... <saveconfig_bissbs>`] + +Put the .config you prepared aside, as you want to copy it back to the build +directory every time during this guide before you start building another +kernel. That's because going back and forth between different versions can alter +.config files in odd ways; those occasionally cause side effects that could +confuse testing or in some cases render the result of your bisection +meaningless. + +[:ref:`back to step-by-step guide <saveconfig_bissbs>`] + +.. _introlatestcheck_bisref: + +Try to reproduce the regression +----------------------------------------- + + *Verify the regression is not caused by some .config change and check if it + still occurs with the latest codebase.* [:ref:`... <introlatestcheck_bissbs>`] + +For some readers it might seem unnecessary to check the latest codebase at this +point, especially if you did that already with a kernel prepared by your +distributor or face a regression within a stable/longterm series. But it's +highly recommended for these reasons: + +* You will run into any problems caused by your setup before you actually begin + a bisection. That will make it a lot easier to differentiate between 'this + most likely is some problem in my setup' and 'this change needs to be skipped + during the bisection, as the kernel sources at that stage contain an unrelated + problem that causes building or booting to fail'. + +* These steps will rule out if your problem is caused by some change in the + build configuration between the 'working' and the 'broken' kernel. This for + example can happen when your distributor enabled an additional security + feature in the newer kernel which was disabled or not yet supported by the + older kernel. That security feature might get into the way of something you + do -- in which case your problem from the perspective of the Linux kernel + upstream developers is not a regression, as + Documentation/admin-guide/reporting-regressions.rst explains in more detail. + You thus would waste your time if you'd try to bisect this. + +* If the cause for your regression was already fixed in the latest mainline + codebase, you'd perform the bisection for nothing. This holds true for a + regression you encountered with a stable/longterm release as well, as they are + often caused by problems in mainline changes that were backported -- in which + case the problem will have to be fixed in mainline first. Maybe it already was + fixed there and the fix is already in the process of being backported. + +* For regressions within a stable/longterm series it's furthermore crucial to + know if the issue is specific to that series or also happens in the mainline + kernel, as the report needs to be sent to different people: + + * Regressions specific to a stable/longterm series are the stable team's + responsibility; mainline Linux developers might or might not care. + + * Regressions also happening in mainline are something the regular Linux + developers and maintainers have to handle; the stable team does not care + and does not need to be involved in the report, they just should be told + to backport the fix once it's ready. + + Your report might be ignored if you send it to the wrong party -- and even + when you get a reply there is a decent chance that developers tell you to + evaluate which of the two cases it is before they take a closer look. + +[:ref:`back to step-by-step guide <introlatestcheck_bissbs>`] + +.. _checkoutmaster_bisref: + +Checkout the latest Linux codebase +---------------------------------- + + *Checkout the latest Linux codebase.* + [:ref:`... <introlatestcheck_bissbs>`] + +In case you later want to recheck if an ever newer codebase might fix the +problem, remember to run that ``git fetch --shallow-exclude [...]`` command +again mentioned earlier to update your local Git repository. + +[:ref:`back to step-by-step guide <introlatestcheck_bissbs>`] + +.. _build_bisref: + +Build your kernel +----------------- + + *Build the image and the modules of your first kernel using the config file + you prepared.* [:ref:`... <build_bissbs>`] + +A lot can go wrong at this stage, but the instructions below will help you help +yourself. Another subsection explains how to directly package your kernel up as +deb, rpm or tar file. + +Dealing with build errors +~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a build error occurs, it might be caused by some aspect of your machine's +setup that often can be fixed quickly; other times though the problem lies in +the code and can only be fixed by a developer. A close examination of the +failure messages coupled with some research on the internet will often tell you +which of the two it is. To perform such a investigation, restart the build +process like this:: + + make V=1 + +The ``V=1`` activates verbose output, which might be needed to see the actual +error. To make it easier to spot, this command also omits the ``-j $(nproc +--all)`` used earlier to utilize every CPU core in the system for the job -- but +this parallelism also results in some clutter when failures occur. + +After a few seconds the build process should run into the error again. Now try +to find the most crucial line describing the problem. Then search the internet +for the most important and non-generic section of that line (say 4 to 8 words); +avoid or remove anything that looks remotely system-specific, like your username +or local path names like ``/home/username/linux/``. First try your regular +internet search engine with that string, afterwards search Linux kernel mailing +lists via `lore.kernel.org/all/ <https://lore.kernel.org/all/>`_. + +This most of the time will find something that will explain what is wrong; quite +often one of the hits will provide a solution for your problem, too. If you +do not find anything that matches your problem, try again from a different angle +by modifying your search terms or using another line from the error messages. + +In the end, most trouble you are to run into has likely been encountered and +reported by others already. That includes issues where the cause is not your +system, but lies the code. If you run into one of those, you might thus find a +solution (e.g. a patch) or workaround for your problem, too. + +Package your kernel up +~~~~~~~~~~~~~~~~~~~~~~ + +The step-by-step guide uses the default make targets (e.g. 'bzImage' and +'modules' on x86) to build the image and the modules of your kernel, which later +steps of the guide then install. You instead can also directly build everything +and directly package it up by using one of the following targets: + + * ``make -j $(nproc --all) bindeb-pkg`` to generate a deb package + + * ``make -j $(nproc --all) binrpm-pkg`` to generate a rpm package + + * ``make -j $(nproc --all) tarbz2-pkg`` to generate a bz2 compressed tarball + +This is just a selection of available make targets for this purpose, see +``make help`` for others. You can also use these targets after running +``make -j $(nproc --all)``, as they will pick up everything already built. + +If you employ the targets to generate deb or rpm packages, ignore the +step-by-step guide's instructions on installing and removing your kernel; +instead install and remove the packages using the package utility for the format +(e.g. dpkg and rpm) or a package management utility build on top of them (apt, +aptitude, dnf/yum, zypper, ...). Be aware that the packages generated using +these two make targets are designed to work on various distributions utilizing +those formats, they thus will sometimes behave differently than your +distribution's kernel packages. + +[:ref:`back to step-by-step guide <build_bissbs>`] + +.. _install_bisref: + +Put the kernel in place +----------------------- + + *Install the kernel you just built.* [:ref:`... <install_bissbs>`] + +What you need to do after executing the command in the step-by-step guide +depends on the existence and the implementation of an ``installkernel`` +executable. Many commodity Linux distributions ship such a kernel installer in +'/sbin/' that does everything needed, hence there is nothing left for you +except rebooting. But some distributions contain an installkernel that does +only part of the job -- and a few lack it completely and leave all the work to +you. + +If ``installkernel`` is found, the kernel's build system will delegate the +actual installation of your kernel's image and related files to this executable. +On almost all Linux distributions it will store the image as '/boot/vmlinuz- +<kernelrelease identifier>' and put a 'System.map-<kernelrelease +identifier>' alongside it. Your kernel will thus be installed in parallel to any +existing ones, unless you already have one with exactly the same release name. + +Installkernel on many distributions will afterwards generate an 'initramfs' +(often also called 'initrd'), which commodity distributions rely on for booting; +hence be sure to keep the order of the two make targets used in the step-by-step +guide, as things will go sideways if you install your kernel's image before its +modules. Often installkernel will then add your kernel to the bootloader +configuration, too. You have to take care of one or both of these tasks +yourself, if your distributions installkernel doesn't handle them. + +A few distributions like Arch Linux and its derivatives totally lack an +installkernel executable. On those just install the modules using the kernel's +build system and then install the image and the System.map file manually:: + + sudo make modules_install + sudo install -m 0600 $(make -s image_name) /boot/vmlinuz-$(make -s kernelrelease) + sudo install -m 0600 System.map /boot/System.map-$(make -s kernelrelease) + +If your distribution boots with the help of an initramfs, now generate one for +your kernel using the tools your distribution provides for this process. +Afterwards add your kernel to your bootloader configuration and reboot. + +[:ref:`back to step-by-step guide <install_bissbs>`] + +.. _storagespace_bisref: + +Storage requirements per kernel +------------------------------- + + *Check how much storage space the kernel, its modules, and other related files + like the initramfs consume.* [:ref:`... <storagespace_bissbs>`] + +The kernels built during a bisection consume quite a bit of space in /boot/ and +/lib/modules/, especially if you enabled debug symbols. That makes it easy to +fill up volumes during a bisection -- and due to that even kernels which used to +work earlier might fail to boot. To prevent that you will need to know how much +space each installed kernel typically requires. + +Note, most of the time the pattern '/boot/*$(make -s kernelrelease)*' used in +the guide will match all files needed to boot your kernel -- but neither the +path nor the naming scheme are mandatory. On some distributions you thus will +need to look in different places. + +[:ref:`back to step-by-step guide <storagespace_bissbs>`] + +.. _recheckbroken_bisref: + +Check the kernel built from the latest codebase +----------------------------------------------- + + *Reboot into the kernel you just built and check if the feature that regressed + is really broken there.* [:ref:`... <recheckbroken_bissbs>`] + +There are a couple of reasons why the regression you face might not show up with +your own kernel built from the latest codebase. These are the most frequent: + +* The cause for the regression was fixed meanwhile. + +* The regression with the broken kernel was caused by a change in the build + configuration the provider of your kernel carried out. + +* Your problem might be a race condition that does not show up with your kernel; + the trimmed build configuration, a different setting for debug symbols, the + compiler used, and various other things can cause this. + +* In case you encountered the regression with a stable/longterm kernel it might + be a problem that is specific to that series; the next step in this guide will + check this. + +[:ref:`back to step-by-step guide <recheckbroken_bissbs>`] + +.. _recheckstablebroken_bisref: + +Check the kernel built from the latest stable/longterm codebase +--------------------------------------------------------------- + + *Are you facing a regression within a stable/longterm release, but failed to + reproduce it with the kernel you just built using the latest mainline sources? + Then check if the latest codebase for the particular series might already fix + the problem.* [:ref:`... <recheckstablebroken_bissbs>`] + +If this kernel does not show the regression either, there most likely is no need +for a bisection. + +[:ref:`back to step-by-step guide <recheckstablebroken_bissbs>`] + +.. _introworkingcheck_bisref: + +Ensure the 'good' version is really working well +------------------------------------------------ + + *Check if the kernels you build work fine.* + [:ref:`... <introworkingcheck_bissbs>`] + +This section will reestablish a known working base. Skipping it might be +appealing, but is usually a bad idea, as it does something important: + +It will ensure the .config file you prepared earlier actually works as expected. +That is in your own interest, as trimming the configuration is not foolproof -- +and you might be building and testing ten or more kernels for nothing before +starting to suspect something might be wrong with the build configuration. + +That alone is reason enough to spend the time on this, but not the only reason. + +Many readers of this guide normally run kernels that are patched, use add-on +modules, or both. Those kernels thus are not considered 'vanilla' -- therefore +it's possible that the thing that regressed might never have worked in vanilla +builds of the 'good' version in the first place. + +There is a third reason for those that noticed a regression between +stable/longterm kernels of different series (e.g. v6.0.13..v6.1.5): it will +ensure the kernel version you assumed to be 'good' earlier in the process (e.g. +v6.0) actually is working. + +[:ref:`back to step-by-step guide <introworkingcheck_bissbs>`] + +.. _recheckworking_bisref: + +Build your own version of the 'good' kernel +------------------------------------------- + + *Build your own variant of the working kernel and check if the feature that + regressed works as expected with it.* [:ref:`... <recheckworking_bissbs>`] + +In case the feature that broke with newer kernels does not work with your first +self-built kernel, find and resolve the cause before moving on. There are a +multitude of reasons why this might happen. Some ideas where to look: + +* Maybe localmodconfig did something odd and disabled the module required to + test the feature? Then you might want to recreate a .config file based on the + one from the last working kernel and skip trimming it down; manually disabling + some features in the .config might work as well to reduce the build time. + +* Maybe it's not a kernel regression and something that is caused by some fluke, + a broken initramfs (also known as initrd), new firmware files, or an updated + userland software? + +* Maybe it was a feature added to your distributor's kernel which vanilla Linux + at that point never supported? + +Note, if you found and fixed problems with the .config file, you want to use it +to build another kernel from the latest codebase, as your earlier tests with +mainline and the latest version from an affected stable/longterm series most +likely has been flawed. + +[:ref:`back to step-by-step guide <recheckworking_bissbs>`] + +.. _bisectstart_bisref: + +Start the bisection +------------------- + + *Start the bisection and tell Git about the versions earlier established as + 'good' and 'bad'.* [:ref:`... <bisectstart_bissbs>`] + +This will start the bisection process; the last of the commands will make Git +checkout a commit round about half-way between the 'good' and the 'bad' changes +for your to test. + +[:ref:`back to step-by-step guide <bisectstart_bissbs>`] + +.. _bisectbuild_bisref: + +Build a kernel from the bisection point +--------------------------------------- + + *Build, install, and boot a kernel from the code Git checked out using the + same commands you used earlier.* [:ref:`... <bisectbuild_bissbs>`] + +There are two things worth of note here: + +* Occasionally building the kernel will fail or it might not boot due some + problem in the code at the bisection point. In that case run this command:: + + git bisect skip + + Git will then check out another commit nearby which with a bit of luck should + work better. Afterwards restart executing this step. + +* Those slightly odd looking version identifiers can happen during bisections, + because the Linux kernel subsystems prepare their changes for a new mainline + release (say 6.2) before its predecessor (e.g. 6.1) is finished. They thus + base them on a somewhat earlier point like v6.1-rc1 or even v6.0 -- and then + get merged for 6.2 without rebasing nor squashing them once 6.1 is out. This + leads to those slightly odd looking version identifiers coming up during + bisections. + +[:ref:`back to step-by-step guide <bisectbuild_bissbs>`] + +.. _bisecttest_bisref: + +Bisection checkpoint +-------------------- + + *Check if the feature that regressed works in the kernel you just built.* + [:ref:`... <bisecttest_bissbs>`] + +Ensure what you tell Git is accurate: getting it wrong just one time will bring +the rest of the bisection totally of course, hence all testing after that point +will be for nothing. + +[:ref:`back to step-by-step guide <bisecttest_bissbs>`] + +.. _bisectlog_bisref: + +Put the bisection log away +-------------------------- + + *Store Git's bisection log and the current .config file in a safe place.* + [:ref:`... <bisectlog_bissbs>`] + +As indicated above: declaring just one kernel wrongly as 'good' or 'bad' will +render the end result of a bisection useless. In that case you'd normally have +to restart the bisection from scratch. The log can prevent that, as it might +allow someone to point out where a bisection likely went sideways -- and then +instead of testing ten or more kernels you might only have to build a few to +resolve things. + +The .config file is put aside, as there is a decent chance that developers might +ask for it after you reported the regression. + +[:ref:`back to step-by-step guide <bisectlog_bissbs>`] + +.. _revert_bisref: + +Try reverting the culprit +------------------------- + + *Try reverting the culprit on top of the latest codebase to see if this fixes + your regression.* [:ref:`... <revert_bissbs>`] + +This is an optional step, but whenever possible one you should try: there is a +decent chance that developers will ask you to perform this step when you bring +the bisection result up. So give it a try, you are in the flow already, building +one more kernel shouldn't be a big deal at this point. + +The step-by-step guide covers everything relevant already except one slightly +rare thing: did you bisected a regression that also happened with mainline using +a stable/longterm series, but Git failed to revert the commit in mainline? Then +try to revert the culprit in the affected stable/longterm series -- and if that +succeeds, test that kernel version instead. + +[:ref:`back to step-by-step guide <revert_bissbs>`] + + +Supplementary tasks: cleanup during and after the bisection +----------------------------------------------------------- + +.. _makeroom_bisref: + +Cleaning up during the bisection +-------------------------------- + + *To remove one of the kernels you installed, look up its 'kernelrelease' + identifier.* [:ref:`... <makeroom_bissbs>`] + +The kernels you install during this process are easy to remove later, as its +parts are only stored in two places and clearly identifiable. You thus do not +need to worry to mess up your machine when you install a kernel manually (and +thus bypass your distribution's packaging system): all parts of your kernels are +relatively easy to remove later. + +One of the two places is a directory in /lib/modules/, which holds the modules +for each installed kernel. This directory is named after the kernel's release +identifier; hence, to remove all modules for one of the kernels you built, +simply remove its modules directory in /lib/modules/. + +The other place is /boot/, where typically two up to five files will be placed +during installation of a kernel. All of them usually contain the release name in +their file name, but how many files and their exact name depends somewhat on +your distribution's installkernel executable and its initramfs generator. On +some distributions the ``kernel-install remove...`` command mentioned in the +step-by-step guide will delete all of these files for you while also removing +the menu entry for the kernel from your bootloader configuration. On others you +have to take care of these two tasks yourself. The following command should +interactively remove the three main files of a kernel with the release name +'6.0-rc1-local-gcafec0cacaca0':: + + rm -i /boot/{System.map,vmlinuz,initr}-6.0-rc1-local-gcafec0cacaca0 + +Afterwards check for other files in /boot/ that have +'6.0-rc1-local-gcafec0cacaca0' in their name and consider deleting them as well. +Now remove the boot entry for the kernel from your bootloader's configuration; +the steps to do that vary quite a bit between Linux distributions. + +Note, be careful with wildcards like '*' when deleting files or directories +for kernels manually: you might accidentally remove files of a 6.0.11 kernel +when all you want is to remove 6.0 or 6.0.1. + +[:ref:`back to step-by-step guide <makeroom_bissbs>`] + +Cleaning up after the bisection +------------------------------- + +.. _finishingtouch_bisref: + + *Once you have finished the bisection, do not immediately remove anything + you set up, as you might need a few things again.* + [:ref:`... <finishingtouch_bissbs>`] + +When you are really short of storage space removing the kernels as described in +the step-by-step guide might not free as much space as you would like. In that +case consider running ``rm -rf ~/linux/*`` as well now. This will remove the +build artifacts and the Linux sources, but will leave the Git repository +(~/linux/.git/) behind -- a simple ``git reset --hard`` thus will bring the +sources back. + +Removing the repository as well would likely be unwise at this point: there is a +decent chance developers will ask you to build another kernel to perform +additional tests. This is often required to debug an issue or check proposed +fixes. Before doing so you want to run the ``git fetch mainline`` command again +followed by ``git checkout mainline/master`` to bring your clone up to date and +checkout the latest codebase. Then apply the patch using ``git apply +<filename>`` or ``git am <filename>`` and build yet another kernel using the +familiar commands. + +Additional tests are also the reason why you want to keep the +~/kernel-config-working file around for a few weeks. + +[:ref:`back to step-by-step guide <finishingtouch_bissbs>`] + + +Additional reading material +=========================== + +Further sources +--------------- + +* The `man page for 'git bisect' <https://git-scm.com/docs/git-bisect>`_ and + `fighting regressions with 'git bisect' <https://git-scm.com/docs/git-bisect-lk2009.html>`_ + in the Git documentation. +* `Working with git bisect <https://nathanchance.dev/posts/working-with-git-bisect/>`_ + from kernel developer Nathan Chancellor. +* `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_. +* `Fully automated bisecting with 'git bisect run' <https://lwn.net/Articles/317154>`_. + +.. + end-of-content +.. + This document is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If + you spot a typo or small mistake, feel free to let him know directly and + he'll fix it. You are free to do the same in a mostly informal way if you + want to contribute changes to the text -- but for copyright reasons please CC + linux-doc@vger.kernel.org and 'sign-off' your contribution as + Documentation/process/submitting-patches.rst explains in the section 'Sign + your work - the Developer's Certificate of Origin'. +.. + This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top + of the file. If you want to distribute this text under CC-BY-4.0 only, + please use 'The Linux kernel development community' for author attribution + and link this as source: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst + +.. + Note: Only the content of this RST file as found in the Linux kernel sources + is available under CC-BY-4.0, as versions of this text that were processed + (for example by the kernel's build system) might contain content taken from + files which use a more restrictive license. diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst index e8c2ce1f9df6..45a7f4932fe0 100644 --- a/Documentation/arch/arm64/silicon-errata.rst +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -243,3 +243,10 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ASR | ASR8601 | #8601001 | N/A | +----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Microsoft | Azure Cobalt 100| #2139208 | ARM64_ERRATUM_2139208 | ++----------------+-----------------+-----------------+-----------------------------+ +| Microsoft | Azure Cobalt 100| #2067961 | ARM64_ERRATUM_2067961 | ++----------------+-----------------+-----------------+-----------------------------+ +| Microsoft | Azure Cobalt 100| #2253138 | ARM64_ERRATUM_2253138 | ++----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arch/x86/amd-memory-encryption.rst b/Documentation/arch/x86/amd-memory-encryption.rst index 07caa8fff852..414bc7402ae7 100644 --- a/Documentation/arch/x86/amd-memory-encryption.rst +++ b/Documentation/arch/x86/amd-memory-encryption.rst @@ -87,14 +87,14 @@ The state of SME in the Linux kernel can be documented as follows: kernel is non-zero). SME can also be enabled and activated in the BIOS. If SME is enabled and -activated in the BIOS, then all memory accesses will be encrypted and it will -not be necessary to activate the Linux memory encryption support. If the BIOS -merely enables SME (sets bit 23 of the MSR_AMD64_SYSCFG), then Linux can activate -memory encryption by default (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y) or -by supplying mem_encrypt=on on the kernel command line. However, if BIOS does -not enable SME, then Linux will not be able to activate memory encryption, even -if configured to do so by default or the mem_encrypt=on command line parameter -is specified. +activated in the BIOS, then all memory accesses will be encrypted and it +will not be necessary to activate the Linux memory encryption support. + +If the BIOS merely enables SME (sets bit 23 of the MSR_AMD64_SYSCFG), +then memory encryption can be enabled by supplying mem_encrypt=on on the +kernel command line. However, if BIOS does not enable SME, then Linux +will not be able to activate memory encryption, even if configured to do +so by default or the mem_encrypt=on command line parameter is specified. Secure Nested Paging (SNP) ========================== diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst index c513855a54bb..4fd492cb4970 100644 --- a/Documentation/arch/x86/boot.rst +++ b/Documentation/arch/x86/boot.rst @@ -878,7 +878,8 @@ Protocol: 2.10+ address if possible. A non-relocatable kernel will unconditionally move itself and to run - at this address. + at this address. A relocatable kernel will move itself to this address if it + loaded below this address. ============ ======= Field name: init_size diff --git a/Documentation/arch/x86/mds.rst b/Documentation/arch/x86/mds.rst index e73fdff62c0a..c58c72362911 100644 --- a/Documentation/arch/x86/mds.rst +++ b/Documentation/arch/x86/mds.rst @@ -95,6 +95,9 @@ The kernel provides a function to invoke the buffer clearing: mds_clear_cpu_buffers() +Also macro CLEAR_CPU_BUFFERS can be used in ASM late in exit-to-user path. +Other than CFLAGS.ZF, this macro doesn't clobber any registers. + The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state (idle) transitions. @@ -138,17 +141,30 @@ Mitigation points When transitioning from kernel to user space the CPU buffers are flushed on affected CPUs when the mitigation is not disabled on the kernel - command line. The migitation is enabled through the static key - mds_user_clear. - - The mitigation is invoked in prepare_exit_to_usermode() which covers - all but one of the kernel to user space transitions. The exception - is when we return from a Non Maskable Interrupt (NMI), which is - handled directly in do_nmi(). - - (The reason that NMI is special is that prepare_exit_to_usermode() can - enable IRQs. In NMI context, NMIs are blocked, and we don't want to - enable IRQs with NMIs blocked.) + command line. The mitigation is enabled through the feature flag + X86_FEATURE_CLEAR_CPU_BUF. + + The mitigation is invoked just before transitioning to userspace after + user registers are restored. This is done to minimize the window in + which kernel data could be accessed after VERW e.g. via an NMI after + VERW. + + **Corner case not handled** + Interrupts returning to kernel don't clear CPUs buffers since the + exit-to-user path is expected to do that anyways. But, there could be + a case when an NMI is generated in kernel after the exit-to-user path + has cleared the buffers. This case is not handled and NMI returning to + kernel don't clear CPU buffers because: + + 1. It is rare to get an NMI after VERW, but before returning to userspace. + 2. For an unprivileged user, there is no known way to make that NMI + less rare or target it. + 3. It would take a large number of these precisely-timed NMIs to mount + an actual attack. There's presumably not enough bandwidth. + 4. The NMI in question occurs after a VERW, i.e. when user state is + restored and most interesting data is already scrubbed. Whats left + is only the data that NMI touches, and that may or may not be of + any interest. 2. C-State transition diff --git a/Documentation/arch/x86/pti.rst b/Documentation/arch/x86/pti.rst index e08d35177bc0..57e8392f61d3 100644 --- a/Documentation/arch/x86/pti.rst +++ b/Documentation/arch/x86/pti.rst @@ -26,9 +26,9 @@ comments in pti.c). This approach helps to ensure that side-channel attacks leveraging the paging structures do not function when PTI is enabled. It can be -enabled by setting CONFIG_PAGE_TABLE_ISOLATION=y at compile time. -Once enabled at compile-time, it can be disabled at boot with the -'nopti' or 'pti=' kernel parameters (see kernel-parameters.txt). +enabled by setting CONFIG_MITIGATION_PAGE_TABLE_ISOLATION=y at compile +time. Once enabled at compile-time, it can be disabled at boot with +the 'nopti' or 'pti=' kernel parameters (see kernel-parameters.txt). Page Table Management ===================== diff --git a/Documentation/arch/x86/topology.rst b/Documentation/arch/x86/topology.rst index 08ebf9edbfc1..7352ab89a55a 100644 --- a/Documentation/arch/x86/topology.rst +++ b/Documentation/arch/x86/topology.rst @@ -47,17 +47,21 @@ AMD nomenclature for package is 'Node'. Package-related topology information in the kernel: - - cpuinfo_x86.x86_max_cores: + - topology_num_threads_per_package() - The number of cores in a package. This information is retrieved via CPUID. + The number of threads in a package. - - cpuinfo_x86.x86_max_dies: + - topology_num_cores_per_package() - The number of dies in a package. This information is retrieved via CPUID. + The number of cores in a package. + + - topology_max_dies_per_package() + + The maximum number of dies in a package. - cpuinfo_x86.topo.die_id: - The physical ID of the die. This information is retrieved via CPUID. + The physical ID of the die. - cpuinfo_x86.topo.pkg_id: @@ -96,16 +100,6 @@ are SMT- or CMT-type threads. AMDs nomenclature for a CMT core is "Compute Unit". The kernel always uses "core". -Core-related topology information in the kernel: - - - smp_num_siblings: - - The number of threads in a core. The number of threads in a package can be - calculated by:: - - threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings - - Threads ======= A thread is a single scheduling unit. It's the equivalent to a logical Linux diff --git a/Documentation/arch/x86/x86_64/fred.rst b/Documentation/arch/x86/x86_64/fred.rst new file mode 100644 index 000000000000..9f57e7b91f7e --- /dev/null +++ b/Documentation/arch/x86/x86_64/fred.rst @@ -0,0 +1,96 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================================= +Flexible Return and Event Delivery (FRED) +========================================= + +Overview +======== + +The FRED architecture defines simple new transitions that change +privilege level (ring transitions). The FRED architecture was +designed with the following goals: + +1) Improve overall performance and response time by replacing event + delivery through the interrupt descriptor table (IDT event + delivery) and event return by the IRET instruction with lower + latency transitions. + +2) Improve software robustness by ensuring that event delivery + establishes the full supervisor context and that event return + establishes the full user context. + +The new transitions defined by the FRED architecture are FRED event +delivery and, for returning from events, two FRED return instructions. +FRED event delivery can effect a transition from ring 3 to ring 0, but +it is used also to deliver events incident to ring 0. One FRED +instruction (ERETU) effects a return from ring 0 to ring 3, while the +other (ERETS) returns while remaining in ring 0. Collectively, FRED +event delivery and the FRED return instructions are FRED transitions. + +In addition to these transitions, the FRED architecture defines a new +instruction (LKGS) for managing the state of the GS segment register. +The LKGS instruction can be used by 64-bit operating systems that do +not use the new FRED transitions. + +Furthermore, the FRED architecture is easy to extend for future CPU +architectures. + +Software based event dispatching +================================ + +FRED operates differently from IDT in terms of event handling. Instead +of directly dispatching an event to its handler based on the event +vector, FRED requires the software to dispatch an event to its handler +based on both the event's type and vector. Therefore, an event dispatch +framework must be implemented to facilitate the event-to-handler +dispatch process. The FRED event dispatch framework takes control +once an event is delivered, and employs a two-level dispatch. + +The first level dispatching is event type based, and the second level +dispatching is event vector based. + +Full supervisor/user context +============================ + +FRED event delivery atomically save and restore full supervisor/user +context upon event delivery and return. Thus it avoids the problem of +transient states due to %cr2 and/or %dr6, and it is no longer needed +to handle all the ugly corner cases caused by half baked entry states. + +FRED allows explicit unblock of NMI with new event return instructions +ERETS/ERETU, avoiding the mess caused by IRET which unconditionally +unblocks NMI, e.g., when an exception happens during NMI handling. + +FRED always restores the full value of %rsp, thus ESPFIX is no longer +needed when FRED is enabled. + +LKGS +==== + +LKGS behaves like the MOV to GS instruction except that it loads the +base address into the IA32_KERNEL_GS_BASE MSR instead of the GS +segment’s descriptor cache. With LKGS, it ends up with avoiding +mucking with kernel GS, i.e., an operating system can always operate +with its own GS base address. + +Because FRED event delivery from ring 3 and ERETU both swap the value +of the GS base address and that of the IA32_KERNEL_GS_BASE MSR, plus +the introduction of LKGS instruction, the SWAPGS instruction is no +longer needed when FRED is enabled, thus is disallowed (#UD). + +Stack levels +============ + +4 stack levels 0~3 are introduced to replace the nonreentrant IST for +event handling, and each stack level should be configured to use a +dedicated stack. + +The current stack level could be unchanged or go higher upon FRED +event delivery. If unchanged, the CPU keeps using the current event +stack. If higher, the CPU switches to a new event stack specified by +the MSR of the new stack level, i.e., MSR_IA32_FRED_RSP[123]. + +Only execution of a FRED return instruction ERET[US], could lower the +current stack level, causing the CPU to switch back to the stack it was +on before a previous event delivery that promoted the stack level. diff --git a/Documentation/arch/x86/x86_64/index.rst b/Documentation/arch/x86/x86_64/index.rst index a56070fc8e77..ad15e9bd623f 100644 --- a/Documentation/arch/x86/x86_64/index.rst +++ b/Documentation/arch/x86/x86_64/index.rst @@ -15,3 +15,4 @@ x86_64 Support cpu-hotplug-spec machinecheck fsgs + fred diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index 7985c6615f3c..a8f5782bd833 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -177,10 +177,10 @@ In addition to kfuncs' arguments, verifier may need more information about the type of kfunc(s) being registered with the BPF subsystem. To do so, we define flags on a set of kfuncs as follows:: - BTF_SET8_START(bpf_task_set) + BTF_KFUNCS_START(bpf_task_set) BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) - BTF_SET8_END(bpf_task_set) + BTF_KFUNCS_END(bpf_task_set) This set encodes the BTF ID of each kfunc listed above, and encodes the flags along with it. Ofcourse, it is also allowed to specify no flags. @@ -347,10 +347,10 @@ Once the kfunc is prepared for use, the final step to making it visible is registering it with the BPF subsystem. Registration is done per BPF program type. An example is shown below:: - BTF_SET8_START(bpf_task_set) + BTF_KFUNCS_START(bpf_task_set) BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) - BTF_SET8_END(bpf_task_set) + BTF_KFUNCS_END(bpf_task_set) static const struct btf_kfunc_id_set bpf_task_kfunc_set = { .owner = THIS_MODULE, diff --git a/Documentation/bpf/map_lpm_trie.rst b/Documentation/bpf/map_lpm_trie.rst index 74d64a30f500..f9cd579496c9 100644 --- a/Documentation/bpf/map_lpm_trie.rst +++ b/Documentation/bpf/map_lpm_trie.rst @@ -17,7 +17,7 @@ significant byte. LPM tries may be created with a maximum prefix length that is a multiple of 8, in the range from 8 to 2048. The key used for lookup and update -operations is a ``struct bpf_lpm_trie_key``, extended by +operations is a ``struct bpf_lpm_trie_key_u8``, extended by ``max_prefixlen/8`` bytes. - For IPv4 addresses the data length is 4 bytes diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst index 245b6defc298..a5ab00ac0b14 100644 --- a/Documentation/bpf/standardization/instruction-set.rst +++ b/Documentation/bpf/standardization/instruction-set.rst @@ -1,11 +1,11 @@ .. contents:: .. sectnum:: -======================================= -BPF Instruction Set Specification, v1.0 -======================================= +====================================== +BPF Instruction Set Architecture (ISA) +====================================== -This document specifies version 1.0 of the BPF instruction set. +This document specifies the BPF instruction set architecture (ISA). Documentation conventions ========================= @@ -24,22 +24,22 @@ a type's signedness (`S`) and bit width (`N`), respectively. .. table:: Meaning of signedness notation. ==== ========= - `S` Meaning + S Meaning ==== ========= - `u` unsigned - `s` signed + u unsigned + s signed ==== ========= .. table:: Meaning of bit-width notation. ===== ========= - `N` Bit width + N Bit width ===== ========= - `8` 8 bits - `16` 16 bits - `32` 32 bits - `64` 64 bits - `128` 128 bits + 8 8 bits + 16 16 bits + 32 32 bits + 64 64 bits + 128 128 bits ===== ========= For example, `u32` is a type whose valid values are all the 32-bit unsigned @@ -48,31 +48,31 @@ numbers. Functions --------- -* `htobe16`: Takes an unsigned 16-bit number in host-endian format and +* htobe16: Takes an unsigned 16-bit number in host-endian format and returns the equivalent number as an unsigned 16-bit number in big-endian format. -* `htobe32`: Takes an unsigned 32-bit number in host-endian format and +* htobe32: Takes an unsigned 32-bit number in host-endian format and returns the equivalent number as an unsigned 32-bit number in big-endian format. -* `htobe64`: Takes an unsigned 64-bit number in host-endian format and +* htobe64: Takes an unsigned 64-bit number in host-endian format and returns the equivalent number as an unsigned 64-bit number in big-endian format. -* `htole16`: Takes an unsigned 16-bit number in host-endian format and +* htole16: Takes an unsigned 16-bit number in host-endian format and returns the equivalent number as an unsigned 16-bit number in little-endian format. -* `htole32`: Takes an unsigned 32-bit number in host-endian format and +* htole32: Takes an unsigned 32-bit number in host-endian format and returns the equivalent number as an unsigned 32-bit number in little-endian format. -* `htole64`: Takes an unsigned 64-bit number in host-endian format and +* htole64: Takes an unsigned 64-bit number in host-endian format and returns the equivalent number as an unsigned 64-bit number in little-endian format. -* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian +* bswap16: Takes an unsigned 16-bit number in either big- or little-endian format and returns the equivalent number with the same bit width but opposite endianness. -* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian +* bswap32: Takes an unsigned 32-bit number in either big- or little-endian format and returns the equivalent number with the same bit width but opposite endianness. -* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian +* bswap64: Takes an unsigned 64-bit number in either big- or little-endian format and returns the equivalent number with the same bit width but opposite endianness. @@ -97,40 +97,101 @@ Definitions A: 10000110 B: 11111111 10000110 +Conformance groups +------------------ + +An implementation does not need to support all instructions specified in this +document (e.g., deprecated instructions). Instead, a number of conformance +groups are specified. An implementation must support the base32 conformance +group and may support additional conformance groups, where supporting a +conformance group means it must support all instructions in that conformance +group. + +The use of named conformance groups enables interoperability between a runtime +that executes instructions, and tools as such compilers that generate +instructions for the runtime. Thus, capability discovery in terms of +conformance groups might be done manually by users or automatically by tools. + +Each conformance group has a short ASCII label (e.g., "base32") that +corresponds to a set of instructions that are mandatory. That is, each +instruction has one or more conformance groups of which it is a member. + +This document defines the following conformance groups: + +* base32: includes all instructions defined in this + specification unless otherwise noted. +* base64: includes base32, plus instructions explicitly noted + as being in the base64 conformance group. +* atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_). +* atomic64: includes atomic32, plus 64-bit atomic operation instructions. +* divmul32: includes 32-bit division, multiplication, and modulo instructions. +* divmul64: includes divmul32, plus 64-bit division, multiplication, + and modulo instructions. +* packet: deprecated packet access instructions. + Instruction encoding ==================== BPF has two instruction encodings: * the basic instruction encoding, which uses 64 bits to encode an instruction -* the wide instruction encoding, which appends a second 64-bit immediate (i.e., - constant) value after the basic instruction for a total of 128 bits. +* the wide instruction encoding, which appends a second 64 bits + after the basic instruction for a total of 128 bits. -The fields conforming an encoded basic instruction are stored in the -following order:: +Basic instruction encoding +-------------------------- - opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF. - opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF. +A basic instruction is encoded as follows:: -**imm** - signed integer immediate value + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | opcode | regs | offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | imm | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -**offset** - signed integer offset used with pointer arithmetic +**opcode** + operation to perform, encoded as follows:: -**src_reg** - the source register number (0-10), except where otherwise specified - (`64-bit immediate instructions`_ reuse this field for other purposes) + +-+-+-+-+-+-+-+-+ + |specific |class| + +-+-+-+-+-+-+-+-+ -**dst_reg** - destination register number (0-10) + **specific** + The format of these bits varies by instruction class -**opcode** - operation to perform + **class** + The instruction class (see `Instruction classes`_) + +**regs** + The source and destination register numbers, encoded as follows + on a little-endian host:: + + +-+-+-+-+-+-+-+-+ + |src_reg|dst_reg| + +-+-+-+-+-+-+-+-+ + + and as follows on a big-endian host:: + + +-+-+-+-+-+-+-+-+ + |dst_reg|src_reg| + +-+-+-+-+-+-+-+-+ + + **src_reg** + the source register number (0-10), except where otherwise specified + (`64-bit immediate instructions`_ reuse this field for other purposes) + + **dst_reg** + destination register number (0-10) + +**offset** + signed integer offset used with pointer arithmetic + +**imm** + signed integer immediate value -Note that the contents of multi-byte fields ('imm' and 'offset') are -stored using big-endian byte ordering in big-endian BPF and -little-endian byte ordering in little-endian BPF. +Note that the contents of multi-byte fields ('offset' and 'imm') are +stored using big-endian byte ordering on big-endian hosts and +little-endian byte ordering on little-endian hosts. For example:: @@ -143,71 +204,83 @@ For example:: Note that most instructions do not use all of the fields. Unused fields shall be cleared to zero. -As discussed below in `64-bit immediate instructions`_, a 64-bit immediate -instruction uses a 64-bit immediate value that is constructed as follows. -The 64 bits following the basic instruction contain a pseudo instruction -using the same format but with opcode, dst_reg, src_reg, and offset all set to zero, -and imm containing the high 32 bits of the immediate value. +Wide instruction encoding +-------------------------- + +Some instructions are defined to use the wide instruction encoding, +which uses two 32-bit immediate values. The 64 bits following +the basic instruction format contain a pseudo instruction +with 'opcode', 'dst_reg', 'src_reg', and 'offset' all set to zero. This is depicted in the following figure:: - basic_instruction - .-----------------------------. - | | - code:8 regs:8 offset:16 imm:32 unused:32 imm:32 - | | - '--------------' - pseudo instruction + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | opcode | regs | offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | imm | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | next_imm | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +**opcode** + operation to perform, encoded as explained above -Thus the 64-bit immediate value is constructed as follows: +**regs** + The source and destination register numbers, encoded as explained above + +**offset** + signed integer offset used with pointer arithmetic + +**imm** + signed integer immediate value - imm64 = (next_imm << 32) | imm +**reserved** + unused, set to zero -where 'next_imm' refers to the imm value of the pseudo instruction -following the basic instruction. The unused bytes in the pseudo -instruction are reserved and shall be cleared to zero. +**next_imm** + second signed integer immediate value Instruction classes ------------------- -The three LSB bits of the 'opcode' field store the instruction class: - -========= ===== =============================== =================================== -class value description reference -========= ===== =============================== =================================== -BPF_LD 0x00 non-standard load operations `Load and store instructions`_ -BPF_LDX 0x01 load into register operations `Load and store instructions`_ -BPF_ST 0x02 store from immediate operations `Load and store instructions`_ -BPF_STX 0x03 store from register operations `Load and store instructions`_ -BPF_ALU 0x04 32-bit arithmetic operations `Arithmetic and jump instructions`_ -BPF_JMP 0x05 64-bit jump operations `Arithmetic and jump instructions`_ -BPF_JMP32 0x06 32-bit jump operations `Arithmetic and jump instructions`_ -BPF_ALU64 0x07 64-bit arithmetic operations `Arithmetic and jump instructions`_ -========= ===== =============================== =================================== +The three least significant bits of the 'opcode' field store the instruction class: + +===== ===== =============================== =================================== +class value description reference +===== ===== =============================== =================================== +LD 0x0 non-standard load operations `Load and store instructions`_ +LDX 0x1 load into register operations `Load and store instructions`_ +ST 0x2 store from immediate operations `Load and store instructions`_ +STX 0x3 store from register operations `Load and store instructions`_ +ALU 0x4 32-bit arithmetic operations `Arithmetic and jump instructions`_ +JMP 0x5 64-bit jump operations `Arithmetic and jump instructions`_ +JMP32 0x6 32-bit jump operations `Arithmetic and jump instructions`_ +ALU64 0x7 64-bit arithmetic operations `Arithmetic and jump instructions`_ +===== ===== =============================== =================================== Arithmetic and jump instructions ================================ -For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and -``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts: +For arithmetic and jump instructions (``ALU``, ``ALU64``, ``JMP`` and +``JMP32``), the 8-bit 'opcode' field is divided into three parts:: -============== ====== ================= -4 bits (MSB) 1 bit 3 bits (LSB) -============== ====== ================= -code source instruction class -============== ====== ================= + +-+-+-+-+-+-+-+-+ + | code |s|class| + +-+-+-+-+-+-+-+-+ **code** the operation code, whose meaning varies by instruction class -**source** +**s (source)** the source operand location, which unless otherwise specified is one of: ====== ===== ============================================== source value description ====== ===== ============================================== - BPF_K 0x00 use 32-bit 'imm' value as source operand - BPF_X 0x08 use 'src_reg' register value as source operand + K 0 use 32-bit 'imm' value as source operand + X 1 use 'src_reg' register value as source operand ====== ===== ============================================== **instruction class** @@ -216,70 +289,75 @@ code source instruction class Arithmetic instructions ----------------------- -``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for -otherwise identical operations. +``ALU`` uses 32-bit wide operands while ``ALU64`` uses 64-bit wide operands for +otherwise identical operations. ``ALU64`` instructions belong to the +base64 conformance group unless noted otherwise. The 'code' field encodes the operation as below, where 'src' and 'dst' refer to the values of the source and destination registers, respectively. -========= ===== ======= ========================================================== -code value offset description -========= ===== ======= ========================================================== -BPF_ADD 0x00 0 dst += src -BPF_SUB 0x10 0 dst -= src -BPF_MUL 0x20 0 dst \*= src -BPF_DIV 0x30 0 dst = (src != 0) ? (dst / src) : 0 -BPF_SDIV 0x30 1 dst = (src != 0) ? (dst s/ src) : 0 -BPF_OR 0x40 0 dst \|= src -BPF_AND 0x50 0 dst &= src -BPF_LSH 0x60 0 dst <<= (src & mask) -BPF_RSH 0x70 0 dst >>= (src & mask) -BPF_NEG 0x80 0 dst = -dst -BPF_MOD 0x90 0 dst = (src != 0) ? (dst % src) : dst -BPF_SMOD 0x90 1 dst = (src != 0) ? (dst s% src) : dst -BPF_XOR 0xa0 0 dst ^= src -BPF_MOV 0xb0 0 dst = src -BPF_MOVSX 0xb0 8/16/32 dst = (s8,s16,s32)src -BPF_ARSH 0xc0 0 :term:`sign extending<Sign Extend>` dst >>= (src & mask) -BPF_END 0xd0 0 byte swap operations (see `Byte swap instructions`_ below) -========= ===== ======= ========================================================== +===== ===== ======= ========================================================== +name code offset description +===== ===== ======= ========================================================== +ADD 0x0 0 dst += src +SUB 0x1 0 dst -= src +MUL 0x2 0 dst \*= src +DIV 0x3 0 dst = (src != 0) ? (dst / src) : 0 +SDIV 0x3 1 dst = (src != 0) ? (dst s/ src) : 0 +OR 0x4 0 dst \|= src +AND 0x5 0 dst &= src +LSH 0x6 0 dst <<= (src & mask) +RSH 0x7 0 dst >>= (src & mask) +NEG 0x8 0 dst = -dst +MOD 0x9 0 dst = (src != 0) ? (dst % src) : dst +SMOD 0x9 1 dst = (src != 0) ? (dst s% src) : dst +XOR 0xa 0 dst ^= src +MOV 0xb 0 dst = src +MOVSX 0xb 8/16/32 dst = (s8,s16,s32)src +ARSH 0xc 0 :term:`sign extending<Sign Extend>` dst >>= (src & mask) +END 0xd 0 byte swap operations (see `Byte swap instructions`_ below) +===== ===== ======= ========================================================== Underflow and overflow are allowed during arithmetic operations, meaning the 64-bit or 32-bit value will wrap. If BPF program execution would result in division by zero, the destination register is instead set to zero. -If execution would result in modulo by zero, for ``BPF_ALU64`` the value of -the destination register is unchanged whereas for ``BPF_ALU`` the upper +If execution would result in modulo by zero, for ``ALU64`` the value of +the destination register is unchanged whereas for ``ALU`` the upper 32 bits of the destination register are zeroed. -``BPF_ADD | BPF_X | BPF_ALU`` means:: +``{ADD, X, ALU}``, where 'code' = ``ADD``, 'source' = ``X``, and 'class' = ``ALU``, means:: dst = (u32) ((u32) dst + (u32) src) where '(u32)' indicates that the upper 32 bits are zeroed. -``BPF_ADD | BPF_X | BPF_ALU64`` means:: +``{ADD, X, ALU64}`` means:: dst = dst + src -``BPF_XOR | BPF_K | BPF_ALU`` means:: +``{XOR, K, ALU}`` means:: - dst = (u32) dst ^ (u32) imm32 + dst = (u32) dst ^ (u32) imm -``BPF_XOR | BPF_K | BPF_ALU64`` means:: +``{XOR, K, ALU64}`` means:: - dst = dst ^ imm32 + dst = dst ^ imm Note that most instructions have instruction offset of 0. Only three instructions -(``BPF_SDIV``, ``BPF_SMOD``, ``BPF_MOVSX``) have a non-zero offset. +(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero offset. +Division, multiplication, and modulo operations for ``ALU`` are part +of the "divmul32" conformance group, and division, multiplication, and +modulo operations for ``ALU64`` are part of the "divmul64" conformance +group. The division and modulo operations support both unsigned and signed flavors. -For unsigned operations (``BPF_DIV`` and ``BPF_MOD``), for ``BPF_ALU``, -'imm' is interpreted as a 32-bit unsigned value. For ``BPF_ALU64``, +For unsigned operations (``DIV`` and ``MOD``), for ``ALU``, +'imm' is interpreted as a 32-bit unsigned value. For ``ALU64``, 'imm' is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then interpreted as a 64-bit unsigned value. -For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``, -'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm' +For signed operations (``SDIV`` and ``SMOD``), for ``ALU``, +'imm' is interpreted as a 32-bit signed value. For ``ALU64``, 'imm' is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then interpreted as a 64-bit signed value. @@ -291,11 +369,15 @@ etc. This specification requires that signed modulo use truncated division a % n = a - n * trunc(a / n) -The ``BPF_MOVSX`` instruction does a move operation with sign extension. -``BPF_ALU | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into 32 +The ``MOVSX`` instruction does a move operation with sign extension. +``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into 32 bit operands, and zeroes the remaining upper 32 bits. -``BPF_ALU64 | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit -operands into 64 bit operands. +``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit +operands into 64 bit operands. Unlike other arithmetic instructions, +``MOVSX`` is only defined for register source operands (``X``). + +The ``NEG`` instruction is only defined when the source bit is clear +(``K``). Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31) for 32-bit operations. @@ -303,43 +385,45 @@ for 32-bit operations. Byte swap instructions ---------------------- -The byte swap instructions use instruction classes of ``BPF_ALU`` and ``BPF_ALU64`` -and a 4-bit 'code' field of ``BPF_END``. +The byte swap instructions use instruction classes of ``ALU`` and ``ALU64`` +and a 4-bit 'code' field of ``END``. The byte swap instructions operate on the destination register only and do not use a separate source register or immediate value. -For ``BPF_ALU``, the 1-bit source operand field in the opcode is used to +For ``ALU``, the 1-bit source operand field in the opcode is used to select what byte order the operation converts from or to. For -``BPF_ALU64``, the 1-bit source operand field in the opcode is reserved +``ALU64``, the 1-bit source operand field in the opcode is reserved and must be set to 0. -========= ========= ===== ================================================= -class source value description -========= ========= ===== ================================================= -BPF_ALU BPF_TO_LE 0x00 convert between host byte order and little endian -BPF_ALU BPF_TO_BE 0x08 convert between host byte order and big endian -BPF_ALU64 Reserved 0x00 do byte swap unconditionally -========= ========= ===== ================================================= +===== ======== ===== ================================================= +class source value description +===== ======== ===== ================================================= +ALU TO_LE 0 convert between host byte order and little endian +ALU TO_BE 1 convert between host byte order and big endian +ALU64 Reserved 0 do byte swap unconditionally +===== ======== ===== ================================================= The 'imm' field encodes the width of the swap operations. The following widths -are supported: 16, 32 and 64. +are supported: 16, 32 and 64. Width 64 operations belong to the base64 +conformance group and other swap operations belong to the base32 +conformance group. Examples: -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: +``{END, TO_LE, ALU}`` with imm = 16/32/64 means:: dst = htole16(dst) dst = htole32(dst) dst = htole64(dst) -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means:: +``{END, TO_BE, ALU}`` with imm = 16/32/64 means:: dst = htobe16(dst) dst = htobe32(dst) dst = htobe64(dst) -``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: +``{END, TO_LE, ALU64}`` with imm = 16/32/64 means:: dst = bswap16(dst) dst = bswap32(dst) @@ -348,56 +432,61 @@ Examples: Jump instructions ----------------- -``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for -otherwise identical operations. +``JMP32`` uses 32-bit wide operands and indicates the base32 +conformance group, while ``JMP`` uses 64-bit wide operands for +otherwise identical operations, and indicates the base64 conformance +group unless otherwise specified. The 'code' field encodes the operation as below: -======== ===== === =========================================== ========================================= -code value src description notes -======== ===== === =========================================== ========================================= -BPF_JA 0x0 0x0 PC += offset BPF_JMP class -BPF_JA 0x0 0x0 PC += imm BPF_JMP32 class -BPF_JEQ 0x1 any PC += offset if dst == src -BPF_JGT 0x2 any PC += offset if dst > src unsigned -BPF_JGE 0x3 any PC += offset if dst >= src unsigned -BPF_JSET 0x4 any PC += offset if dst & src -BPF_JNE 0x5 any PC += offset if dst != src -BPF_JSGT 0x6 any PC += offset if dst > src signed -BPF_JSGE 0x7 any PC += offset if dst >= src signed -BPF_CALL 0x8 0x0 call helper function by address see `Helper functions`_ -BPF_CALL 0x8 0x1 call PC += imm see `Program-local functions`_ -BPF_CALL 0x8 0x2 call helper function by BTF ID see `Helper functions`_ -BPF_EXIT 0x9 0x0 return BPF_JMP only -BPF_JLT 0xa any PC += offset if dst < src unsigned -BPF_JLE 0xb any PC += offset if dst <= src unsigned -BPF_JSLT 0xc any PC += offset if dst < src signed -BPF_JSLE 0xd any PC += offset if dst <= src signed -======== ===== === =========================================== ========================================= - -The BPF program needs to store the return value into register R0 before doing a -``BPF_EXIT``. +======== ===== ======= =============================== =================================================== +code value src_reg description notes +======== ===== ======= =============================== =================================================== +JA 0x0 0x0 PC += offset {JA, K, JMP} only +JA 0x0 0x0 PC += imm {JA, K, JMP32} only +JEQ 0x1 any PC += offset if dst == src +JGT 0x2 any PC += offset if dst > src unsigned +JGE 0x3 any PC += offset if dst >= src unsigned +JSET 0x4 any PC += offset if dst & src +JNE 0x5 any PC += offset if dst != src +JSGT 0x6 any PC += offset if dst > src signed +JSGE 0x7 any PC += offset if dst >= src signed +CALL 0x8 0x0 call helper function by address {CALL, K, JMP} only, see `Helper functions`_ +CALL 0x8 0x1 call PC += imm {CALL, K, JMP} only, see `Program-local functions`_ +CALL 0x8 0x2 call helper function by BTF ID {CALL, K, JMP} only, see `Helper functions`_ +EXIT 0x9 0x0 return {CALL, K, JMP} only +JLT 0xa any PC += offset if dst < src unsigned +JLE 0xb any PC += offset if dst <= src unsigned +JSLT 0xc any PC += offset if dst < src signed +JSLE 0xd any PC += offset if dst <= src signed +======== ===== ======= =============================== =================================================== + +The BPF program needs to store the return value into register R0 before doing an +``EXIT``. Example: -``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means:: +``{JSGE, X, JMP32}`` means:: if (s32)dst s>= (s32)src goto +offset where 's>=' indicates a signed '>=' comparison. -``BPF_JA | BPF_K | BPF_JMP32`` (0x06) means:: +``{JA, K, JMP32}`` means:: gotol +imm where 'imm' means the branch offset comes from insn 'imm' field. -Note that there are two flavors of ``BPF_JA`` instructions. The -``BPF_JMP`` class permits a 16-bit jump offset specified by the 'offset' -field, whereas the ``BPF_JMP32`` class permits a 32-bit jump offset +Note that there are two flavors of ``JA`` instructions. The +``JMP`` class permits a 16-bit jump offset specified by the 'offset' +field, whereas the ``JMP32`` class permits a 32-bit jump offset specified by the 'imm' field. A > 16-bit conditional jump may be converted to a < 16-bit conditional jump plus a 32-bit unconditional jump. +All ``CALL`` and ``JA`` instructions belong to the +base32 conformance group. + Helper functions ~~~~~~~~~~~~~~~~ @@ -416,78 +505,83 @@ Program-local functions ~~~~~~~~~~~~~~~~~~~~~~~ Program-local functions are functions exposed by the same BPF program as the caller, and are referenced by offset from the call instruction, similar to -``BPF_JA``. The offset is encoded in the imm field of the call instruction. -A ``BPF_EXIT`` within the program-local function will return to the caller. +``JA``. The offset is encoded in the imm field of the call instruction. +A ``EXIT`` within the program-local function will return to the caller. Load and store instructions =========================== -For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the -8-bit 'opcode' field is divided as: - -============ ====== ================= -3 bits (MSB) 2 bits 3 bits (LSB) -============ ====== ================= -mode size instruction class -============ ====== ================= - -The mode modifier is one of: - - ============= ===== ==================================== ============= - mode modifier value description reference - ============= ===== ==================================== ============= - BPF_IMM 0x00 64-bit immediate instructions `64-bit immediate instructions`_ - BPF_ABS 0x20 legacy BPF packet access (absolute) `Legacy BPF Packet access instructions`_ - BPF_IND 0x40 legacy BPF packet access (indirect) `Legacy BPF Packet access instructions`_ - BPF_MEM 0x60 regular load and store operations `Regular load and store operations`_ - BPF_MEMSX 0x80 sign-extension load operations `Sign-extension load operations`_ - BPF_ATOMIC 0xc0 atomic operations `Atomic operations`_ - ============= ===== ==================================== ============= - -The size modifier is one of: - - ============= ===== ===================== - size modifier value description - ============= ===== ===================== - BPF_W 0x00 word (4 bytes) - BPF_H 0x08 half word (2 bytes) - BPF_B 0x10 byte - BPF_DW 0x18 double word (8 bytes) - ============= ===== ===================== +For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the +8-bit 'opcode' field is divided as:: + + +-+-+-+-+-+-+-+-+ + |mode |sz |class| + +-+-+-+-+-+-+-+-+ + +**mode** + The mode modifier is one of: + + ============= ===== ==================================== ============= + mode modifier value description reference + ============= ===== ==================================== ============= + IMM 0 64-bit immediate instructions `64-bit immediate instructions`_ + ABS 1 legacy BPF packet access (absolute) `Legacy BPF Packet access instructions`_ + IND 2 legacy BPF packet access (indirect) `Legacy BPF Packet access instructions`_ + MEM 3 regular load and store operations `Regular load and store operations`_ + MEMSX 4 sign-extension load operations `Sign-extension load operations`_ + ATOMIC 6 atomic operations `Atomic operations`_ + ============= ===== ==================================== ============= + +**sz (size)** + The size modifier is one of: + + ==== ===== ===================== + size value description + ==== ===== ===================== + W 0 word (4 bytes) + H 1 half word (2 bytes) + B 2 byte + DW 3 double word (8 bytes) + ==== ===== ===================== + + Instructions using ``DW`` belong to the base64 conformance group. + +**class** + The instruction class (see `Instruction classes`_) Regular load and store operations --------------------------------- -The ``BPF_MEM`` mode modifier is used to encode regular load and store +The ``MEM`` mode modifier is used to encode regular load and store instructions that transfer data between a register and memory. -``BPF_MEM | <size> | BPF_STX`` means:: +``{MEM, <size>, STX}`` means:: *(size *) (dst + offset) = src -``BPF_MEM | <size> | BPF_ST`` means:: +``{MEM, <size>, ST}`` means:: - *(size *) (dst + offset) = imm32 + *(size *) (dst + offset) = imm -``BPF_MEM | <size> | BPF_LDX`` means:: +``{MEM, <size>, LDX}`` means:: dst = *(unsigned size *) (src + offset) -Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW`` and -'unsigned size' is one of u8, u16, u32 or u64. +Where '<size>' is one of: ``B``, ``H``, ``W``, or ``DW``, and +'unsigned size' is one of: u8, u16, u32, or u64. Sign-extension load operations ------------------------------ -The ``BPF_MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load +The ``MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load instructions that transfer data between a register and memory. -``BPF_MEMSX | <size> | BPF_LDX`` means:: +``{MEMSX, <size>, LDX}`` means:: dst = *(signed size *) (src + offset) -Where size is one of: ``BPF_B``, ``BPF_H`` or ``BPF_W``, and -'signed size' is one of s8, s16 or s32. +Where size is one of: ``B``, ``H``, or ``W``, and +'signed size' is one of: s8, s16, or s32. Atomic operations ----------------- @@ -497,10 +591,12 @@ interrupted or corrupted by other access to the same memory region by other BPF programs or means outside of this specification. All atomic operations supported by BPF are encoded as store operations -that use the ``BPF_ATOMIC`` mode modifier as follows: +that use the ``ATOMIC`` mode modifier as follows: -* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations -* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations +* ``{ATOMIC, W, STX}`` for 32-bit operations, which are + part of the "atomic32" conformance group. +* ``{ATOMIC, DW, STX}`` for 64-bit operations, which are + part of the "atomic64" conformance group. * 8-bit and 16-bit wide atomic operations are not supported. The 'imm' field is used to encode the actual atomic operation. @@ -510,18 +606,18 @@ arithmetic operations in the 'imm' field to encode the atomic operation: ======== ===== =========== imm value description ======== ===== =========== -BPF_ADD 0x00 atomic add -BPF_OR 0x40 atomic or -BPF_AND 0x50 atomic and -BPF_XOR 0xa0 atomic xor +ADD 0x00 atomic add +OR 0x40 atomic or +AND 0x50 atomic and +XOR 0xa0 atomic xor ======== ===== =========== -``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means:: +``{ATOMIC, W, STX}`` with 'imm' = ADD means:: *(u32 *)(dst + offset) += src -``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means:: +``{ATOMIC, DW, STX}`` with 'imm' = ADD means:: *(u64 *)(dst + offset) += src @@ -531,20 +627,20 @@ two complex atomic operations: =========== ================ =========================== imm value description =========== ================ =========================== -BPF_FETCH 0x01 modifier: return old value -BPF_XCHG 0xe0 | BPF_FETCH atomic exchange -BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange +FETCH 0x01 modifier: return old value +XCHG 0xe0 | FETCH atomic exchange +CMPXCHG 0xf0 | FETCH atomic compare and exchange =========== ================ =========================== -The ``BPF_FETCH`` modifier is optional for simple atomic operations, and -always set for the complex atomic operations. If the ``BPF_FETCH`` flag +The ``FETCH`` modifier is optional for simple atomic operations, and +always set for the complex atomic operations. If the ``FETCH`` flag is set, then the operation also overwrites ``src`` with the value that was in memory before it was modified. -The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value +The ``XCHG`` operation atomically exchanges ``src`` with the value addressed by ``dst + offset``. -The ``BPF_CMPXCHG`` operation atomically compares the value addressed by +The ``CMPXCHG`` operation atomically compares the value addressed by ``dst + offset`` with ``R0``. If they match, the value addressed by ``dst + offset`` is replaced with ``src``. In either case, the value that was at ``dst + offset`` before the operation is zero-extended @@ -553,25 +649,25 @@ and loaded back to ``R0``. 64-bit immediate instructions ----------------------------- -Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction -encoding defined in `Instruction encoding`_, and use the 'src' field of the +Instructions with the ``IMM`` 'mode' modifier use the wide instruction +encoding defined in `Instruction encoding`_, and use the 'src_reg' field of the basic instruction to hold an opcode subtype. -The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions -with opcode subtypes in the 'src' field, using new terms such as "map" +The following table defines a set of ``{IMM, DW, LD}`` instructions +with opcode subtypes in the 'src_reg' field, using new terms such as "map" defined further below: -========================= ====== === ========================================= =========== ============== -opcode construction opcode src pseudocode imm type dst type -========================= ====== === ========================================= =========== ============== -BPF_IMM | BPF_DW | BPF_LD 0x18 0x0 dst = imm64 integer integer -BPF_IMM | BPF_DW | BPF_LD 0x18 0x1 dst = map_by_fd(imm) map fd map -BPF_IMM | BPF_DW | BPF_LD 0x18 0x2 dst = map_val(map_by_fd(imm)) + next_imm map fd data pointer -BPF_IMM | BPF_DW | BPF_LD 0x18 0x3 dst = var_addr(imm) variable id data pointer -BPF_IMM | BPF_DW | BPF_LD 0x18 0x4 dst = code_addr(imm) integer code pointer -BPF_IMM | BPF_DW | BPF_LD 0x18 0x5 dst = map_by_idx(imm) map index map -BPF_IMM | BPF_DW | BPF_LD 0x18 0x6 dst = map_val(map_by_idx(imm)) + next_imm map index data pointer -========================= ====== === ========================================= =========== ============== +======= ========================================= =========== ============== +src_reg pseudocode imm type dst type +======= ========================================= =========== ============== +0x0 dst = (next_imm << 32) | imm integer integer +0x1 dst = map_by_fd(imm) map fd map +0x2 dst = map_val(map_by_fd(imm)) + next_imm map fd data pointer +0x3 dst = var_addr(imm) variable id data pointer +0x4 dst = code_addr(imm) integer code pointer +0x5 dst = map_by_idx(imm) map index map +0x6 dst = map_val(map_by_idx(imm)) + next_imm map index data pointer +======= ========================================= =========== ============== where @@ -609,5 +705,9 @@ Legacy BPF Packet access instructions ------------------------------------- BPF previously introduced special instructions for access to packet data that were -carried over from classic BPF. However, these instructions are -deprecated and should no longer be used. +carried over from classic BPF. These instructions used an instruction +class of ``LD``, a size modifier of ``W``, ``H``, or ``B``, and a +mode modifier of ``ABS`` or ``IND``. The 'dst_reg' and 'offset' fields were +set to zero, and 'src_reg' was set to zero for ``ABS``. However, these +instructions are deprecated and should no longer be used. All legacy packet +access instructions belong to the "packet" conformance group. diff --git a/Documentation/bpf/verifier.rst b/Documentation/bpf/verifier.rst index f0ec19db301c..356894399fbf 100644 --- a/Documentation/bpf/verifier.rst +++ b/Documentation/bpf/verifier.rst @@ -562,7 +562,7 @@ works:: * ``checkpoint[0].r1`` is marked as read; * At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed - by ``clean_live_states()``. After this processing ``checkpoint[0].r0`` has a + by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a read mark and all other registers and stack slots are marked as ``NOT_INIT`` or ``STACK_INVALID`` diff --git a/Documentation/conf.py b/Documentation/conf.py index 5830b01c5642..d148f3e8dd57 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -346,9 +346,9 @@ sys.stderr.write("Using %s theme\n" % html_theme) html_static_path = ['sphinx-static'] # If true, Docutils "smart quotes" will be used to convert quotes and dashes -# to typographically correct entities. This will convert "--" to "—", -# which is not always what we want, so disable it. -smartquotes = False +# to typographically correct entities. However, conversion of "--" to "—" +# is not always what we want, so enable only quotes. +smartquotes_action = 'q' # Custom sidebar templates, maps document names to template names. # Note that the RTD theme ignores this @@ -388,6 +388,12 @@ latex_elements = { verbatimhintsturnover=false, ''', + # + # Some of our authors are fond of deep nesting; tell latex to + # cope. + # + 'maxlistdepth': '10', + # For CJK One-half spacing, need to be in front of hyperref 'extrapackages': r'\usepackage{setspace}', diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst index 3599cf9267b4..ed73c612174d 100644 --- a/Documentation/core-api/workqueue.rst +++ b/Documentation/core-api/workqueue.rst @@ -77,10 +77,12 @@ wants a function to be executed asynchronously it has to set up a work item pointing to that function and queue that work item on a workqueue. -Special purpose threads, called worker threads, execute the functions -off of the queue, one after the other. If no work is queued, the -worker threads become idle. These worker threads are managed in so -called worker-pools. +A work item can be executed in either a thread or the BH (softirq) context. + +For threaded workqueues, special purpose threads, called [k]workers, execute +the functions off of the queue, one after the other. If no work is queued, +the worker threads become idle. These worker threads are managed in +worker-pools. The cmwq design differentiates between the user-facing workqueues that subsystems and drivers queue work items on and the backend mechanism @@ -91,6 +93,12 @@ for high priority ones, for each possible CPU and some extra worker-pools to serve work items queued on unbound workqueues - the number of these backing pools is dynamic. +BH workqueues use the same framework. However, as there can only be one +concurrent execution context, there's no need to worry about concurrency. +Each per-CPU BH worker pool contains only one pseudo worker which represents +the BH execution context. A BH workqueue can be considered a convenience +interface to softirq. + Subsystems and drivers can create and queue work items through special workqueue API functions as they see fit. They can influence some aspects of the way the work items are executed by setting flags on the @@ -106,7 +114,7 @@ unless specifically overridden, a work item of a bound workqueue will be queued on the worklist of either normal or highpri worker-pool that is associated to the CPU the issuer is running on. -For any worker pool implementation, managing the concurrency level +For any thread pool implementation, managing the concurrency level (how many execution contexts are active) is an important issue. cmwq tries to keep the concurrency at a minimal but sufficient level. Minimal to save resources and sufficient in that the system is used at @@ -164,6 +172,17 @@ resources, scheduled and executed. ``flags`` --------- +``WQ_BH`` + BH workqueues can be considered a convenience interface to softirq. BH + workqueues are always per-CPU and all BH work items are executed in the + queueing CPU's softirq context in the queueing order. + + All BH workqueues must have 0 ``max_active`` and ``WQ_HIGHPRI`` is the + only allowed additional flag. + + BH work items cannot sleep. All other features such as delayed queueing, + flushing and canceling are supported. + ``WQ_UNBOUND`` Work items queued to an unbound wq are served by the special worker-pools which host workers which are not bound to any @@ -237,15 +256,11 @@ may queue at the same time. Unless there is a specific need for throttling the number of active work items, specifying '0' is recommended. -Some users depend on the strict execution ordering of ST wq. The -combination of ``@max_active`` of 1 and ``WQ_UNBOUND`` used to -achieve this behavior. Work items on such wq were always queued to the -unbound worker-pools and only one work item could be active at any given -time thus achieving the same ordering property as ST wq. - -In the current implementation the above configuration only guarantees -ST behavior within a given NUMA node. Instead ``alloc_ordered_workqueue()`` should -be used to achieve system-wide ST behavior. +Some users depend on strict execution ordering where only one work item +is in flight at any given time and the work items are processed in +queueing order. While the combination of ``@max_active`` of 1 and +``WQ_UNBOUND`` used to achieve this behavior, this is no longer the +case. Use ``alloc_ordered_queue()`` instead. Example Execution Scenarios diff --git a/Documentation/dev-tools/checkpatch.rst b/Documentation/dev-tools/checkpatch.rst index c3389c6f3838..127968995847 100644 --- a/Documentation/dev-tools/checkpatch.rst +++ b/Documentation/dev-tools/checkpatch.rst @@ -168,7 +168,7 @@ Available options: - --fix - This is an EXPERIMENTAL feature. If correctable errors exists, a file + This is an EXPERIMENTAL feature. If correctable errors exist, a file <inputfile>.EXPERIMENTAL-checkpatch-fixes is created which has the automatically fixable errors corrected. @@ -181,7 +181,7 @@ Available options: - --ignore-perl-version - Override checking of perl version. Runtime errors maybe encountered after + Override checking of perl version. Runtime errors may be encountered after enabling this flag if the perl version does not meet the minimum specified. - --codespell diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index 858c77fe7dc4..d56f298a9d7c 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -277,6 +277,27 @@ traces point to places in code that interacted with the object but that are not directly present in the bad access stack trace. Currently, this includes call_rcu() and workqueue queuing. +CONFIG_KASAN_EXTRA_INFO +~~~~~~~~~~~~~~~~~~~~~~~ + +Enabling CONFIG_KASAN_EXTRA_INFO allows KASAN to record and report more +information. The extra information currently supported is the CPU number and +timestamp at allocation and free. More information can help find the cause of +the bug and correlate the error with other system events, at the cost of using +extra memory to record more information (more cost details in the help text of +CONFIG_KASAN_EXTRA_INFO). + +Here is the report with CONFIG_KASAN_EXTRA_INFO enabled (only the +different parts are shown):: + + ================================================================== + ... + Allocated by task 134 on cpu 5 at 229.133855s: + ... + Freed by task 136 on cpu 3 at 230.199335s: + ... + ================================================================== + Implementation details ---------------------- diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index ab376b316c36..ff10dc6eef5d 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -245,6 +245,10 @@ Contributing new tests (details) TEST_PROGS, TEST_GEN_PROGS mean it is the executable tested by default. + TEST_GEN_MODS_DIR should be used by tests that require modules to be built + before the test starts. The variable will contain the name of the directory + containing the modules. + TEST_CUSTOM_PROGS should be used by tests that require custom build rules and prevent common build rule use. @@ -255,9 +259,21 @@ Contributing new tests (details) TEST_PROGS_EXTENDED, TEST_GEN_PROGS_EXTENDED mean it is the executable which is not tested by default. + TEST_FILES, TEST_GEN_FILES mean it is the file which is used by test. + TEST_INCLUDES is similar to TEST_FILES, it lists files which should be + included when exporting or installing the tests, with the following + differences: + + * symlinks to files in other directories are preserved + * the part of paths below tools/testing/selftests/ is preserved when + copying the files to the output directory + + TEST_INCLUDES is meant to list dependencies located in other directories of + the selftests hierarchy. + * First use the headers inside the kernel source and/or git repo, and then the system headers. Headers for the kernel release as opposed to headers installed by the distro on the system should be the primary focus to be able diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst index a9efab50eed8..22955d56b379 100644 --- a/Documentation/dev-tools/kunit/usage.rst +++ b/Documentation/dev-tools/kunit/usage.rst @@ -671,8 +671,23 @@ Testing Static Functions ------------------------ If we do not want to expose functions or variables for testing, one option is to -conditionally ``#include`` the test file at the end of your .c file. For -example: +conditionally export the used symbol. For example: + +.. code-block:: c + + /* In my_file.c */ + + VISIBLE_IF_KUNIT int do_interesting_thing(); + EXPORT_SYMBOL_IF_KUNIT(do_interesting_thing); + + /* In my_file.h */ + + #if IS_ENABLED(CONFIG_KUNIT) + int do_interesting_thing(void); + #endif + +Alternatively, you could conditionally ``#include`` the test file at the end of +your .c file. For example: .. code-block:: c diff --git a/Documentation/dev-tools/ubsan.rst b/Documentation/dev-tools/ubsan.rst index 2de7c63415da..e3591f8e9d5b 100644 --- a/Documentation/dev-tools/ubsan.rst +++ b/Documentation/dev-tools/ubsan.rst @@ -49,34 +49,22 @@ Report example Usage ----- -To enable UBSAN configure kernel with:: +To enable UBSAN, configure the kernel with:: - CONFIG_UBSAN=y + CONFIG_UBSAN=y -and to check the entire kernel:: - - CONFIG_UBSAN_SANITIZE_ALL=y - -To enable instrumentation for specific files or directories, add a line -similar to the following to the respective kernel Makefile: - -- For a single file (e.g. main.o):: - - UBSAN_SANITIZE_main.o := y - -- For all files in one directory:: - - UBSAN_SANITIZE := y - -To exclude files from being instrumented even if -``CONFIG_UBSAN_SANITIZE_ALL=y``, use:: +To exclude files from being instrumented use:: UBSAN_SANITIZE_main.o := n -and:: +and to exclude all targets in one directory use:: UBSAN_SANITIZE := n +When disabled for all targets, specific files can be enabled using:: + + UBSAN_SANITIZE_main.o := y + Detection of unaligned accesses controlled through the separate option - CONFIG_UBSAN_ALIGNMENT. It's off by default on architectures that support unaligned accesses (CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y). One could diff --git a/Documentation/devicetree/bindings/Makefile b/Documentation/devicetree/bindings/Makefile index 2323fd5b7cda..129cf698fa8a 100644 --- a/Documentation/devicetree/bindings/Makefile +++ b/Documentation/devicetree/bindings/Makefile @@ -28,7 +28,10 @@ $(obj)/%.example.dts: $(src)/%.yaml check_dtschema_version FORCE find_all_cmd = find $(srctree)/$(src) \( -name '*.yaml' ! \ -name 'processed-schema*' \) -find_cmd = $(find_all_cmd) | sed 's|^$(srctree)/$(src)/||' | grep -F -e "$(subst :," -e ",$(DT_SCHEMA_FILES))" | sed 's|^|$(srctree)/$(src)/|' +find_cmd = $(find_all_cmd) | \ + sed 's|^$(srctree)/||' | \ + grep -F -e "$(subst :," -e ",$(DT_SCHEMA_FILES))" | \ + sed 's|^|$(srctree)/|' CHK_DT_DOCS := $(shell $(find_cmd)) quiet_cmd_yamllint = LINT $(src) diff --git a/Documentation/devicetree/bindings/arm/amlogic.yaml b/Documentation/devicetree/bindings/arm/amlogic.yaml index caab7ceeda45..949537cea6be 100644 --- a/Documentation/devicetree/bindings/arm/amlogic.yaml +++ b/Documentation/devicetree/bindings/arm/amlogic.yaml @@ -7,19 +7,11 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: Amlogic SoC based Platforms maintainers: + - Neil Armstrong <neil.armstrong@linaro.org> + - Martin Blumenstingl <martin.blumenstingl@googlemail.com> + - Jerome Brunet <jbrunet@baylibre.com> - Kevin Hilman <khilman@baylibre.com> -description: |+ - Work in progress statement: - - Device tree files and bindings applying to Amlogic SoCs and boards are - considered "unstable". Any Amlogic device tree binding may change at - any time. Be sure to use a device tree binary and a kernel image - generated from the same source tree. - - Please refer to Documentation/devicetree/bindings/ABI.rst for a definition of a - stable binding/ABI. - properties: $nodename: const: '/' @@ -146,6 +138,7 @@ properties: - enum: - amediatech,x96-max - amlogic,u200 + - freebox,fbx8am - radxa,zero - seirobotics,sei510 - const: amlogic,g12a diff --git a/Documentation/devicetree/bindings/arm/arm,realview.yaml b/Documentation/devicetree/bindings/arm/arm,realview.yaml index d1bdee98f9af..3c5f1688dbd7 100644 --- a/Documentation/devicetree/bindings/arm/arm,realview.yaml +++ b/Documentation/devicetree/bindings/arm/arm,realview.yaml @@ -10,9 +10,9 @@ maintainers: - Linus Walleij <linus.walleij@linaro.org> description: |+ - The ARM RealView series of reference designs were built to explore the ARM - 11, Cortex A-8 and Cortex A-9 CPUs. This included new features compared to - the earlier CPUs such as TrustZone and multicore (MPCore). + The ARM RealView series of reference designs were built to explore the Arm11, + Cortex-A8, and Cortex-A9 CPUs. This included new features compared to the + earlier CPUs such as TrustZone and multicore (MPCore). properties: $nodename: diff --git a/Documentation/devicetree/bindings/arm/atmel-at91.yaml b/Documentation/devicetree/bindings/arm/atmel-at91.yaml index 89d75fbb1de4..82f37328cc69 100644 --- a/Documentation/devicetree/bindings/arm/atmel-at91.yaml +++ b/Documentation/devicetree/bindings/arm/atmel-at91.yaml @@ -179,6 +179,12 @@ properties: - const: microchip,sama7g5 - const: microchip,sama7 + - description: Microchip SAMA7G54 Curiosity Board + items: + - const: microchip,sama7g54-curiosity + - const: microchip,sama7g5 + - const: microchip,sama7 + - description: Microchip LAN9662 Evaluation Boards. items: - enum: diff --git a/Documentation/devicetree/bindings/arm/fsl.yaml b/Documentation/devicetree/bindings/arm/fsl.yaml index 228dcc5c7d6f..0027201e19f8 100644 --- a/Documentation/devicetree/bindings/arm/fsl.yaml +++ b/Documentation/devicetree/bindings/arm/fsl.yaml @@ -384,7 +384,8 @@ properties: - toradex,apalis_imx6q-ixora # Apalis iMX6Q/D Module on Ixora Carrier Board - toradex,apalis_imx6q-ixora-v1.1 # Apalis iMX6Q/D Module on Ixora V1.1 Carrier Board - toradex,apalis_imx6q-ixora-v1.2 # Apalis iMX6Q/D Module on Ixora V1.2 Carrier Board - - toradex,apalis_imx6q-eval # Apalis iMX6Q/D Module on Apalis Evaluation Board + - toradex,apalis_imx6q-eval # Apalis iMX6Q/D Module on Apalis Evaluation Board v1.0/v1.1 + - toradex,apalis_imx6q-eval-v1.2 # Apalis iMX6Q/D Module on Apalis Evaluation Board v1.2 - const: toradex,apalis_imx6q - const: fsl,imx6q @@ -469,6 +470,7 @@ properties: - prt,prtvt7 # Protonic VT7 board - rex,imx6dl-rex-basic # Rex Basic i.MX6 Dual Lite Board - riot,imx6s-riotboard # RIoTboard i.MX6S + - sielaff,imx6dl-board # Sielaff i.MX6 Solo Board - skov,imx6dl-skov-revc-lt2 # SKOV IMX6 CPU SoloCore lt2 - skov,imx6dl-skov-revc-lt6 # SKOV IMX6 CPU SoloCore lt6 - solidrun,cubox-i/dl # SolidRun Cubox-i Solo/DualLite @@ -708,6 +710,7 @@ properties: - toradex,colibri-imx6ull # Colibri iMX6ULL Modules - toradex,colibri-imx6ull-emmc # Colibri iMX6ULL 1GB (eMMC) Module - toradex,colibri-imx6ull-wifi # Colibri iMX6ULL Wi-Fi / BT Modules + - uni-t,uti260b # UNI-T UTi260B Thermal Camera - const: fsl,imx6ull - description: i.MX6ULL Armadeus Systems OPOS6ULDev Board @@ -1026,7 +1029,7 @@ properties: items: - enum: - dimonoff,gateway-evk # i.MX8MN Dimonoff Gateway EVK Board - - rve,rve-gateway # i.MX8MN RVE Gateway Board + - rve,gateway # i.MX8MN RVE Gateway Board - variscite,var-som-mx8mn-symphony - const: variscite,var-som-mx8mn - const: fsl,imx8mn @@ -1194,7 +1197,8 @@ properties: - description: i.MX8QM Boards with Toradex Apalis iMX8 Modules items: - enum: - - toradex,apalis-imx8-eval # Apalis iMX8 Module on Apalis Evaluation Board + - toradex,apalis-imx8-eval # Apalis iMX8 Module on Apalis Evaluation V1.0/V1.1 Board + - toradex,apalis-imx8-eval-v1.2 # Apalis iMX8 Module on Apalis Evaluation V1.2 Board - toradex,apalis-imx8-ixora-v1.1 # Apalis iMX8 Module on Ixora V1.1 Carrier Board - const: toradex,apalis-imx8 - const: fsl,imx8qm @@ -1202,7 +1206,8 @@ properties: - description: i.MX8QM Boards with Toradex Apalis iMX8 V1.1 Modules items: - enum: - - toradex,apalis-imx8-v1.1-eval # Apalis iMX8 V1.1 Module on Apalis Eval. Board + - toradex,apalis-imx8-v1.1-eval # Apalis iMX8 V1.1 Module on Apalis Eval. V1.0/V1.1 Board + - toradex,apalis-imx8-v1.1-eval-v1.2 # Apalis iMX8 V1.1 Module on Apalis Eval. V1.2 Board - toradex,apalis-imx8-v1.1-ixora-v1.1 # Apalis iMX8 V1.1 Module on Ixora V1.1 C. Board - toradex,apalis-imx8-v1.1-ixora-v1.2 # Apalis iMX8 V1.1 Module on Ixora V1.2 C. Board - const: toradex,apalis-imx8-v1.1 @@ -1232,6 +1237,22 @@ properties: - const: toradex,colibri-imx8x - const: fsl,imx8qxp + - description: + TQMa8Xx is a series of SOM featuring NXP i.MX8X system-on-chip + variants. It is designed to be clicked on different carrier boards + MBa8Xx is the starterkit + oneOf: + - items: + - enum: + - tq,imx8dxp-tqma8xdp-mba8xx # TQ-Systems GmbH TQMa8XDP SOM on MBa8Xx + - const: tq,imx8dxp-tqma8xdp # TQ-Systems GmbH TQMa8XDP SOM (with i.MX8DXP) + - const: fsl,imx8dxp + - items: + - enum: + - tq,imx8qxp-tqma8xqp-mba8xx # TQ-Systems GmbH TQMa8XQP SOM on MBa8Xx + - const: tq,imx8qxp-tqma8xqp # TQ-Systems GmbH TQMa8XQP SOM (with i.MX8QXP) + - const: fsl,imx8qxp + - description: i.MX8ULP based Boards items: - enum: @@ -1275,6 +1296,18 @@ properties: - const: tq,imx93-tqma9352 # TQ-Systems GmbH i.MX93 TQMa93xxCA/LA SOM - const: fsl,imx93 + - description: PHYTEC phyCORE-i.MX93 SoM based boards + items: + - const: phytec,imx93-phyboard-segin # phyBOARD-Segin with i.MX93 + - const: phytec,imx93-phycore-som # phyCORE-i.MX93 SoM + - const: fsl,imx93 + + - description: Variscite VAR-SOM-MX93 based boards + items: + - const: variscite,var-som-mx93-symphony + - const: variscite,var-som-mx93 + - const: fsl,imx93 + - description: Freescale Vybrid Platform Device Tree Bindings diff --git a/Documentation/devicetree/bindings/arm/marvell/armada-38x.txt b/Documentation/devicetree/bindings/arm/marvell/armada-38x.txt deleted file mode 100644 index 202953f1887e..000000000000 --- a/Documentation/devicetree/bindings/arm/marvell/armada-38x.txt +++ /dev/null @@ -1,27 +0,0 @@ -Marvell Armada 38x Platforms Device Tree Bindings -------------------------------------------------- - -Boards with a SoC of the Marvell Armada 38x family shall have the -following property: - -Required root node property: - - - compatible: must contain "marvell,armada380" - -In addition, boards using the Marvell Armada 385 SoC shall have the -following property before the previous one: - -Required root node property: - -compatible: must contain "marvell,armada385" - -In addition, boards using the Marvell Armada 388 SoC shall have the -following property before the previous one: - -Required root node property: - -compatible: must contain "marvell,armada388" - -Example: - -compatible = "marvell,a385-rd", "marvell,armada385", "marvell,armada380"; diff --git a/Documentation/devicetree/bindings/arm/marvell/armada-38x.yaml b/Documentation/devicetree/bindings/arm/marvell/armada-38x.yaml new file mode 100644 index 000000000000..cdf805b5db95 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/marvell/armada-38x.yaml @@ -0,0 +1,70 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/arm/marvell/armada-38x.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Marvell Armada 38x Platforms + +maintainers: + - Gregory CLEMENT <gregory.clement@bootlin.com> + +properties: + $nodename: + const: '/' + compatible: + oneOf: + + - description: + Netgear Armada 380 GS110EM Managed Switch. + items: + - const: netgear,gs110emx + - const: marvell,armada380 + + - description: + Marvell Armada 385 Development Boards. + items: + - enum: + - marvell,a385-db-amc + - marvell,a385-db-ap + - const: marvell,armada385 + - const: marvell,armada380 + + - description: + SolidRun Armada 385 based single-board computers. + items: + - enum: + - solidrun,clearfog-gtr-l8 + - solidrun,clearfog-gtr-s4 + - const: marvell,armada385 + - const: marvell,armada380 + + - description: + Kobol Armada 388 based Helios-4 NAS. + items: + - const: kobol,helios4 + - const: marvell,armada388 + - const: marvell,armada385 + - const: marvell,armada380 + + - description: + Marvell Armada 388 Development Boards. + items: + - enum: + - marvell,a388-gp + - const: marvell,armada388 + - const: marvell,armada385 + - const: marvell,armada380 + + - description: + SolidRun Armada 388 clearfog family single-board computers. + items: + - enum: + - solidrun,clearfog-base-a1 + - solidrun,clearfog-pro-a1 + - const: solidrun,clearfog-a1 + - const: marvell,armada388 + - const: marvell,armada385 + - const: marvell,armada380 + +additionalProperties: true diff --git a/Documentation/devicetree/bindings/arm/mediatek.yaml b/Documentation/devicetree/bindings/arm/mediatek.yaml index 6f2f64ae76fc..09f9ffd3ff7b 100644 --- a/Documentation/devicetree/bindings/arm/mediatek.yaml +++ b/Documentation/devicetree/bindings/arm/mediatek.yaml @@ -17,6 +17,7 @@ properties: const: '/' compatible: oneOf: + # Sort by SoC (last) compatible, then board compatible - items: - enum: - mediatek,mt2701-evb @@ -84,6 +85,11 @@ properties: - const: mediatek,mt7629 - items: - enum: + - xiaomi,ax3000t + - const: mediatek,mt7981b + - items: + - enum: + - acelink,ew-7886cax - bananapi,bpi-r3 - mediatek,mt7986a-rfb - const: mediatek,mt7986a @@ -93,6 +99,10 @@ properties: - const: mediatek,mt7986b - items: - enum: + - bananapi,bpi-r4 + - const: mediatek,mt7988a + - items: + - enum: - mediatek,mt8127-moose - const: mediatek,mt8127 - items: @@ -129,75 +139,10 @@ properties: - enum: - mediatek,mt8173-evb - const: mediatek,mt8173 - - items: - - enum: - - mediatek,mt8183-evb - - const: mediatek,mt8183 - - description: Google Hayato rev5 - items: - - const: google,hayato-rev5-sku2 - - const: google,hayato-sku2 - - const: google,hayato - - const: mediatek,mt8192 - - description: Google Hayato - items: - - const: google,hayato-rev1 - - const: google,hayato - - const: mediatek,mt8192 - - description: Google Spherion rev4 (Acer Chromebook 514) - items: - - const: google,spherion-rev4 - - const: google,spherion - - const: mediatek,mt8192 - - description: Google Spherion (Acer Chromebook 514) - items: - - const: google,spherion-rev3 - - const: google,spherion-rev2 - - const: google,spherion-rev1 - - const: google,spherion-rev0 - - const: google,spherion - - const: mediatek,mt8192 - - description: Acer Tomato (Acer Chromebook Spin 513 CP513-2H) - items: - - enum: - - google,tomato-rev2 - - google,tomato-rev1 - - const: google,tomato - - const: mediatek,mt8195 - - description: Acer Tomato rev3 - 4 (Acer Chromebook Spin 513 CP513-2H) - items: - - const: google,tomato-rev4 - - const: google,tomato-rev3 - - const: google,tomato - - const: mediatek,mt8195 - - items: - - enum: - - mediatek,mt8186-evb - - const: mediatek,mt8186 - - items: - - enum: - - mediatek,mt8188-evb - - const: mediatek,mt8188 - - items: - - enum: - - mediatek,mt8192-evb - - const: mediatek,mt8192 - - items: - - enum: - - mediatek,mt8195-demo - - mediatek,mt8195-evb - - const: mediatek,mt8195 - description: Google Burnet (HP Chromebook x360 11MK G3 EE) items: - const: google,burnet - const: mediatek,mt8183 - - description: Google Krane (Lenovo IdeaPad Duet, 10e,...) - items: - - enum: - - google,krane-sku0 - - google,krane-sku176 - - const: google,krane - - const: mediatek,mt8183 - description: Google Cozmo (Acer Chromebook 314) items: - const: google,cozmo @@ -255,6 +200,13 @@ properties: - google,kodama-sku32 - const: google,kodama - const: mediatek,mt8183 + - description: Google Krane (Lenovo IdeaPad Duet, 10e,...) + items: + - enum: + - google,krane-sku0 + - google,krane-sku176 + - const: google,krane + - const: mediatek,mt8183 - description: Google Makomo (Lenovo 100e Chromebook 2nd Gen MTK 2) items: - enum: @@ -278,8 +230,123 @@ properties: - const: mediatek,mt8183 - items: - enum: + - mediatek,mt8183-evb + - const: mediatek,mt8183 + - items: + - enum: - mediatek,mt8183-pumpkin - const: mediatek,mt8183 + - description: Google Magneton (Lenovo IdeaPad Slim 3 Chromebook (14M868)) + items: + - const: google,steelix-sku393219 + - const: google,steelix-sku393216 + - const: google,steelix + - const: mediatek,mt8186 + - description: Google Magneton (Lenovo IdeaPad Slim 3 Chromebook (14M868)) + items: + - const: google,steelix-sku393220 + - const: google,steelix-sku393217 + - const: google,steelix + - const: mediatek,mt8186 + - description: Google Magneton (Lenovo IdeaPad Slim 3 Chromebook (14M868)) + items: + - const: google,steelix-sku393221 + - const: google,steelix-sku393218 + - const: google,steelix + - const: mediatek,mt8186 + - description: Google Rusty (Lenovo 100e Chromebook Gen 4) + items: + - const: google,steelix-sku196609 + - const: google,steelix-sku196608 + - const: google,steelix + - const: mediatek,mt8186 + - description: Google Steelix (Lenovo 300e Yoga Chromebook Gen 4) + items: + - enum: + - google,steelix-sku131072 + - google,steelix-sku131073 + - const: google,steelix + - const: mediatek,mt8186 + - description: Google Tentacruel (ASUS Chromebook CM14 Flip CM1402F) + items: + - const: google,tentacruel-sku262147 + - const: google,tentacruel-sku262146 + - const: google,tentacruel-sku262145 + - const: google,tentacruel-sku262144 + - const: google,tentacruel + - const: mediatek,mt8186 + - description: Google Tentacruel (ASUS Chromebook CM14 Flip CM1402F) + items: + - const: google,tentacruel-sku262151 + - const: google,tentacruel-sku262150 + - const: google,tentacruel-sku262149 + - const: google,tentacruel-sku262148 + - const: google,tentacruel + - const: mediatek,mt8186 + - description: Google Tentacool (ASUS Chromebook CM14 CM1402C) + items: + - const: google,tentacruel-sku327681 + - const: google,tentacruel + - const: mediatek,mt8186 + - description: Google Tentacool (ASUS Chromebook CM14 CM1402C) + items: + - const: google,tentacruel-sku327683 + - const: google,tentacruel + - const: mediatek,mt8186 + - items: + - enum: + - mediatek,mt8186-evb + - const: mediatek,mt8186 + - items: + - enum: + - mediatek,mt8188-evb + - const: mediatek,mt8188 + - description: Google Hayato + items: + - const: google,hayato-rev1 + - const: google,hayato + - const: mediatek,mt8192 + - description: Google Hayato rev5 + items: + - const: google,hayato-rev5-sku2 + - const: google,hayato-sku2 + - const: google,hayato + - const: mediatek,mt8192 + - description: Google Spherion (Acer Chromebook 514) + items: + - const: google,spherion-rev3 + - const: google,spherion-rev2 + - const: google,spherion-rev1 + - const: google,spherion-rev0 + - const: google,spherion + - const: mediatek,mt8192 + - description: Google Spherion rev4 (Acer Chromebook 514) + items: + - const: google,spherion-rev4 + - const: google,spherion + - const: mediatek,mt8192 + - items: + - enum: + - mediatek,mt8192-evb + - const: mediatek,mt8192 + - description: Acer Tomato (Acer Chromebook Spin 513 CP513-2H) + items: + - enum: + - google,tomato-rev2 + - google,tomato-rev1 + - const: google,tomato + - const: mediatek,mt8195 + - description: Acer Tomato rev3 - 4 (Acer Chromebook Spin 513 CP513-2H) + items: + - const: google,tomato-rev4 + - const: google,tomato-rev3 + - const: google,tomato + - const: mediatek,mt8195 + - items: + - enum: + - mediatek,mt8195-demo + - mediatek,mt8195-evb + - const: mediatek,mt8195 - items: - enum: - mediatek,mt8365-evk @@ -287,6 +354,7 @@ properties: - items: - enum: - mediatek,mt8395-evk + - radxa,nio-12l - const: mediatek,mt8395 - const: mediatek,mt8195 - items: diff --git a/Documentation/devicetree/bindings/arm/msm/qcom,saw2.txt b/Documentation/devicetree/bindings/arm/msm/qcom,saw2.txt deleted file mode 100644 index c0e3c3a42bea..000000000000 --- a/Documentation/devicetree/bindings/arm/msm/qcom,saw2.txt +++ /dev/null @@ -1,58 +0,0 @@ -SPM AVS Wrapper 2 (SAW2) - -The SAW2 is a wrapper around the Subsystem Power Manager (SPM) and the -Adaptive Voltage Scaling (AVS) hardware. The SPM is a programmable -power-controller that transitions a piece of hardware (like a processor or -subsystem) into and out of low power modes via a direct connection to -the PMIC. It can also be wired up to interact with other processors in the -system, notifying them when a low power state is entered or exited. - -Multiple revisions of the SAW hardware are supported using these Device Nodes. -SAW2 revisions differ in the register offset and configuration data. Also, the -same revision of the SAW in different SoCs may have different configuration -data due the differences in hardware capabilities. Hence the SoC name, the -version of the SAW hardware in that SoC and the distinction between cpu (big -or Little) or cache, may be needed to uniquely identify the SAW register -configuration and initialization data. The compatible string is used to -indicate this parameter. - -PROPERTIES - -- compatible: - Usage: required - Value type: <string> - Definition: Must have - "qcom,saw2" - A more specific value could be one of: - "qcom,apq8064-saw2-v1.1-cpu" - "qcom,msm8226-saw2-v2.1-cpu" - "qcom,msm8974-saw2-v2.1-cpu" - "qcom,apq8084-saw2-v2.1-cpu" - -- reg: - Usage: required - Value type: <prop-encoded-array> - Definition: the first element specifies the base address and size of - the register region. An optional second element specifies - the base address and size of the alias register region. - -- regulator: - Usage: optional - Value type: boolean - Definition: Indicates that this SPM device acts as a regulator device - device for the core (CPU or Cache) the SPM is attached - to. - -Example 1: - - power-controller@2099000 { - compatible = "qcom,saw2"; - reg = <0x02099000 0x1000>, <0x02009000 0x1000>; - regulator; - }; - -Example 2: - saw0: power-controller@f9089000 { - compatible = "qcom,apq8084-saw2-v2.1-cpu", "qcom,saw2"; - reg = <0xf9089000 0x1000>, <0xf9009000 0x1000>; - }; diff --git a/Documentation/devicetree/bindings/arm/qcom.yaml b/Documentation/devicetree/bindings/arm/qcom.yaml index 1a5fb889a444..66beaac60e1d 100644 --- a/Documentation/devicetree/bindings/arm/qcom.yaml +++ b/Documentation/devicetree/bindings/arm/qcom.yaml @@ -10,17 +10,10 @@ maintainers: - Bjorn Andersson <bjorn.andersson@linaro.org> description: | - Some qcom based bootloaders identify the dtb blob based on a set of - device properties like SoC and platform and revisions of those components. - To support this scheme, we encode this information into the board compatible - string. - - Each board must specify a top-level board compatible string with the following - format: - - compatible = "qcom,<SoC>[-<soc_version>][-<foundry_id>]-<board>[/<subtype>][-<board_version>]" - - The 'SoC' and 'board' elements are required. All other elements are optional. + For devices using the Qualcomm SoC the "compatible" properties consists of + one or several "manufacturer,model" strings, describing the device itself, + followed by one or several "qcom,<SoC>" strings, describing the SoC used in + the device. The 'SoC' element must be one of the following strings: @@ -90,43 +83,9 @@ description: | sm8650 x1e80100 - The 'board' element must be one of the following strings: - - adp - cdp - dragonboard - idp - liquid - mtp - qcp - qrd - rb2 - ride - sbc - x100 - - The 'soc_version' and 'board_version' elements take the form of v<Major>.<Minor> - where the minor number may be omitted when it's zero, i.e. v1.0 is the same - as v1. If all versions of the 'board_version' elements match, then a - wildcard '*' should be used, e.g. 'v*'. - - The 'foundry_id' and 'subtype' elements are one or more digits from 0 to 9. - - Examples: - - "qcom,msm8916-v1-cdp-pm8916-v2.1" - - A CDP board with an msm8916 SoC, version 1 paired with a pm8916 PMIC of version - 2.1. - - "qcom,apq8074-v2.0-2-dragonboard/1-v0.1" - - A dragonboard board v0.1 of subtype 1 with an apq8074 SoC version 2, made in - foundry 2. - There are many devices in the list below that run the standard ChromeOS bootloader setup and use the open source depthcharge bootloader to boot the - OS. These devices do not use the scheme described above. For details, see: + OS. These devices use the bootflow explained at https://docs.kernel.org/arch/arm/google/chromebook-boot-flow.html properties: @@ -187,6 +146,7 @@ properties: - microsoft,superman-lte - microsoft,tesla - motorola,peregrine + - samsung,matisselte - const: qcom,msm8926 - const: qcom,msm8226 @@ -244,11 +204,15 @@ properties: - samsung,a5u-eur - samsung,e5 - samsung,e7 + - samsung,fortuna3g + - samsung,gprimeltecan - samsung,grandmax + - samsung,grandprimelte - samsung,gt510 - samsung,gt58 - samsung,j5 - samsung,j5x + - samsung,rossa - samsung,serranove - thwc,uf896 - thwc,ufi001c @@ -988,6 +952,7 @@ properties: - items: - enum: + - xiaomi,curtana - xiaomi,joyeuse - const: qcom,sm7125 @@ -1035,6 +1000,7 @@ properties: - items: - enum: + - qcom,sm8550-hdk - qcom,sm8550-mtp - qcom,sm8550-qrd - const: qcom,sm8550 diff --git a/Documentation/devicetree/bindings/arm/rockchip.yaml b/Documentation/devicetree/bindings/arm/rockchip.yaml index 5cf5cbef2cf5..fcf7316ecd74 100644 --- a/Documentation/devicetree/bindings/arm/rockchip.yaml +++ b/Documentation/devicetree/bindings/arm/rockchip.yaml @@ -37,29 +37,16 @@ properties: - anbernic,rg351v - const: rockchip,rk3326 - - description: Anbernic RG353P + - description: Anbernic RK3566 Handheld Gaming Console items: - - const: anbernic,rg353p - - const: rockchip,rk3566 - - - description: Anbernic RG353PS - items: - - const: anbernic,rg353ps - - const: rockchip,rk3566 - - - description: Anbernic RG353V - items: - - const: anbernic,rg353v - - const: rockchip,rk3566 - - - description: Anbernic RG353VS - items: - - const: anbernic,rg353vs - - const: rockchip,rk3566 - - - description: Anbernic RG503 - items: - - const: anbernic,rg503 + - enum: + - anbernic,rg353p + - anbernic,rg353ps + - anbernic,rg353v + - anbernic,rg353vs + - anbernic,rg503 + - anbernic,rg-arc-d + - anbernic,rg-arc-s - const: rockchip,rk3566 - description: Asus Tinker board @@ -237,6 +224,13 @@ properties: - friendlyarm,nanopi-r5s - const: rockchip,rk3568 + - description: FriendlyElec NanoPi R6 series boards + items: + - enum: + - friendlyarm,nanopi-r6c + - friendlyarm,nanopi-r6s + - const: rockchip,rk3588s + - description: FriendlyElec NanoPC T6 items: - const: friendlyarm,nanopc-t6 @@ -626,9 +620,9 @@ properties: - const: openailab,eaidk-610 - const: rockchip,rk3399 - - description: Orange Pi RK3399 board + - description: Xunlong Orange Pi RK3399 board items: - - const: rockchip,rk3399-orangepi + - const: xunlong,rk3399-orangepi - const: rockchip,rk3399 - description: Phytec phyCORE-RK3288 Rapid Development Kit @@ -655,6 +649,14 @@ properties: - const: pine64,pinephone-pro - const: rockchip,rk3399 + - description: Pine64 PineTab2 + items: + - enum: + - pine64,pinetab2-v0.1 + - pine64,pinetab2-v2.0 + - const: pine64,pinetab2 + - const: rockchip,rk3566 + - description: Pine64 Rock64 items: - const: pine64,rock64 @@ -692,11 +694,17 @@ properties: - description: Powkiddy RK3566 Handheld Gaming Console items: - enum: + - powkiddy,rgb10max3 - powkiddy,rgb30 - powkiddy,rk2023 - powkiddy,x55 - const: rockchip,rk3566 + - description: QNAP TS-433-4G 4-Bay NAS + items: + - const: qnap,ts433 + - const: rockchip,rk3568 + - description: Radxa Compute Module 3(CM3) items: - enum: @@ -878,6 +886,11 @@ properties: - const: rockchip,rv1108-evb - const: rockchip,rv1108 + - description: Rockchip Toybrick TB-RK3588X board + items: + - const: rockchip,rk3588-toybrick-x0 + - const: rockchip,rk3588 + - description: Theobroma Systems PX30-uQ7 with Haikou baseboard items: - const: tsd,px30-ringneck-haikou @@ -898,6 +911,12 @@ properties: - const: tsd,rk3588-jaguar - const: rockchip,rk3588 + - description: Theobroma Systems RK3588-Q7 with Haikou baseboard + items: + - const: tsd,rk3588-tiger-haikou + - const: tsd,rk3588-tiger + - const: rockchip,rk3588 + - description: Tronsmart Orion R68 Meta items: - const: tronsmart,orion-r68-meta @@ -940,9 +959,9 @@ properties: - const: rockchip,rk3568-evb1-v10 - const: rockchip,rk3568 - - description: Rockchip RK3568 Banana Pi R2 Pro + - description: Sinovoip RK3568 Banana Pi R2 Pro items: - - const: rockchip,rk3568-bpi-r2pro + - const: sinovoip,rk3568-bpi-r2pro - const: rockchip,rk3568 - description: Sonoff iHost Smart Home Hub diff --git a/Documentation/devicetree/bindings/arm/sunxi.yaml b/Documentation/devicetree/bindings/arm/sunxi.yaml index a9d8e85565b8..09d835db6db5 100644 --- a/Documentation/devicetree/bindings/arm/sunxi.yaml +++ b/Documentation/devicetree/bindings/arm/sunxi.yaml @@ -815,6 +815,12 @@ properties: - const: allwinner,r7-tv-dongle - const: allwinner,sun5i-a10s + - description: Remix Mini PC + items: + - const: jide,remix-mini-pc + - const: allwinner,sun50i-h64 + - const: allwinner,sun50i-a64 + - description: RerVision H3-DVK items: - const: rervision,h3-dvk @@ -835,6 +841,12 @@ properties: - const: sinlinx,sina33 - const: allwinner,sun8i-a33 + - description: Sipeed Longan Pi 3H board for the Sipeed Longan Module 3H + items: + - const: sipeed,longan-pi-3h + - const: sipeed,longan-module-3h + - const: allwinner,sun50i-h618 + - description: SourceParts PopStick v1.1 items: - const: sourceparts,popstick-v1.1 diff --git a/Documentation/devicetree/bindings/arm/tegra.yaml b/Documentation/devicetree/bindings/arm/tegra.yaml index fcf956406168..8fb4923517d0 100644 --- a/Documentation/devicetree/bindings/arm/tegra.yaml +++ b/Documentation/devicetree/bindings/arm/tegra.yaml @@ -64,6 +64,14 @@ properties: - items: - const: asus,tf700t - const: nvidia,tegra30 + - description: LG Optimus 4X P880 + items: + - const: lg,p880 + - const: nvidia,tegra30 + - description: LG Optimus Vu P895 + items: + - const: lg,p895 + - const: nvidia,tegra30 - items: - const: toradex,apalis_t30-eval - const: toradex,apalis_t30 diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra186-pmc.yaml b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra186-pmc.yaml index 0faa403f68c8..ea4fbf655220 100644 --- a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra186-pmc.yaml +++ b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra186-pmc.yaml @@ -27,7 +27,7 @@ properties: - const: pmc - const: wake - const: aotag - - const: scratch + - enum: [ scratch, misc ] - const: misc interrupt-controller: true @@ -41,25 +41,43 @@ properties: description: If present, inverts the PMU interrupt signal. $ref: /schemas/types.yaml#/definitions/flag -if: - properties: - compatible: - contains: - const: nvidia,tegra186-pmc -then: - properties: - reg: - maxItems: 4 - - reg-names: - maxItems: 4 -else: - properties: - reg: - minItems: 5 - - reg-names: - minItems: 5 +allOf: + - if: + properties: + compatible: + contains: + const: nvidia,tegra186-pmc + then: + properties: + reg: + maxItems: 4 + reg-names: + maxItems: 4 + contains: + const: scratch + + - if: + properties: + compatible: + contains: + const: nvidia,tegra194-pmc + then: + properties: + reg: + minItems: 5 + reg-names: + minItems: 5 + + - if: + properties: + compatible: + contains: + const: nvidia,tegra234-pmc + then: + properties: + reg-names: + contains: + const: misc patternProperties: "^[a-z0-9]+-[a-z0-9]+$": diff --git a/Documentation/devicetree/bindings/arm/ti/k3.yaml b/Documentation/devicetree/bindings/arm/ti/k3.yaml index c6506bccfe88..52b51fd7044e 100644 --- a/Documentation/devicetree/bindings/arm/ti/k3.yaml +++ b/Documentation/devicetree/bindings/arm/ti/k3.yaml @@ -87,12 +87,20 @@ properties: - const: tq,am642-tqma6442l - const: ti,am642 + - description: K3 AM642 SoC SolidRun SoM based boards + items: + - enum: + - solidrun,am642-hummingboard-t + - const: solidrun,am642-sr-som + - const: ti,am642 + - description: K3 AM654 SoC items: - enum: - siemens,iot2050-advanced - siemens,iot2050-advanced-m2 - siemens,iot2050-advanced-pg2 + - siemens,iot2050-advanced-sm - siemens,iot2050-basic - siemens,iot2050-basic-pg2 - ti,am654-evm @@ -123,6 +131,12 @@ properties: - ti,j721s2-evm - const: ti,j721s2 + - description: K3 J722S SoC and Boards + items: + - enum: + - ti,j722s-evm + - const: ti,j722s + - description: K3 J784s4 SoC items: - enum: diff --git a/Documentation/devicetree/bindings/ata/ahci-mtk.txt b/Documentation/devicetree/bindings/ata/ahci-mtk.txt deleted file mode 100644 index d2aa696b161b..000000000000 --- a/Documentation/devicetree/bindings/ata/ahci-mtk.txt +++ /dev/null @@ -1,51 +0,0 @@ -MediaTek Serial ATA controller - -Required properties: - - compatible : Must be "mediatek,<chip>-ahci", "mediatek,mtk-ahci". - When using "mediatek,mtk-ahci" compatible strings, you - need SoC specific ones in addition, one of: - - "mediatek,mt7622-ahci" - - reg : Physical base addresses and length of register sets. - - interrupts : Interrupt associated with the SATA device. - - interrupt-names : Associated name must be: "hostc". - - clocks : A list of phandle and clock specifier pairs, one for each - entry in clock-names. - - clock-names : Associated names must be: "ahb", "axi", "asic", "rbc", "pm". - - phys : A phandle and PHY specifier pair for the PHY port. - - phy-names : Associated name must be: "sata-phy". - - ports-implemented : See ./ahci-platform.txt for details. - -Optional properties: - - power-domains : A phandle and power domain specifier pair to the power - domain which is responsible for collapsing and restoring - power to the peripheral. - - resets : Must contain an entry for each entry in reset-names. - See ../reset/reset.txt for details. - - reset-names : Associated names must be: "axi", "sw", "reg". - - mediatek,phy-mode : A phandle to the system controller, used to enable - SATA function. - -Example: - - sata: sata@1a200000 { - compatible = "mediatek,mt7622-ahci", - "mediatek,mtk-ahci"; - reg = <0 0x1a200000 0 0x1100>; - interrupts = <GIC_SPI 233 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "hostc"; - clocks = <&pciesys CLK_SATA_AHB_EN>, - <&pciesys CLK_SATA_AXI_EN>, - <&pciesys CLK_SATA_ASIC_EN>, - <&pciesys CLK_SATA_RBC_EN>, - <&pciesys CLK_SATA_PM_EN>; - clock-names = "ahb", "axi", "asic", "rbc", "pm"; - phys = <&u3port1 PHY_TYPE_SATA>; - phy-names = "sata-phy"; - ports-implemented = <0x1>; - power-domains = <&scpsys MT7622_POWER_DOMAIN_HIF0>; - resets = <&pciesys MT7622_SATA_AXI_BUS_RST>, - <&pciesys MT7622_SATA_PHY_SW_RST>, - <&pciesys MT7622_SATA_PHY_REG_RST>; - reset-names = "axi", "sw", "reg"; - mediatek,phy-mode = <&pciesys>; - }; diff --git a/Documentation/devicetree/bindings/ata/atmel-at91_cf.txt b/Documentation/devicetree/bindings/ata/atmel-at91_cf.txt deleted file mode 100644 index c1d22b3ae134..000000000000 --- a/Documentation/devicetree/bindings/ata/atmel-at91_cf.txt +++ /dev/null @@ -1,19 +0,0 @@ -Atmel AT91RM9200 CompactFlash - -Required properties: -- compatible : "atmel,at91rm9200-cf". -- reg : should specify localbus address and size used. -- gpios : specifies the gpio pins to control the CF device. Detect - and reset gpio's are mandatory while irq and vcc gpio's are - optional and may be set to 0 if not present. - -Example: -compact-flash@50000000 { - compatible = "atmel,at91rm9200-cf"; - reg = <0x50000000 0x30000000>; - gpios = <&pioC 13 0 /* irq */ - &pioC 15 0 /* detect */ - 0 /* vcc */ - &pioC 5 0 /* reset */ - >; -}; diff --git a/Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml b/Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml index b29ce598f9aa..9952e0ef7767 100644 --- a/Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml +++ b/Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml @@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: Ceva AHCI SATA Controller maintainers: - - Piyush Mehta <piyush.mehta@amd.com> + - Mubin Sayyed <mubin.sayyed@amd.com> + - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> description: | The Ceva SATA controller mostly conforms to the AHCI interface with some diff --git a/Documentation/devicetree/bindings/ata/mediatek,mtk-ahci.yaml b/Documentation/devicetree/bindings/ata/mediatek,mtk-ahci.yaml new file mode 100644 index 000000000000..a34bd2e9c352 --- /dev/null +++ b/Documentation/devicetree/bindings/ata/mediatek,mtk-ahci.yaml @@ -0,0 +1,98 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/ata/mediatek,mtk-ahci.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MediaTek Serial ATA controller + +maintainers: + - Ryder Lee <ryder.lee@mediatek.com> + +allOf: + - $ref: ahci-common.yaml# + +properties: + compatible: + items: + - enum: + - mediatek,mt7622-ahci + - const: mediatek,mtk-ahci + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + interrupt-names: + const: hostc + + clocks: + maxItems: 5 + + clock-names: + items: + - const: ahb + - const: axi + - const: asic + - const: rbc + - const: pm + + power-domains: + maxItems: 1 + + resets: + maxItems: 3 + + reset-names: + items: + - const: axi + - const: sw + - const: reg + + mediatek,phy-mode: + description: System controller phandle, used to enable SATA function + $ref: /schemas/types.yaml#/definitions/phandle + +required: + - reg + - interrupts + - interrupt-names + - clocks + - clock-names + - phys + - phy-names + - ports-implemented + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/clock/mt7622-clk.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/phy/phy.h> + #include <dt-bindings/power/mt7622-power.h> + #include <dt-bindings/reset/mt7622-reset.h> + + sata@1a200000 { + compatible = "mediatek,mt7622-ahci", "mediatek,mtk-ahci"; + reg = <0x1a200000 0x1100>; + interrupts = <GIC_SPI 233 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "hostc"; + clocks = <&pciesys CLK_SATA_AHB_EN>, + <&pciesys CLK_SATA_AXI_EN>, + <&pciesys CLK_SATA_ASIC_EN>, + <&pciesys CLK_SATA_RBC_EN>, + <&pciesys CLK_SATA_PM_EN>; + clock-names = "ahb", "axi", "asic", "rbc", "pm"; + phys = <&u3port1 PHY_TYPE_SATA>; + phy-names = "sata-phy"; + ports-implemented = <0x1>; + power-domains = <&scpsys MT7622_POWER_DOMAIN_HIF0>; + resets = <&pciesys MT7622_SATA_AXI_BUS_RST>, + <&pciesys MT7622_SATA_PHY_SW_RST>, + <&pciesys MT7622_SATA_PHY_REG_RST>; + reset-names = "axi", "sw", "reg"; + mediatek,phy-mode = <&pciesys>; + }; diff --git a/Documentation/devicetree/bindings/auxdisplay/arm,versatile-lcd.yaml b/Documentation/devicetree/bindings/auxdisplay/arm,versatile-lcd.yaml index 5d02bd032a85..439f7b811a94 100644 --- a/Documentation/devicetree/bindings/auxdisplay/arm,versatile-lcd.yaml +++ b/Documentation/devicetree/bindings/auxdisplay/arm,versatile-lcd.yaml @@ -39,6 +39,6 @@ additionalProperties: false examples: - | lcd@10008000 { - compatible = "arm,versatile-lcd"; - reg = <0x10008000 0x1000>; + compatible = "arm,versatile-lcd"; + reg = <0x10008000 0x1000>; }; diff --git a/Documentation/devicetree/bindings/auxdisplay/gpio-7-segment.yaml b/Documentation/devicetree/bindings/auxdisplay/gpio-7-segment.yaml new file mode 100644 index 000000000000..328954893c64 --- /dev/null +++ b/Documentation/devicetree/bindings/auxdisplay/gpio-7-segment.yaml @@ -0,0 +1,55 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/auxdisplay/gpio-7-segment.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: GPIO based LED segment display + +maintainers: + - Chris Packham <chris.packham@alliedtelesis.co.nz> + +properties: + compatible: + const: gpio-7-segment + + segment-gpios: + description: | + An array of GPIOs one per segment. The first GPIO corresponds to the A + segment, the seventh GPIO corresponds to the G segment. Some LED blocks + also have a decimal point which can be specified as an optional eighth + segment. + + -a- + | | + f b + | | + -g- + | | + e c + | | + -d- dp + + minItems: 7 + maxItems: 8 + +required: + - segment-gpios + +additionalProperties: false + +examples: + - | + + #include <dt-bindings/gpio/gpio.h> + + led-7seg { + compatible = "gpio-7-segment"; + segment-gpios = <&gpio 0 GPIO_ACTIVE_LOW>, + <&gpio 1 GPIO_ACTIVE_LOW>, + <&gpio 2 GPIO_ACTIVE_LOW>, + <&gpio 3 GPIO_ACTIVE_LOW>, + <&gpio 4 GPIO_ACTIVE_LOW>, + <&gpio 5 GPIO_ACTIVE_LOW>, + <&gpio 6 GPIO_ACTIVE_LOW>; + }; diff --git a/Documentation/devicetree/bindings/auxdisplay/hit,hd44780.yaml b/Documentation/devicetree/bindings/auxdisplay/hit,hd44780.yaml index 406a922a714e..3ca0e9863d83 100644 --- a/Documentation/devicetree/bindings/auxdisplay/hit,hd44780.yaml +++ b/Documentation/devicetree/bindings/auxdisplay/hit,hd44780.yaml @@ -84,42 +84,44 @@ additionalProperties: false examples: - | #include <dt-bindings/gpio/gpio.h> - auxdisplay { - compatible = "hit,hd44780"; - - data-gpios = <&hc595 0 GPIO_ACTIVE_HIGH>, - <&hc595 1 GPIO_ACTIVE_HIGH>, - <&hc595 2 GPIO_ACTIVE_HIGH>, - <&hc595 3 GPIO_ACTIVE_HIGH>; - enable-gpios = <&hc595 4 GPIO_ACTIVE_HIGH>; - rs-gpios = <&hc595 5 GPIO_ACTIVE_HIGH>; - - display-height-chars = <2>; - display-width-chars = <16>; + display-controller { + compatible = "hit,hd44780"; + + data-gpios = <&hc595 0 GPIO_ACTIVE_HIGH>, + <&hc595 1 GPIO_ACTIVE_HIGH>, + <&hc595 2 GPIO_ACTIVE_HIGH>, + <&hc595 3 GPIO_ACTIVE_HIGH>; + enable-gpios = <&hc595 4 GPIO_ACTIVE_HIGH>; + rs-gpios = <&hc595 5 GPIO_ACTIVE_HIGH>; + + display-height-chars = <2>; + display-width-chars = <16>; }; + - | #include <dt-bindings/gpio/gpio.h> i2c { - #address-cells = <1>; - #size-cells = <0>; - - pcf8574: pcf8574@27 { - compatible = "nxp,pcf8574"; - reg = <0x27>; - gpio-controller; - #gpio-cells = <2>; - }; + #address-cells = <1>; + #size-cells = <0>; + + pcf8574: gpio-expander@27 { + compatible = "nxp,pcf8574"; + reg = <0x27>; + gpio-controller; + #gpio-cells = <2>; + }; }; - hd44780 { - compatible = "hit,hd44780"; - display-height-chars = <2>; - display-width-chars = <16>; - data-gpios = <&pcf8574 4 0>, - <&pcf8574 5 0>, - <&pcf8574 6 0>, - <&pcf8574 7 0>; - enable-gpios = <&pcf8574 2 0>; - rs-gpios = <&pcf8574 0 0>; - rw-gpios = <&pcf8574 1 0>; - backlight-gpios = <&pcf8574 3 0>; + + display-controller { + compatible = "hit,hd44780"; + display-height-chars = <2>; + display-width-chars = <16>; + data-gpios = <&pcf8574 4 GPIO_ACTIVE_HIGH>, + <&pcf8574 5 GPIO_ACTIVE_HIGH>, + <&pcf8574 6 GPIO_ACTIVE_HIGH>, + <&pcf8574 7 GPIO_ACTIVE_HIGH>; + enable-gpios = <&pcf8574 2 GPIO_ACTIVE_HIGH>; + rs-gpios = <&pcf8574 0 GPIO_ACTIVE_HIGH>; + rw-gpios = <&pcf8574 1 GPIO_ACTIVE_HIGH>; + backlight-gpios = <&pcf8574 3 GPIO_ACTIVE_HIGH>; }; diff --git a/Documentation/devicetree/bindings/auxdisplay/holtek,ht16k33.yaml b/Documentation/devicetree/bindings/auxdisplay/holtek,ht16k33.yaml index be95f6b97b41..b90eec2077b4 100644 --- a/Documentation/devicetree/bindings/auxdisplay/holtek,ht16k33.yaml +++ b/Documentation/devicetree/bindings/auxdisplay/holtek,ht16k33.yaml @@ -74,31 +74,31 @@ examples: #include <dt-bindings/input/input.h> #include <dt-bindings/leds/common.h> i2c { - #address-cells = <1>; - #size-cells = <0>; - - ht16k33: ht16k33@70 { - compatible = "holtek,ht16k33"; - reg = <0x70>; - refresh-rate-hz = <20>; - interrupt-parent = <&gpio4>; - interrupts = <5 (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_EDGE_RISING)>; - debounce-delay-ms = <50>; - linux,keymap = <MATRIX_KEY(2, 0, KEY_F6)>, - <MATRIX_KEY(3, 0, KEY_F8)>, - <MATRIX_KEY(4, 0, KEY_F10)>, - <MATRIX_KEY(5, 0, KEY_F4)>, - <MATRIX_KEY(6, 0, KEY_F2)>, - <MATRIX_KEY(2, 1, KEY_F5)>, - <MATRIX_KEY(3, 1, KEY_F7)>, - <MATRIX_KEY(4, 1, KEY_F9)>, - <MATRIX_KEY(5, 1, KEY_F3)>, - <MATRIX_KEY(6, 1, KEY_F1)>; - - led { - color = <LED_COLOR_ID_RED>; - function = LED_FUNCTION_BACKLIGHT; - linux,default-trigger = "backlight"; - }; + #address-cells = <1>; + #size-cells = <0>; + + display-controller@70 { + compatible = "holtek,ht16k33"; + reg = <0x70>; + refresh-rate-hz = <20>; + interrupt-parent = <&gpio4>; + interrupts = <5 (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_EDGE_RISING)>; + debounce-delay-ms = <50>; + linux,keymap = <MATRIX_KEY(2, 0, KEY_F6)>, + <MATRIX_KEY(3, 0, KEY_F8)>, + <MATRIX_KEY(4, 0, KEY_F10)>, + <MATRIX_KEY(5, 0, KEY_F4)>, + <MATRIX_KEY(6, 0, KEY_F2)>, + <MATRIX_KEY(2, 1, KEY_F5)>, + <MATRIX_KEY(3, 1, KEY_F7)>, + <MATRIX_KEY(4, 1, KEY_F9)>, + <MATRIX_KEY(5, 1, KEY_F3)>, + <MATRIX_KEY(6, 1, KEY_F1)>; + + led { + color = <LED_COLOR_ID_RED>; + function = LED_FUNCTION_BACKLIGHT; + linux,default-trigger = "backlight"; }; - }; + }; + }; diff --git a/Documentation/devicetree/bindings/auxdisplay/img,ascii-lcd.yaml b/Documentation/devicetree/bindings/auxdisplay/img,ascii-lcd.yaml index 1899b23de7d1..55e9831b3f67 100644 --- a/Documentation/devicetree/bindings/auxdisplay/img,ascii-lcd.yaml +++ b/Documentation/devicetree/bindings/auxdisplay/img,ascii-lcd.yaml @@ -50,6 +50,6 @@ additionalProperties: false examples: - | lcd: lcd@17fff000 { - compatible = "img,boston-lcd"; - reg = <0x17fff000 0x8>; + compatible = "img,boston-lcd"; + reg = <0x17fff000 0x8>; }; diff --git a/Documentation/devicetree/bindings/auxdisplay/maxim,max6959.yaml b/Documentation/devicetree/bindings/auxdisplay/maxim,max6959.yaml new file mode 100644 index 000000000000..20dd9e8c8190 --- /dev/null +++ b/Documentation/devicetree/bindings/auxdisplay/maxim,max6959.yaml @@ -0,0 +1,44 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/auxdisplay/maxim,max6959.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MAX6958/6959 7-segment LED display controller + +maintainers: + - Andy Shevchenko <andriy.shevchenko@linux.intel.com> + +description: + The Maxim MAX6958/6959 7-segment LED display controller provides + an I2C interface to up to four 7-segment LED digits. The MAX6959, + in comparison to MAX6958, adds input support. Type of the chip can + be autodetected via specific register read, and hence the features + may be enabled in the driver at run-time, in case they are requested + via Device Tree. A given hardware is simple and does not provide + any additional pins, such as reset or power enable. + +properties: + compatible: + const: maxim,max6959 + + reg: + maxItems: 1 + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - | + i2c { + #address-cells = <1>; + #size-cells = <0>; + + display-controller@38 { + compatible = "maxim,max6959"; + reg = <0x38>; + }; + }; diff --git a/Documentation/devicetree/bindings/bus/imx-weim.txt b/Documentation/devicetree/bindings/bus/imx-weim.txt deleted file mode 100644 index e7f502070d77..000000000000 --- a/Documentation/devicetree/bindings/bus/imx-weim.txt +++ /dev/null @@ -1,117 +0,0 @@ -Device tree bindings for i.MX Wireless External Interface Module (WEIM) - -The term "wireless" does not imply that the WEIM is literally an interface -without wires. It simply means that this module was originally designed for -wireless and mobile applications that use low-power technology. - -The actual devices are instantiated from the child nodes of a WEIM node. - -Required properties: - - - compatible: Should contain one of the following: - "fsl,imx1-weim" - "fsl,imx27-weim" - "fsl,imx51-weim" - "fsl,imx50-weim" - "fsl,imx6q-weim" - - reg: A resource specifier for the register space - (see the example below) - - clocks: the clock, see the example below. - - #address-cells: Must be set to 2 to allow memory address translation - - #size-cells: Must be set to 1 to allow CS address passing - - ranges: Must be set up to reflect the memory layout with four - integer values for each chip-select line in use: - - <cs-number> 0 <physical address of mapping> <size> - -Optional properties: - - - fsl,weim-cs-gpr: For "fsl,imx50-weim" and "fsl,imx6q-weim" type of - devices, it should be the phandle to the system General - Purpose Register controller that contains WEIM CS GPR - register, e.g. IOMUXC_GPR1 on i.MX6Q. IOMUXC_GPR1[11:0] - should be set up as one of the following 4 possible - values depending on the CS space configuration. - - IOMUXC_GPR1[11:0] CS0 CS1 CS2 CS3 - --------------------------------------------- - 05 128M 0M 0M 0M - 033 64M 64M 0M 0M - 0113 64M 32M 32M 0M - 01111 32M 32M 32M 32M - - In case that the property is absent, the reset value or - what bootloader sets up in IOMUXC_GPR1[11:0] will be - used. - - - fsl,burst-clk-enable For "fsl,imx50-weim" and "fsl,imx6q-weim" type of - devices, the presence of this property indicates that - the weim bus should operate in Burst Clock Mode. - - - fsl,continuous-burst-clk Make Burst Clock to output continuous clock. - Without this option Burst Clock will output clock - only when necessary. This takes effect only if - "fsl,burst-clk-enable" is set. - -Timing property for child nodes. It is mandatory, not optional. - - - fsl,weim-cs-timing: The timing array, contains timing values for the - child node. We get the CS indexes from the address - ranges in the child node's "reg" property. - The number of registers depends on the selected chip: - For i.MX1, i.MX21 ("fsl,imx1-weim") there are two - registers: CSxU, CSxL. - For i.MX25, i.MX27, i.MX31 and i.MX35 ("fsl,imx27-weim") - there are three registers: CSCRxU, CSCRxL, CSCRxA. - For i.MX50, i.MX53 ("fsl,imx50-weim"), - i.MX51 ("fsl,imx51-weim") and i.MX6Q ("fsl,imx6q-weim") - there are six registers: CSxGCR1, CSxGCR2, CSxRCR1, - CSxRCR2, CSxWCR1, CSxWCR2. - -Example for an imx6q-sabreauto board, the NOR flash connected to the WEIM: - - weim: weim@21b8000 { - compatible = "fsl,imx6q-weim"; - reg = <0x021b8000 0x4000>; - clocks = <&clks 196>; - #address-cells = <2>; - #size-cells = <1>; - ranges = <0 0 0x08000000 0x08000000>; - fsl,weim-cs-gpr = <&gpr>; - - nor@0,0 { - compatible = "cfi-flash"; - reg = <0 0 0x02000000>; - #address-cells = <1>; - #size-cells = <1>; - bank-width = <2>; - fsl,weim-cs-timing = <0x00620081 0x00000001 0x1c022000 - 0x0000c000 0x1404a38e 0x00000000>; - }; - }; - -Example for an imx6q-based board, a multi-chipselect device connected to WEIM: - -In this case, both chip select 0 and 1 will be configured with the same timing -array values. - - weim: weim@21b8000 { - compatible = "fsl,imx6q-weim"; - reg = <0x021b8000 0x4000>; - clocks = <&clks 196>; - #address-cells = <2>; - #size-cells = <1>; - ranges = <0 0 0x08000000 0x02000000 - 1 0 0x0a000000 0x02000000 - 2 0 0x0c000000 0x02000000 - 3 0 0x0e000000 0x02000000>; - fsl,weim-cs-gpr = <&gpr>; - - acme@0 { - compatible = "acme,whatever"; - reg = <0 0 0x100>, <0 0x400000 0x800>, - <1 0x400000 0x800>; - fsl,weim-cs-timing = <0x024400b1 0x00001010 0x20081100 - 0x00000000 0xa0000240 0x00000000>; - }; - }; diff --git a/Documentation/devicetree/bindings/clock/google,gs101-clock.yaml b/Documentation/devicetree/bindings/clock/google,gs101-clock.yaml index 3eebc03a309b..1d2bcea41c85 100644 --- a/Documentation/devicetree/bindings/clock/google,gs101-clock.yaml +++ b/Documentation/devicetree/bindings/clock/google,gs101-clock.yaml @@ -30,14 +30,16 @@ properties: - google,gs101-cmu-top - google,gs101-cmu-apm - google,gs101-cmu-misc + - google,gs101-cmu-peric0 + - google,gs101-cmu-peric1 clocks: minItems: 1 - maxItems: 2 + maxItems: 3 clock-names: minItems: 1 - maxItems: 2 + maxItems: 3 "#clock-cells": const: 1 @@ -85,8 +87,30 @@ allOf: clock-names: items: - - const: dout_cmu_misc_bus - - const: dout_cmu_misc_sss + - const: bus + - const: sss + + - if: + properties: + compatible: + contains: + enum: + - google,gs101-cmu-peric0 + - google,gs101-cmu-peric1 + + then: + properties: + clocks: + items: + - description: External reference clock (24.576 MHz) + - description: Connectivity Peripheral 0/1 bus clock (from CMU_TOP) + - description: Connectivity Peripheral 0/1 IP clock (from CMU_TOP) + + clock-names: + items: + - const: oscclk + - const: bus + - const: ip additionalProperties: false diff --git a/Documentation/devicetree/bindings/clock/qcom,gcc-sc8180x.yaml b/Documentation/devicetree/bindings/clock/qcom,gcc-sc8180x.yaml index 6c4846b34e4b..a1085ef4fd05 100644 --- a/Documentation/devicetree/bindings/clock/qcom,gcc-sc8180x.yaml +++ b/Documentation/devicetree/bindings/clock/qcom,gcc-sc8180x.yaml @@ -31,10 +31,15 @@ properties: - const: bi_tcxo_ao - const: sleep_clk + power-domains: + items: + - description: CX domain + required: - compatible - clocks - clock-names + - power-domains allOf: - $ref: qcom,gcc.yaml# @@ -44,6 +49,7 @@ unevaluatedProperties: false examples: - | #include <dt-bindings/clock/qcom,rpmh.h> + #include <dt-bindings/power/qcom-rpmpd.h> clock-controller@100000 { compatible = "qcom,gcc-sc8180x"; reg = <0x00100000 0x1f0000>; @@ -51,6 +57,7 @@ examples: <&rpmhcc RPMH_CXO_CLK_A>, <&sleep_clk>; clock-names = "bi_tcxo", "bi_tcxo_ao", "sleep_clk"; + power-domains = <&rpmhpd SC8180X_CX>; #clock-cells = <1>; #reset-cells = <1>; #power-domain-cells = <1>; diff --git a/Documentation/devicetree/bindings/clock/qcom,sm8450-camcc.yaml b/Documentation/devicetree/bindings/clock/qcom,sm8450-camcc.yaml index 48986460f994..fa0e5b6b02b8 100644 --- a/Documentation/devicetree/bindings/clock/qcom,sm8450-camcc.yaml +++ b/Documentation/devicetree/bindings/clock/qcom,sm8450-camcc.yaml @@ -17,6 +17,7 @@ description: | include/dt-bindings/clock/qcom,sm8450-camcc.h include/dt-bindings/clock/qcom,sm8550-camcc.h include/dt-bindings/clock/qcom,sc8280xp-camcc.h + include/dt-bindings/clock/qcom,x1e80100-camcc.h allOf: - $ref: qcom,gcc.yaml# @@ -27,6 +28,7 @@ properties: - qcom,sc8280xp-camcc - qcom,sm8450-camcc - qcom,sm8550-camcc + - qcom,x1e80100-camcc clocks: items: diff --git a/Documentation/devicetree/bindings/clock/qcom,sm8450-gpucc.yaml b/Documentation/devicetree/bindings/clock/qcom,sm8450-gpucc.yaml index 1a384e8532a5..36974309cf69 100644 --- a/Documentation/devicetree/bindings/clock/qcom,sm8450-gpucc.yaml +++ b/Documentation/devicetree/bindings/clock/qcom,sm8450-gpucc.yaml @@ -18,6 +18,7 @@ description: | include/dt-bindings/clock/qcom,sm8550-gpucc.h include/dt-bindings/reset/qcom,sm8450-gpucc.h include/dt-bindings/reset/qcom,sm8650-gpucc.h + include/dt-bindings/reset/qcom,x1e80100-gpucc.h properties: compatible: @@ -25,6 +26,7 @@ properties: - qcom,sm8450-gpucc - qcom,sm8550-gpucc - qcom,sm8650-gpucc + - qcom,x1e80100-gpucc clocks: items: diff --git a/Documentation/devicetree/bindings/clock/qcom,sm8550-dispcc.yaml b/Documentation/devicetree/bindings/clock/qcom,sm8550-dispcc.yaml index c129f8c16b50..bad0260764d4 100644 --- a/Documentation/devicetree/bindings/clock/qcom,sm8550-dispcc.yaml +++ b/Documentation/devicetree/bindings/clock/qcom,sm8550-dispcc.yaml @@ -14,12 +14,17 @@ description: | Qualcomm display clock control module provides the clocks, resets and power domains on SM8550. - See also:: include/dt-bindings/clock/qcom,sm8550-dispcc.h + See also: + - include/dt-bindings/clock/qcom,sm8550-dispcc.h + - include/dt-bindings/clock/qcom,sm8650-dispcc.h + - include/dt-bindings/clock/qcom,x1e80100-dispcc.h properties: compatible: enum: - qcom,sm8550-dispcc + - qcom,sm8650-dispcc + - qcom,x1e80100-dispcc clocks: items: diff --git a/Documentation/devicetree/bindings/clock/qcom,sm8550-tcsr.yaml b/Documentation/devicetree/bindings/clock/qcom,sm8550-tcsr.yaml index af16b05eac96..48fdd562d743 100644 --- a/Documentation/devicetree/bindings/clock/qcom,sm8550-tcsr.yaml +++ b/Documentation/devicetree/bindings/clock/qcom,sm8550-tcsr.yaml @@ -23,6 +23,7 @@ properties: - enum: - qcom,sm8550-tcsr - qcom,sm8650-tcsr + - qcom,x1e80100-tcsr - const: syscon clocks: diff --git a/Documentation/devicetree/bindings/clock/qcom,sm8650-dispcc.yaml b/Documentation/devicetree/bindings/clock/qcom,sm8650-dispcc.yaml deleted file mode 100644 index 5e0c45c380f5..000000000000 --- a/Documentation/devicetree/bindings/clock/qcom,sm8650-dispcc.yaml +++ /dev/null @@ -1,106 +0,0 @@ -# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) -%YAML 1.2 ---- -$id: http://devicetree.org/schemas/clock/qcom,sm8650-dispcc.yaml# -$schema: http://devicetree.org/meta-schemas/core.yaml# - -title: Qualcomm Display Clock & Reset Controller for SM8650 - -maintainers: - - Bjorn Andersson <andersson@kernel.org> - - Neil Armstrong <neil.armstrong@linaro.org> - -description: | - Qualcomm display clock control module provides the clocks, resets and power - domains on SM8650. - - See also:: include/dt-bindings/clock/qcom,sm8650-dispcc.h - -properties: - compatible: - enum: - - qcom,sm8650-dispcc - - clocks: - items: - - description: Board XO source - - description: Board Always On XO source - - description: Display's AHB clock - - description: sleep clock - - description: Byte clock from DSI PHY0 - - description: Pixel clock from DSI PHY0 - - description: Byte clock from DSI PHY1 - - description: Pixel clock from DSI PHY1 - - description: Link clock from DP PHY0 - - description: VCO DIV clock from DP PHY0 - - description: Link clock from DP PHY1 - - description: VCO DIV clock from DP PHY1 - - description: Link clock from DP PHY2 - - description: VCO DIV clock from DP PHY2 - - description: Link clock from DP PHY3 - - description: VCO DIV clock from DP PHY3 - - '#clock-cells': - const: 1 - - '#reset-cells': - const: 1 - - '#power-domain-cells': - const: 1 - - reg: - maxItems: 1 - - power-domains: - description: - A phandle and PM domain specifier for the MMCX power domain. - maxItems: 1 - - required-opps: - description: - A phandle to an OPP node describing required MMCX performance point. - maxItems: 1 - -required: - - compatible - - reg - - clocks - - '#clock-cells' - - '#reset-cells' - - '#power-domain-cells' - -additionalProperties: false - -examples: - - | - #include <dt-bindings/clock/qcom,sm8650-gcc.h> - #include <dt-bindings/clock/qcom,rpmh.h> - #include <dt-bindings/power/qcom-rpmpd.h> - #include <dt-bindings/power/qcom,rpmhpd.h> - clock-controller@af00000 { - compatible = "qcom,sm8650-dispcc"; - reg = <0x0af00000 0x10000>; - clocks = <&rpmhcc RPMH_CXO_CLK>, - <&rpmhcc RPMH_CXO_CLK_A>, - <&gcc GCC_DISP_AHB_CLK>, - <&sleep_clk>, - <&dsi0_phy 0>, - <&dsi0_phy 1>, - <&dsi1_phy 0>, - <&dsi1_phy 1>, - <&dp0_phy 0>, - <&dp0_phy 1>, - <&dp1_phy 0>, - <&dp1_phy 1>, - <&dp2_phy 0>, - <&dp2_phy 1>, - <&dp3_phy 0>, - <&dp3_phy 1>; - #clock-cells = <1>; - #reset-cells = <1>; - #power-domain-cells = <1>; - power-domains = <&rpmhpd RPMHPD_MMCX>; - required-opps = <&rpmhpd_opp_low_svs>; - }; -... diff --git a/Documentation/devicetree/bindings/clock/renesas,cpg-mssr.yaml b/Documentation/devicetree/bindings/clock/renesas,cpg-mssr.yaml index 9c3dc6c4fa94..084259d30232 100644 --- a/Documentation/devicetree/bindings/clock/renesas,cpg-mssr.yaml +++ b/Documentation/devicetree/bindings/clock/renesas,cpg-mssr.yaml @@ -50,6 +50,7 @@ properties: - renesas,r8a779a0-cpg-mssr # R-Car V3U - renesas,r8a779f0-cpg-mssr # R-Car S4-8 - renesas,r8a779g0-cpg-mssr # R-Car V4H + - renesas,r8a779h0-cpg-mssr # R-Car V4M reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/display/bridge/fsl,imx8mp-hdmi-tx.yaml b/Documentation/devicetree/bindings/display/bridge/fsl,imx8mp-hdmi-tx.yaml new file mode 100644 index 000000000000..3791c9f4ebab --- /dev/null +++ b/Documentation/devicetree/bindings/display/bridge/fsl,imx8mp-hdmi-tx.yaml @@ -0,0 +1,102 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/bridge/fsl,imx8mp-hdmi-tx.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Freescale i.MX8MP DWC HDMI TX Encoder + +maintainers: + - Lucas Stach <l.stach@pengutronix.de> + +description: + The i.MX8MP HDMI transmitter is a Synopsys DesignWare + HDMI 2.0a TX controller IP. + +allOf: + - $ref: /schemas/display/bridge/synopsys,dw-hdmi.yaml# + +properties: + compatible: + enum: + - fsl,imx8mp-hdmi-tx + + reg-io-width: + const: 1 + + clocks: + maxItems: 4 + + clock-names: + items: + - const: iahb + - const: isfr + - const: cec + - const: pix + + power-domains: + maxItems: 1 + + ports: + $ref: /schemas/graph.yaml#/properties/ports + + properties: + port@0: + $ref: /schemas/graph.yaml#/properties/port + description: Parallel RGB input port + + port@1: + $ref: /schemas/graph.yaml#/properties/port + description: HDMI output port + + required: + - port@0 + - port@1 + +required: + - compatible + - reg + - clocks + - clock-names + - interrupts + - power-domains + - ports + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/clock/imx8mp-clock.h> + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/power/imx8mp-power.h> + + hdmi@32fd8000 { + compatible = "fsl,imx8mp-hdmi-tx"; + reg = <0x32fd8000 0x7eff>; + interrupts = <0 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&clk IMX8MP_CLK_HDMI_APB>, + <&clk IMX8MP_CLK_HDMI_REF_266M>, + <&clk IMX8MP_CLK_32K>, + <&hdmi_tx_phy>; + clock-names = "iahb", "isfr", "cec", "pix"; + power-domains = <&hdmi_blk_ctrl IMX8MP_HDMIBLK_PD_HDMI_TX>; + reg-io-width = <1>; + ports { + #address-cells = <1>; + #size-cells = <0>; + port@0 { + reg = <0>; + + hdmi_tx_from_pvi: endpoint { + remote-endpoint = <&pvi_to_hdmi_tx>; + }; + }; + + port@1 { + reg = <1>; + hdmi_tx_out: endpoint { + remote-endpoint = <&hdmi0_con>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/display/bridge/nxp,tda998x.yaml b/Documentation/devicetree/bindings/display/bridge/nxp,tda998x.yaml index 21d995f29a1e..b8e9cf6ce4e6 100644 --- a/Documentation/devicetree/bindings/display/bridge/nxp,tda998x.yaml +++ b/Documentation/devicetree/bindings/display/bridge/nxp,tda998x.yaml @@ -29,19 +29,22 @@ properties: audio-ports: description: - Array of 8-bit values, 2 values per DAI (Documentation/sound/soc/dai.rst). + Array of 2 values per DAI (Documentation/sound/soc/dai.rst). The implementation allows one or two DAIs. If two DAIs are defined, they must be of different type. $ref: /schemas/types.yaml#/definitions/uint32-matrix + minItems: 1 + maxItems: 2 items: - minItems: 1 items: - description: | The first value defines the DAI type: TDA998x_SPDIF or TDA998x_I2S (see include/dt-bindings/display/tda998x.h). + enum: [ 1, 2 ] - description: The second value defines the tda998x AP_ENA reg content when the DAI in question is used. + maximum: 0xff '#sound-dai-cells': enum: [ 0, 1 ] diff --git a/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi86.yaml b/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi86.yaml index 6ec6d287bff4..c93878b6d718 100644 --- a/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi86.yaml +++ b/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi86.yaml @@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: SN65DSI86 DSI to eDP bridge chip maintainers: - - Sandeep Panda <spanda@codeaurora.org> + - Douglas Anderson <dianders@chromium.org> description: | The Texas Instruments SN65DSI86 bridge takes MIPI DSI in and outputs eDP. diff --git a/Documentation/devicetree/bindings/display/imx/fsl,imx8mp-hdmi-pvi.yaml b/Documentation/devicetree/bindings/display/imx/fsl,imx8mp-hdmi-pvi.yaml new file mode 100644 index 000000000000..56da1636014c --- /dev/null +++ b/Documentation/devicetree/bindings/display/imx/fsl,imx8mp-hdmi-pvi.yaml @@ -0,0 +1,84 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/imx/fsl,imx8mp-hdmi-pvi.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Freescale i.MX8MP HDMI Parallel Video Interface + +maintainers: + - Lucas Stach <l.stach@pengutronix.de> + +description: + The HDMI parallel video interface is a timing and sync generator block in the + i.MX8MP SoC, that sits between the video source and the HDMI TX controller. + +properties: + compatible: + const: fsl,imx8mp-hdmi-pvi + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + power-domains: + maxItems: 1 + + ports: + $ref: /schemas/graph.yaml#/properties/ports + + properties: + port@0: + $ref: /schemas/graph.yaml#/properties/port + description: Input from the LCDIF controller. + + port@1: + $ref: /schemas/graph.yaml#/properties/port + description: Output to the HDMI TX controller. + + required: + - port@0 + - port@1 + +required: + - compatible + - reg + - interrupts + - power-domains + - ports + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/power/imx8mp-power.h> + + display-bridge@32fc4000 { + compatible = "fsl,imx8mp-hdmi-pvi"; + reg = <0x32fc4000 0x44>; + interrupt-parent = <&irqsteer_hdmi>; + interrupts = <12 IRQ_TYPE_LEVEL_HIGH>; + power-domains = <&hdmi_blk_ctrl IMX8MP_HDMIBLK_PD_PVI>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + pvi_from_lcdif3: endpoint { + remote-endpoint = <&lcdif3_to_pvi>; + }; + }; + + port@1 { + reg = <1>; + pvi_to_hdmi_tx: endpoint { + remote-endpoint = <&hdmi_tx_from_pvi>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/display/msm/dsi-controller-main.yaml b/Documentation/devicetree/bindings/display/msm/dsi-controller-main.yaml index 4219936eda5a..1fa28e976559 100644 --- a/Documentation/devicetree/bindings/display/msm/dsi-controller-main.yaml +++ b/Documentation/devicetree/bindings/display/msm/dsi-controller-main.yaml @@ -19,6 +19,7 @@ properties: - qcom,msm8916-dsi-ctrl - qcom,msm8953-dsi-ctrl - qcom,msm8974-dsi-ctrl + - qcom,msm8976-dsi-ctrl - qcom,msm8996-dsi-ctrl - qcom,msm8998-dsi-ctrl - qcom,qcm2290-dsi-ctrl @@ -248,6 +249,7 @@ allOf: contains: enum: - qcom,msm8953-dsi-ctrl + - qcom,msm8976-dsi-ctrl then: properties: clocks: diff --git a/Documentation/devicetree/bindings/display/msm/gmu.yaml b/Documentation/devicetree/bindings/display/msm/gmu.yaml index 4e1c25b42908..b3837368a260 100644 --- a/Documentation/devicetree/bindings/display/msm/gmu.yaml +++ b/Documentation/devicetree/bindings/display/msm/gmu.yaml @@ -224,6 +224,7 @@ allOf: enum: - qcom,adreno-gmu-730.1 - qcom,adreno-gmu-740.1 + - qcom,adreno-gmu-750.1 then: properties: reg: diff --git a/Documentation/devicetree/bindings/display/msm/gpu.yaml b/Documentation/devicetree/bindings/display/msm/gpu.yaml index b019db954793..40b5c6bd11f8 100644 --- a/Documentation/devicetree/bindings/display/msm/gpu.yaml +++ b/Documentation/devicetree/bindings/display/msm/gpu.yaml @@ -23,7 +23,7 @@ properties: The driver is parsing the compat string for Adreno to figure out the gpu-id and patch level. items: - - pattern: '^qcom,adreno-[3-7][0-9][0-9]\.[0-9]$' + - pattern: '^qcom,adreno-[3-7][0-9][0-9]\.[0-9]+$' - const: qcom,adreno - description: | The driver is parsing the compat string for Imageon to @@ -127,7 +127,7 @@ allOf: properties: compatible: contains: - pattern: '^qcom,adreno-[3-5][0-9][0-9]\.[0-9]$' + pattern: '^qcom,adreno-[3-5][0-9][0-9]\.[0-9]+$' then: properties: @@ -203,7 +203,7 @@ allOf: properties: compatible: contains: - pattern: '^qcom,adreno-[67][0-9][0-9]\.[0-9]$' + pattern: '^qcom,adreno-[67][0-9][0-9]\.[0-9]+$' then: # Starting with A6xx, the clocks are usually defined in the GMU node properties: diff --git a/Documentation/devicetree/bindings/display/msm/qcom,mdss.yaml b/Documentation/devicetree/bindings/display/msm/qcom,mdss.yaml index 0999ea07f47b..e4576546bf0d 100644 --- a/Documentation/devicetree/bindings/display/msm/qcom,mdss.yaml +++ b/Documentation/devicetree/bindings/display/msm/qcom,mdss.yaml @@ -127,6 +127,7 @@ patternProperties: - qcom,dsi-phy-20nm - qcom,dsi-phy-28nm-8226 - qcom,dsi-phy-28nm-hpm + - qcom,dsi-phy-28nm-hpm-fam-b - qcom,dsi-phy-28nm-lp - qcom,hdmi-phy-8084 - qcom,hdmi-phy-8660 diff --git a/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml b/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml index a01d15a03317..c4087cc5abbd 100644 --- a/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml +++ b/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml @@ -13,7 +13,9 @@ $ref: /schemas/display/msm/dpu-common.yaml# properties: compatible: - const: qcom,sm8650-dpu + enum: + - qcom,sm8650-dpu + - qcom,x1e80100-dpu reg: items: diff --git a/Documentation/devicetree/bindings/display/msm/qcom,sm8650-mdss.yaml b/Documentation/devicetree/bindings/display/msm/qcom,sm8650-mdss.yaml index bd11119dc93d..24cece1e888b 100644 --- a/Documentation/devicetree/bindings/display/msm/qcom,sm8650-mdss.yaml +++ b/Documentation/devicetree/bindings/display/msm/qcom,sm8650-mdss.yaml @@ -37,18 +37,21 @@ properties: patternProperties: "^display-controller@[0-9a-f]+$": type: object + additionalProperties: true properties: compatible: const: qcom,sm8650-dpu "^displayport-controller@[0-9a-f]+$": type: object + additionalProperties: true properties: compatible: const: qcom,sm8650-dp "^dsi@[0-9a-f]+$": type: object + additionalProperties: true properties: compatible: items: @@ -57,6 +60,7 @@ patternProperties: "^phy@[0-9a-f]+$": type: object + additionalProperties: true properties: compatible: const: qcom,sm8650-dsi-phy-4nm diff --git a/Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml b/Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml new file mode 100644 index 000000000000..3b01a0e47333 --- /dev/null +++ b/Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml @@ -0,0 +1,251 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/msm/qcom,x1e80100-mdss.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm X1E80100 Display MDSS + +maintainers: + - Abel Vesa <abel.vesa@linaro.org> + +description: + X1E80100 MSM Mobile Display Subsystem(MDSS), which encapsulates sub-blocks like + DPU display controller, DP interfaces, etc. + +$ref: /schemas/display/msm/mdss-common.yaml# + +properties: + compatible: + const: qcom,x1e80100-mdss + + clocks: + items: + - description: Display AHB + - description: Display hf AXI + - description: Display core + + iommus: + maxItems: 1 + + interconnects: + maxItems: 3 + + interconnect-names: + maxItems: 3 + +patternProperties: + "^display-controller@[0-9a-f]+$": + type: object + additionalProperties: true + properties: + compatible: + const: qcom,x1e80100-dpu + + "^displayport-controller@[0-9a-f]+$": + type: object + additionalProperties: true + properties: + compatible: + const: qcom,x1e80100-dp + + "^phy@[0-9a-f]+$": + type: object + additionalProperties: true + properties: + compatible: + const: qcom,x1e80100-dp-phy + +required: + - compatible + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/clock/qcom,rpmh.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/interconnect/qcom,x1e80100-rpmh.h> + #include <dt-bindings/phy/phy-qcom-qmp.h> + #include <dt-bindings/power/qcom,rpmhpd.h> + + display-subsystem@ae00000 { + compatible = "qcom,x1e80100-mdss"; + reg = <0x0ae00000 0x1000>; + reg-names = "mdss"; + + interconnects = <&mmss_noc MASTER_MDP 0 &gem_noc SLAVE_LLCC 0>, + <&mc_virt MASTER_LLCC 0 &mc_virt SLAVE_EBI1 0>, + <&gem_noc MASTER_APPSS_PROC 0 &config_noc SLAVE_DISPLAY_CFG 0>; + interconnect-names = "mdp0-mem", "mdp1-mem", "cpu-cfg"; + + resets = <&dispcc_core_bcr>; + + power-domains = <&dispcc_gdsc>; + + clocks = <&dispcc_ahb_clk>, + <&gcc_disp_hf_axi_clk>, + <&dispcc_mdp_clk>; + clock-names = "bus", "nrt_bus", "core"; + + interrupts = <GIC_SPI 83 IRQ_TYPE_LEVEL_HIGH>; + interrupt-controller; + #interrupt-cells = <1>; + + iommus = <&apps_smmu 0x1c00 0x2>; + + #address-cells = <1>; + #size-cells = <1>; + ranges; + + display-controller@ae01000 { + compatible = "qcom,x1e80100-dpu"; + reg = <0x0ae01000 0x8f000>, + <0x0aeb0000 0x2008>; + reg-names = "mdp", "vbif"; + + clocks = <&gcc_axi_clk>, + <&dispcc_ahb_clk>, + <&dispcc_mdp_lut_clk>, + <&dispcc_mdp_clk>, + <&dispcc_mdp_vsync_clk>; + clock-names = "nrt_bus", + "iface", + "lut", + "core", + "vsync"; + + assigned-clocks = <&dispcc_mdp_vsync_clk>; + assigned-clock-rates = <19200000>; + + operating-points-v2 = <&mdp_opp_table>; + power-domains = <&rpmhpd RPMHPD_MMCX>; + + interrupt-parent = <&mdss>; + interrupts = <0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + dpu_intf1_out: endpoint { + remote-endpoint = <&dsi0_in>; + }; + }; + + port@1 { + reg = <1>; + dpu_intf2_out: endpoint { + remote-endpoint = <&dsi1_in>; + }; + }; + }; + + mdp_opp_table: opp-table { + compatible = "operating-points-v2"; + + opp-200000000 { + opp-hz = /bits/ 64 <200000000>; + required-opps = <&rpmhpd_opp_low_svs>; + }; + + opp-325000000 { + opp-hz = /bits/ 64 <325000000>; + required-opps = <&rpmhpd_opp_svs>; + }; + + opp-375000000 { + opp-hz = /bits/ 64 <375000000>; + required-opps = <&rpmhpd_opp_svs_l1>; + }; + + opp-514000000 { + opp-hz = /bits/ 64 <514000000>; + required-opps = <&rpmhpd_opp_nom>; + }; + }; + }; + + displayport-controller@ae90000 { + compatible = "qcom,x1e80100-dp"; + reg = <0 0xae90000 0 0x200>, + <0 0xae90200 0 0x200>, + <0 0xae90400 0 0x600>, + <0 0xae91000 0 0x400>, + <0 0xae91400 0 0x400>; + + interrupt-parent = <&mdss>; + interrupts = <12>; + + clocks = <&dispcc_mdss_ahb_clk>, + <&dispcc_dptx0_aux_clk>, + <&dispcc_dptx0_link_clk>, + <&dispcc_dptx0_link_intf_clk>, + <&dispcc_dptx0_pixel0_clk>; + clock-names = "core_iface", "core_aux", + "ctrl_link", + "ctrl_link_iface", + "stream_pixel"; + + assigned-clocks = <&dispcc_mdss_dptx0_link_clk_src>, + <&dispcc_mdss_dptx0_pixel0_clk_src>; + assigned-clock-parents = <&usb_1_ss0_qmpphy QMP_USB43DP_DP_LINK_CLK>, + <&usb_1_ss0_qmpphy QMP_USB43DP_DP_VCO_DIV_CLK>; + + operating-points-v2 = <&mdss_dp0_opp_table>; + + power-domains = <&rpmhpd RPMHPD_MMCX>; + + phys = <&usb_1_ss0_qmpphy QMP_USB43DP_DP_PHY>; + phy-names = "dp"; + + #sound-dai-cells = <0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + + mdss_dp0_in: endpoint { + remote-endpoint = <&mdss_intf0_out>; + }; + }; + + port@1 { + reg = <1>; + + mdss_dp0_out: endpoint { + }; + }; + }; + + mdss_dp0_opp_table: opp-table { + compatible = "operating-points-v2"; + + opp-160000000 { + opp-hz = /bits/ 64 <160000000>; + required-opps = <&rpmhpd_opp_low_svs>; + }; + + opp-270000000 { + opp-hz = /bits/ 64 <270000000>; + required-opps = <&rpmhpd_opp_svs>; + }; + + opp-540000000 { + opp-hz = /bits/ 64 <540000000>; + required-opps = <&rpmhpd_opp_svs_l1>; + }; + + opp-810000000 { + opp-hz = /bits/ 64 <810000000>; + required-opps = <&rpmhpd_opp_nom>; + }; + }; + }; + }; +... diff --git a/Documentation/devicetree/bindings/display/panel/boe,th101mb31ig002-28a.yaml b/Documentation/devicetree/bindings/display/panel/boe,th101mb31ig002-28a.yaml new file mode 100644 index 000000000000..32df26cbfeed --- /dev/null +++ b/Documentation/devicetree/bindings/display/panel/boe,th101mb31ig002-28a.yaml @@ -0,0 +1,58 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/panel/boe,th101mb31ig002-28a.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: BOE TH101MB31IG002-28A WXGA DSI Display Panel + +maintainers: + - Manuel Traut <manut@mecka.net> + +allOf: + - $ref: panel-common.yaml# + +properties: + compatible: + enum: + # BOE TH101MB31IG002-28A 10.1" WXGA TFT LCD panel + - boe,th101mb31ig002-28a + + reg: true + backlight: true + enable-gpios: true + power-supply: true + port: true + rotation: true + +required: + - compatible + - reg + - enable-gpios + - power-supply + +additionalProperties: false + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + + dsi { + #address-cells = <1>; + #size-cells = <0>; + panel@0 { + compatible = "boe,th101mb31ig002-28a"; + reg = <0>; + backlight = <&backlight_lcd0>; + enable-gpios = <&gpio 45 GPIO_ACTIVE_HIGH>; + rotation = <90>; + power-supply = <&vcc_3v3>; + port { + panel_in_dsi: endpoint { + remote-endpoint = <&dsi_out_con>; + }; + }; + }; + }; + +... diff --git a/Documentation/devicetree/bindings/display/panel/himax,hx83112a.yaml b/Documentation/devicetree/bindings/display/panel/himax,hx83112a.yaml new file mode 100644 index 000000000000..174661d13811 --- /dev/null +++ b/Documentation/devicetree/bindings/display/panel/himax,hx83112a.yaml @@ -0,0 +1,74 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/panel/himax,hx83112a.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Himax HX83112A-based DSI display panels + +maintainers: + - Luca Weiss <luca.weiss@fairphone.com> + +description: + The Himax HX83112A is a generic DSI Panel IC used to control + LCD panels. + +allOf: + - $ref: panel-common.yaml# + +properties: + compatible: + contains: + const: djn,9a-3r063-1102b + + vdd1-supply: + description: Digital voltage rail + + vsn-supply: + description: Positive source voltage rail + + vsp-supply: + description: Negative source voltage rail + + reg: true + port: true + +required: + - compatible + - reg + - reset-gpios + - vdd1-supply + - vsn-supply + - vsp-supply + - port + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + + dsi { + #address-cells = <1>; + #size-cells = <0>; + + panel@0 { + compatible = "djn,9a-3r063-1102b"; + reg = <0>; + + backlight = <&pm6150l_wled>; + reset-gpios = <&pm6150l_gpios 9 GPIO_ACTIVE_LOW>; + + vdd1-supply = <&vreg_l1e>; + vsn-supply = <&pm6150l_lcdb_ncp>; + vsp-supply = <&pm6150l_lcdb_ldo>; + + port { + panel_in_0: endpoint { + remote-endpoint = <&dsi0_out>; + }; + }; + }; + }; + +... diff --git a/Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.yaml b/Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.yaml index c5944b4d636c..d589f1677214 100644 --- a/Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.yaml +++ b/Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.yaml @@ -14,7 +14,9 @@ allOf: properties: compatible: - const: leadtek,ltk500hd1829 + enum: + - leadtek,ltk101b4029w + - leadtek,ltk500hd1829 reg: true backlight: true reset-gpios: true diff --git a/Documentation/devicetree/bindings/display/panel/novatek,nt35510.yaml b/Documentation/devicetree/bindings/display/panel/novatek,nt35510.yaml index bc92928c805b..91921f4b0e5f 100644 --- a/Documentation/devicetree/bindings/display/panel/novatek,nt35510.yaml +++ b/Documentation/devicetree/bindings/display/panel/novatek,nt35510.yaml @@ -15,7 +15,9 @@ allOf: properties: compatible: items: - - const: hydis,hva40wv1 + - enum: + - frida,frd400b25025 + - hydis,hva40wv1 - const: novatek,nt35510 description: This indicates the panel manufacturer of the panel that is in turn using the NT35510 panel driver. The compatible @@ -29,6 +31,7 @@ properties: vddi-supply: description: regulator that supplies the vddi voltage backlight: true + port: true required: - compatible diff --git a/Documentation/devicetree/bindings/display/panel/novatek,nt36672e.yaml b/Documentation/devicetree/bindings/display/panel/novatek,nt36672e.yaml new file mode 100644 index 000000000000..dc4672f3d01d --- /dev/null +++ b/Documentation/devicetree/bindings/display/panel/novatek,nt36672e.yaml @@ -0,0 +1,66 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/panel/novatek,nt36672e.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Novatek NT36672E LCD DSI Panel + +maintainers: + - Ritesh Kumar <quic_riteshk@quicinc.com> + +allOf: + - $ref: panel-common.yaml# + +properties: + compatible: + const: novatek,nt36672e + + reg: + maxItems: 1 + description: DSI virtual channel + + vddi-supply: true + avdd-supply: true + avee-supply: true + port: true + reset-gpios: true + backlight: true + +required: + - compatible + - reg + - vddi-supply + - avdd-supply + - avee-supply + - reset-gpios + - port + +additionalProperties: false + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + dsi { + #address-cells = <1>; + #size-cells = <0>; + panel@0 { + compatible = "novatek,nt36672e"; + reg = <0>; + + reset-gpios = <&tlmm 44 GPIO_ACTIVE_HIGH>; + + vddi-supply = <&vreg_l8c_1p8>; + avdd-supply = <&disp_avdd>; + avee-supply = <&disp_avee>; + + backlight = <&pwm_backlight>; + + port { + panel0_in: endpoint { + remote-endpoint = <&dsi0_out>; + }; + }; + }; + }; +... diff --git a/Documentation/devicetree/bindings/display/panel/panel-lvds.yaml b/Documentation/devicetree/bindings/display/panel/panel-lvds.yaml index 9f1016551e0b..155d8ffa8f6e 100644 --- a/Documentation/devicetree/bindings/display/panel/panel-lvds.yaml +++ b/Documentation/devicetree/bindings/display/panel/panel-lvds.yaml @@ -39,9 +39,13 @@ properties: compatible: items: - enum: + # Admatec 9904379 10.1" 1024x600 LVDS panel + - admatec,9904379 - auo,b101ew05 # Chunghwa Picture Tubes Ltd. 7" WXGA (800x1280) TFT LCD LVDS panel - chunghwa,claa070wp03xg + # EDT ETML0700Z9NDHA 7.0" WSVGA (1024x600) color TFT LCD LVDS panel + - edt,etml0700z9ndha # HannStar Display Corp. HSD101PWW2 10.1" WXGA (1280x800) LVDS panel - hannstar,hsd101pww2 # Hydis Technologies 7" WXGA (800x1280) TFT LCD LVDS panel diff --git a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml index 634a10c6f2dd..a95445f40870 100644 --- a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml +++ b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml @@ -73,6 +73,8 @@ properties: - auo,t215hvn01 # Shanghai AVIC Optoelectronics 7" 1024x600 color TFT-LCD panel - avic,tm070ddh03 + # BOE BP082WX1-100 8.2" WXGA (1280x800) LVDS panel + - boe,bp082wx1-100 # BOE BP101WX1-100 10.1" WXGA (1280x800) LVDS panel - boe,bp101wx1-100 # BOE EV121WXM-N10-1850 12.1" WXGA (1280x800) TFT LCD panel @@ -141,6 +143,8 @@ properties: - edt,etm0700g0edh6 # Emerging Display Technology Corp. LVDS WSVGA TFT Display with capacitive touch - edt,etml0700y5dha + # Emerging Display Technology Corp. 10.1" LVDS WXGA TFT Display with capacitive touch + - edt,etml1010g3dra # Emerging Display Technology Corp. 5.7" VGA TFT LCD panel with # capacitive touch - edt,etmv570g2dhu diff --git a/Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.yaml b/Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.yaml index 97cccd8a8479..6ec471284f97 100644 --- a/Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.yaml +++ b/Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.yaml @@ -22,6 +22,8 @@ properties: enum: # Anberic RG353V-V2 5.0" 640x480 TFT LCD panel - anbernic,rg353v-panel-v2 + # Powkiddy RGB10MAX3 5.0" 720x1280 TFT LCD panel + - powkiddy,rgb10max3-panel # Powkiddy RGB30 3.0" 720x720 TFT LCD panel - powkiddy,rgb30-panel # Rocktech JH057N00900 5.5" 720x1440 TFT LCD panel @@ -43,6 +45,7 @@ properties: reset-gpios: true backlight: true + rotation: true required: - compatible diff --git a/Documentation/devicetree/bindings/display/panel/visionox,rm69299.yaml b/Documentation/devicetree/bindings/display/panel/visionox,rm69299.yaml index fa745a6f4456..772399067515 100644 --- a/Documentation/devicetree/bindings/display/panel/visionox,rm69299.yaml +++ b/Documentation/devicetree/bindings/display/panel/visionox,rm69299.yaml @@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: Visionox model RM69299 Panels maintainers: - - Harigovindan P <harigovi@codeaurora.org> + - Abhinav Kumar <quic_abhinavk@quicinc.com> + - Jessica Zhang <quic_jesszhan@quicinc.com> description: | This binding is for display panels using a Visionox RM692999 panel. diff --git a/Documentation/devicetree/bindings/display/renesas,rzg2l-du.yaml b/Documentation/devicetree/bindings/display/renesas,rzg2l-du.yaml new file mode 100644 index 000000000000..08e5b9478051 --- /dev/null +++ b/Documentation/devicetree/bindings/display/renesas,rzg2l-du.yaml @@ -0,0 +1,126 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/renesas,rzg2l-du.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Renesas RZ/G2L Display Unit (DU) + +maintainers: + - Biju Das <biju.das.jz@bp.renesas.com> + - Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> + +description: | + These DT bindings describe the Display Unit embedded in the Renesas RZ/G2L + and RZ/V2L SoCs. + +properties: + compatible: + oneOf: + - enum: + - renesas,r9a07g044-du # RZ/G2{L,LC} + - items: + - enum: + - renesas,r9a07g054-du # RZ/V2L + - const: renesas,r9a07g044-du # RZ/G2L fallback + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + items: + - description: Main clock + - description: Register access clock + - description: Video clock + + clock-names: + items: + - const: aclk + - const: pclk + - const: vclk + + resets: + maxItems: 1 + + power-domains: + maxItems: 1 + + ports: + $ref: /schemas/graph.yaml#/properties/ports + description: | + The connections to the DU output video ports are modeled using the OF + graph bindings. The number of ports and their assignment are + model-dependent. Each port shall have a single endpoint. + + patternProperties: + "^port@[0-1]$": + $ref: /schemas/graph.yaml#/properties/port + unevaluatedProperties: false + + required: + - port@0 + + unevaluatedProperties: false + + renesas,vsps: + $ref: /schemas/types.yaml#/definitions/phandle-array + items: + items: + - description: phandle to VSP instance that serves the DU channel + - description: Channel index identifying the LIF instance in that VSP + description: + A list of phandle and channel index tuples to the VSPs that handle the + memory interfaces for the DU channels. + +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + - resets + - power-domains + - ports + - renesas,vsps + +additionalProperties: false + +examples: + # RZ/G2L DU + - | + #include <dt-bindings/clock/r9a07g044-cpg.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + + display@10890000 { + compatible = "renesas,r9a07g044-du"; + reg = <0x10890000 0x10000>; + interrupts = <GIC_SPI 152 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&cpg CPG_MOD R9A07G044_LCDC_CLK_A>, + <&cpg CPG_MOD R9A07G044_LCDC_CLK_P>, + <&cpg CPG_MOD R9A07G044_LCDC_CLK_D>; + clock-names = "aclk", "pclk", "vclk"; + resets = <&cpg R9A07G044_LCDC_RESET_N>; + power-domains = <&cpg>; + + renesas,vsps = <&vspd0 0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + endpoint { + remote-endpoint = <&dsi0_in>; + }; + }; + port@1 { + reg = <1>; + }; + }; + }; + +... diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml index 7e59dee15a5f..af638b6c0d21 100644 --- a/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml +++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,dw-hdmi.yaml @@ -94,11 +94,14 @@ properties: - const: default - const: unwedge + power-domains: + maxItems: 1 + ports: $ref: /schemas/graph.yaml#/properties/ports - patternProperties: - "^port(@0)?$": + properties: + port@0: $ref: /schemas/graph.yaml#/properties/port description: Input of the DWC HDMI TX properties: @@ -108,11 +111,14 @@ properties: description: Connection to the VOPB endpoint@1: description: Connection to the VOPL - properties: port@1: $ref: /schemas/graph.yaml#/properties/port description: Output of the DWC HDMI TX + required: + - port@0 + - port@1 + rockchip,grf: $ref: /schemas/types.yaml#/definitions/phandle description: @@ -135,19 +141,25 @@ examples: #include <dt-bindings/clock/rk3288-cru.h> #include <dt-bindings/interrupt-controller/arm-gic.h> #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/power/rk3288-power.h> hdmi: hdmi@ff980000 { compatible = "rockchip,rk3288-dw-hdmi"; reg = <0xff980000 0x20000>; reg-io-width = <4>; - ddc-i2c-bus = <&i2c5>; - rockchip,grf = <&grf>; interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_HIGH>; clocks = <&cru PCLK_HDMI_CTRL>, <&cru SCLK_HDMI_HDCP>; clock-names = "iahb", "isfr"; + ddc-i2c-bus = <&i2c5>; + power-domains = <&power RK3288_PD_VIO>; + rockchip,grf = <&grf>; ports { - port { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; #address-cells = <1>; #size-cells = <0>; @@ -155,11 +167,20 @@ examples: reg = <0>; remote-endpoint = <&vopb_out_hdmi>; }; + hdmi_in_vopl: endpoint@1 { reg = <1>; remote-endpoint = <&vopl_out_hdmi>; }; }; + + port@1 { + reg = <1>; + + hdmi_out_con: endpoint { + remote-endpoint = <&hdmi_con_in>; + }; + }; }; }; diff --git a/Documentation/devicetree/bindings/display/samsung/samsung,exynos-mixer.yaml b/Documentation/devicetree/bindings/display/samsung/samsung,exynos-mixer.yaml index 25d53fde92e1..597c9cc6a312 100644 --- a/Documentation/devicetree/bindings/display/samsung/samsung,exynos-mixer.yaml +++ b/Documentation/devicetree/bindings/display/samsung/samsung,exynos-mixer.yaml @@ -85,7 +85,7 @@ allOf: clocks: minItems: 6 maxItems: 6 - regs: + reg: minItems: 2 maxItems: 2 @@ -99,7 +99,7 @@ allOf: clocks: minItems: 4 maxItems: 4 - regs: + reg: minItems: 2 maxItems: 2 @@ -116,7 +116,7 @@ allOf: clocks: minItems: 3 maxItems: 3 - regs: + reg: minItems: 1 maxItems: 1 diff --git a/Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml b/Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml index 3afbb52d1b7f..153ff86fb405 100644 --- a/Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml +++ b/Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml @@ -131,9 +131,9 @@ allOf: const: sinowealth,sh1106 then: properties: - width: + solomon,width: default: 132 - height: + solomon,height: default: 64 solomon,dclk-div: default: 1 @@ -149,9 +149,9 @@ allOf: - solomon,ssd1305 then: properties: - width: + solomon,width: default: 132 - height: + solomon,height: default: 64 solomon,dclk-div: default: 1 @@ -167,9 +167,9 @@ allOf: - solomon,ssd1306 then: properties: - width: + solomon,width: default: 128 - height: + solomon,height: default: 64 solomon,dclk-div: default: 1 @@ -185,9 +185,9 @@ allOf: - solomon,ssd1307 then: properties: - width: + solomon,width: default: 128 - height: + solomon,height: default: 39 solomon,dclk-div: default: 2 @@ -205,9 +205,9 @@ allOf: - solomon,ssd1309 then: properties: - width: + solomon,width: default: 128 - height: + solomon,height: default: 64 solomon,dclk-div: default: 1 diff --git a/Documentation/devicetree/bindings/display/solomon,ssd132x.yaml b/Documentation/devicetree/bindings/display/solomon,ssd132x.yaml index 37975ee61c5a..dd7939989cf4 100644 --- a/Documentation/devicetree/bindings/display/solomon,ssd132x.yaml +++ b/Documentation/devicetree/bindings/display/solomon,ssd132x.yaml @@ -30,9 +30,9 @@ allOf: const: solomon,ssd1322 then: properties: - width: + solomon,width: default: 480 - height: + solomon,height: default: 128 - if: @@ -42,9 +42,9 @@ allOf: const: solomon,ssd1325 then: properties: - width: + solomon,width: default: 128 - height: + solomon,height: default: 80 - if: @@ -54,9 +54,9 @@ allOf: const: solomon,ssd1327 then: properties: - width: + solomon,width: default: 128 - height: + solomon,height: default: 128 unevaluatedProperties: false diff --git a/Documentation/devicetree/bindings/display/solomon,ssd133x.yaml b/Documentation/devicetree/bindings/display/solomon,ssd133x.yaml new file mode 100644 index 000000000000..b7780038a34b --- /dev/null +++ b/Documentation/devicetree/bindings/display/solomon,ssd133x.yaml @@ -0,0 +1,45 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/solomon,ssd133x.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Solomon SSD133x OLED Display Controllers + +maintainers: + - Javier Martinez Canillas <javierm@redhat.com> + +allOf: + - $ref: solomon,ssd-common.yaml# + +properties: + compatible: + enum: + - solomon,ssd1331 + + solomon,width: + default: 96 + + solomon,height: + default: 64 + +required: + - compatible + - reg + +unevaluatedProperties: false + +examples: + - | + spi { + #address-cells = <1>; + #size-cells = <0>; + + oled@0 { + compatible = "solomon,ssd1331"; + reg = <0x0>; + reset-gpios = <&gpio2 7>; + dc-gpios = <&gpio2 8>; + spi-max-frequency = <10000000>; + }; + }; diff --git a/Documentation/devicetree/bindings/display/ti/ti,am65x-dss.yaml b/Documentation/devicetree/bindings/display/ti/ti,am65x-dss.yaml index b6767ef0d24d..55e3e490d0e6 100644 --- a/Documentation/devicetree/bindings/display/ti/ti,am65x-dss.yaml +++ b/Documentation/devicetree/bindings/display/ti/ti,am65x-dss.yaml @@ -37,6 +37,7 @@ properties: - description: OVR2 overlay manager for vp2 - description: VP1 video port 1 - description: VP2 video port 2 + - description: common1 DSS register area reg-names: items: @@ -47,6 +48,7 @@ properties: - const: ovr2 - const: vp1 - const: vp2 + - const: common1 clocks: items: @@ -147,9 +149,10 @@ examples: <0x04a07000 0x1000>, /* ovr1 */ <0x04a08000 0x1000>, /* ovr2 */ <0x04a0a000 0x1000>, /* vp1 */ - <0x04a0b000 0x1000>; /* vp2 */ + <0x04a0b000 0x1000>, /* vp2 */ + <0x04a01000 0x1000>; /* common1 */ reg-names = "common", "vidl1", "vid", - "ovr1", "ovr2", "vp1", "vp2"; + "ovr1", "ovr2", "vp1", "vp2", "common1"; ti,am65x-oldi-io-ctrl = <&dss_oldi_io_ctrl>; power-domains = <&k3_pds 67 TI_SCI_PD_EXCLUSIVE>; clocks = <&k3_clks 67 1>, diff --git a/Documentation/devicetree/bindings/firmware/xilinx/xlnx,zynqmp-firmware.yaml b/Documentation/devicetree/bindings/firmware/xilinx/xlnx,zynqmp-firmware.yaml index 8e584857ddd4..ab8f32c440df 100644 --- a/Documentation/devicetree/bindings/firmware/xilinx/xlnx,zynqmp-firmware.yaml +++ b/Documentation/devicetree/bindings/firmware/xilinx/xlnx,zynqmp-firmware.yaml @@ -26,6 +26,12 @@ properties: - description: For implementations complying for Versal. const: xlnx,versal-firmware + - description: For implementations complying for Versal NET. + items: + - enum: + - xlnx,versal-net-firmware + - const: xlnx,versal-firmware + method: description: | The method of calling the PM-API firmware layer. @@ -41,7 +47,53 @@ properties: "#power-domain-cells": const: 1 - versal_fpga: + clock-controller: + $ref: /schemas/clock/xlnx,versal-clk.yaml# + description: The clock controller is a hardware block of Xilinx versal + clock tree. It reads required input clock frequencies from the devicetree + and acts as clock provider for all clock consumers of PS clocks.list of + clock specifiers which are external input clocks to the given clock + controller. + type: object + + gpio: + $ref: /schemas/gpio/xlnx,zynqmp-gpio-modepin.yaml# + description: The gpio node describes connect to PS_MODE pins via firmware + interface. + type: object + + soc-nvmem: + $ref: /schemas/nvmem/xlnx,zynqmp-nvmem.yaml# + description: The ZynqMP MPSoC provides access to the hardware related data + like SOC revision, IDCODE and specific purpose efuses. + type: object + + pcap: + $ref: /schemas/fpga/xlnx,zynqmp-pcap-fpga.yaml + description: The ZynqMP SoC uses the PCAP (Processor Configuration Port) to + configure the Programmable Logic (PL). The configuration uses the + firmware interface. + type: object + + pinctrl: + $ref: /schemas/pinctrl/xlnx,zynqmp-pinctrl.yaml# + description: The pinctrl node provides access to pinconfig and pincontrol + functionality available in firmware. + type: object + + power-management: + $ref: /schemas/power/reset/xlnx,zynqmp-power.yaml# + description: The zynqmp-power node describes the power management + configurations. It will control remote suspend/shutdown interfaces. + type: object + + reset-controller: + $ref: /schemas/reset/xlnx,zynqmp-reset.yaml# + description: The reset-controller node describes connection to the reset + functionality via firmware interface. + type: object + + versal-fpga: $ref: /schemas/fpga/xlnx,versal-fpga.yaml# description: Compatible of the FPGA device. type: object @@ -53,15 +105,6 @@ properties: vector. type: object - clock-controller: - $ref: /schemas/clock/xlnx,versal-clk.yaml# - description: The clock controller is a hardware block of Xilinx versal - clock tree. It reads required input clock frequencies from the devicetree - and acts as clock provider for all clock consumers of PS clocks.list of - clock specifiers which are external input clocks to the given clock - controller. - type: object - required: - compatible @@ -73,7 +116,38 @@ examples: firmware { zynqmp_firmware: zynqmp-firmware { #power-domain-cells = <1>; + soc-nvmem { + compatible = "xlnx,zynqmp-nvmem-fw"; + nvmem-layout { + compatible = "fixed-layout"; + #address-cells = <1>; + #size-cells = <1>; + + soc_revision: soc-revision@0 { + reg = <0x0 0x4>; + }; + }; + }; + gpio { + compatible = "xlnx,zynqmp-gpio-modepin"; + gpio-controller; + #gpio-cells = <2>; + }; + pcap { + compatible = "xlnx,zynqmp-pcap-fpga"; }; + pinctrl { + compatible = "xlnx,zynqmp-pinctrl"; + }; + power-management { + compatible = "xlnx,zynqmp-power"; + interrupts = <0 35 4>; + }; + reset-controller { + compatible = "xlnx,zynqmp-reset"; + #reset-cells = <1>; + }; + }; }; sata { @@ -84,7 +158,7 @@ examples: compatible = "xlnx,versal-firmware"; method = "smc"; - versal_fpga: versal_fpga { + versal_fpga: versal-fpga { compatible = "xlnx,versal-fpga"; }; diff --git a/Documentation/devicetree/bindings/fpga/xlnx,versal-fpga.yaml b/Documentation/devicetree/bindings/fpga/xlnx,versal-fpga.yaml index 26f18834caa3..80833462f620 100644 --- a/Documentation/devicetree/bindings/fpga/xlnx,versal-fpga.yaml +++ b/Documentation/devicetree/bindings/fpga/xlnx,versal-fpga.yaml @@ -26,7 +26,7 @@ additionalProperties: false examples: - | - versal_fpga: versal_fpga { + versal_fpga: versal-fpga { compatible = "xlnx,versal-fpga"; }; diff --git a/Documentation/devicetree/bindings/gpio/aspeed,ast2400-gpio.yaml b/Documentation/devicetree/bindings/gpio/aspeed,ast2400-gpio.yaml new file mode 100644 index 000000000000..cf11aa7ec8c7 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/aspeed,ast2400-gpio.yaml @@ -0,0 +1,148 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/aspeed,ast2400-gpio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Aspeed GPIO controller + +maintainers: + - Andrew Jeffery <andrew@codeconstruct.com.au> + +properties: + compatible: + enum: + - aspeed,ast2400-gpio + - aspeed,ast2500-gpio + - aspeed,ast2600-gpio + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + description: The clock to use for debounce timings + + gpio-controller: true + gpio-line-names: + minItems: 36 + maxItems: 232 + + gpio-ranges: true + + "#gpio-cells": + const: 2 + + interrupts: + maxItems: 1 + + interrupt-controller: true + + "#interrupt-cells": + const: 2 + + ngpios: + minimum: 36 + maximum: 232 + +required: + - compatible + - reg + - interrupts + - interrupt-controller + - "#interrupt-cells" + - gpio-controller + - "#gpio-cells" + +allOf: + - if: + properties: + compatible: + contains: + const: aspeed,ast2400-gpio + then: + properties: + gpio-line-names: + minItems: 220 + maxItems: 220 + ngpios: + const: 220 + - if: + properties: + compatible: + contains: + const: aspeed,ast2500-gpio + then: + properties: + gpio-line-names: + minItems: 232 + maxItems: 232 + ngpios: + const: 232 + - if: + properties: + compatible: + contains: + const: aspeed,ast2600-gpio + then: + properties: + gpio-line-names: + minItems: 36 + maxItems: 208 + ngpios: + enum: [ 36, 208 ] + required: + - ngpios + +additionalProperties: false + +examples: + - | + gpio@1e780000 { + compatible = "aspeed,ast2400-gpio"; + reg = <0x1e780000 0x1000>; + interrupts = <20>; + interrupt-controller; + #interrupt-cells = <2>; + gpio-controller; + #gpio-cells = <2>; + }; + - | + gpio: gpio@1e780000 { + compatible = "aspeed,ast2500-gpio"; + reg = <0x1e780000 0x200>; + interrupts = <20>; + interrupt-controller; + #interrupt-cells = <2>; + gpio-controller; + #gpio-cells = <2>; + gpio-ranges = <&pinctrl 0 0 232>; + }; + - | + #include <dt-bindings/clock/ast2600-clock.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/interrupt-controller/irq.h> + gpio0: gpio@1e780000 { + compatible = "aspeed,ast2600-gpio"; + reg = <0x1e780000 0x400>; + clocks = <&syscon ASPEED_CLK_APB2>; + interrupts = <GIC_SPI 40 IRQ_TYPE_LEVEL_HIGH>; + interrupt-controller; + #interrupt-cells = <2>; + #gpio-cells = <2>; + gpio-controller; + gpio-ranges = <&pinctrl 0 0 208>; + ngpios = <208>; + }; + gpio1: gpio@1e780800 { + compatible = "aspeed,ast2600-gpio"; + reg = <0x1e780800 0x800>; + clocks = <&syscon ASPEED_CLK_APB1>; + interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>; + interrupt-controller; + #interrupt-cells = <2>; + gpio-controller; + #gpio-cells = <2>; + gpio-ranges = <&pinctrl 0 208 36>; + ngpios = <36>; + }; diff --git a/Documentation/devicetree/bindings/gpio/gpio-aspeed.txt b/Documentation/devicetree/bindings/gpio/gpio-aspeed.txt deleted file mode 100644 index b2033fc3a71a..000000000000 --- a/Documentation/devicetree/bindings/gpio/gpio-aspeed.txt +++ /dev/null @@ -1,39 +0,0 @@ -Aspeed GPIO controller Device Tree Bindings -------------------------------------------- - -Required properties: -- compatible : Either "aspeed,ast2400-gpio", "aspeed,ast2500-gpio", - or "aspeed,ast2600-gpio". - -- #gpio-cells : Should be two - - First cell is the GPIO line number - - Second cell is used to specify optional - parameters (unused) - -- reg : Address and length of the register set for the device -- gpio-controller : Marks the device node as a GPIO controller. -- interrupts : Interrupt specifier (see interrupt bindings for - details) -- interrupt-controller : Mark the GPIO controller as an interrupt-controller - -Optional properties: - -- clocks : A phandle to the clock to use for debounce timings -- ngpios : Number of GPIOs controlled by this controller. Should be set - when there are multiple GPIO controllers on a SoC (ast2600). - -The gpio and interrupt properties are further described in their respective -bindings documentation: - -- Documentation/devicetree/bindings/gpio/gpio.txt -- Documentation/devicetree/bindings/interrupt-controller/interrupts.txt - - Example: - gpio@1e780000 { - #gpio-cells = <2>; - compatible = "aspeed,ast2400-gpio"; - gpio-controller; - interrupts = <20>; - reg = <0x1e780000 0x1000>; - interrupt-controller; - }; diff --git a/Documentation/devicetree/bindings/gpio/gpio-mvebu.yaml b/Documentation/devicetree/bindings/gpio/gpio-mvebu.yaml index f1bd1e6b2e1f..33d4e4716516 100644 --- a/Documentation/devicetree/bindings/gpio/gpio-mvebu.yaml +++ b/Documentation/devicetree/bindings/gpio/gpio-mvebu.yaml @@ -115,7 +115,7 @@ allOf: required: - reg -unevaluatedProperties: true +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/gpio/gpio-pca9570.yaml b/Documentation/devicetree/bindings/gpio/gpio-pca9570.yaml index 452f8972a965..6f73961001b7 100644 --- a/Documentation/devicetree/bindings/gpio/gpio-pca9570.yaml +++ b/Documentation/devicetree/bindings/gpio/gpio-pca9570.yaml @@ -28,6 +28,9 @@ properties: minItems: 4 maxItems: 8 + label: + description: A descriptive name for this device. + required: - compatible - reg diff --git a/Documentation/devicetree/bindings/gpio/renesas,rcar-gpio.yaml b/Documentation/devicetree/bindings/gpio/renesas,rcar-gpio.yaml index aa424e2b95f8..cc7a950a6030 100644 --- a/Documentation/devicetree/bindings/gpio/renesas,rcar-gpio.yaml +++ b/Documentation/devicetree/bindings/gpio/renesas,rcar-gpio.yaml @@ -53,6 +53,7 @@ properties: - renesas,gpio-r8a779a0 # R-Car V3U - renesas,gpio-r8a779f0 # R-Car S4-8 - renesas,gpio-r8a779g0 # R-Car V4H + - renesas,gpio-r8a779h0 # R-Car V4M - const: renesas,rcar-gen4-gpio # R-Car Gen4 reg: diff --git a/Documentation/devicetree/bindings/gpio/xlnx,zynqmp-gpio-modepin.yaml b/Documentation/devicetree/bindings/gpio/xlnx,zynqmp-gpio-modepin.yaml index b1fd632718d4..bb93baa88879 100644 --- a/Documentation/devicetree/bindings/gpio/xlnx,zynqmp-gpio-modepin.yaml +++ b/Documentation/devicetree/bindings/gpio/xlnx,zynqmp-gpio-modepin.yaml @@ -12,7 +12,8 @@ description: PS_MODE). Every pin can be configured as input/output. maintainers: - - Piyush Mehta <piyush.mehta@amd.com> + - Mubin Sayyed <mubin.sayyed@amd.com> + - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> properties: compatible: diff --git a/Documentation/devicetree/bindings/gpu/img,powervr.yaml b/Documentation/devicetree/bindings/gpu/img,powervr-rogue.yaml index a13298f1a182..256e252f8087 100644 --- a/Documentation/devicetree/bindings/gpu/img,powervr.yaml +++ b/Documentation/devicetree/bindings/gpu/img,powervr-rogue.yaml @@ -2,10 +2,10 @@ # Copyright (c) 2023 Imagination Technologies Ltd. %YAML 1.2 --- -$id: http://devicetree.org/schemas/gpu/img,powervr.yaml# +$id: http://devicetree.org/schemas/gpu/img,powervr-rogue.yaml# $schema: http://devicetree.org/meta-schemas/core.yaml# -title: Imagination Technologies PowerVR and IMG GPU +title: Imagination Technologies PowerVR and IMG Rogue GPUs maintainers: - Frank Binns <frank.binns@imgtec.com> diff --git a/Documentation/devicetree/bindings/gpu/img,powervr-sgx.yaml b/Documentation/devicetree/bindings/gpu/img,powervr-sgx.yaml new file mode 100644 index 000000000000..f5898b04381c --- /dev/null +++ b/Documentation/devicetree/bindings/gpu/img,powervr-sgx.yaml @@ -0,0 +1,138 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +# Copyright (c) 2023 Imagination Technologies Ltd. +# Copyright (C) 2024 Texas Instruments Incorporated - https://www.ti.com/ +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpu/img,powervr-sgx.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Imagination Technologies PowerVR SGX GPUs + +maintainers: + - Frank Binns <frank.binns@imgtec.com> + +properties: + compatible: + oneOf: + - items: + - enum: + - ti,omap3430-gpu # Rev 121 + - ti,omap3630-gpu # Rev 125 + - const: img,powervr-sgx530 + - items: + - enum: + - ingenic,jz4780-gpu # Rev 130 + - ti,omap4430-gpu # Rev 120 + - const: img,powervr-sgx540 + - items: + - enum: + - allwinner,sun6i-a31-gpu # MP2 Rev 115 + - ti,omap4470-gpu # MP1 Rev 112 + - ti,omap5432-gpu # MP2 Rev 105 + - ti,am5728-gpu # MP2 Rev 116 + - ti,am6548-gpu # MP1 Rev 117 + - const: img,powervr-sgx544 + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + minItems: 1 + maxItems: 3 + + clock-names: + minItems: 1 + items: + - const: core + - const: mem + - const: sys + + power-domains: + maxItems: 1 + +required: + - compatible + - reg + - interrupts + +allOf: + - if: + properties: + compatible: + contains: + const: ti,am6548-gpu + then: + required: + - power-domains + else: + properties: + power-domains: false + - if: + properties: + compatible: + contains: + enum: + - allwinner,sun6i-a31-gpu + - ingenic,jz4780-gpu + then: + required: + - clocks + - clock-names + else: + properties: + clocks: false + clock-names: false + - if: + properties: + compatible: + contains: + const: allwinner,sun6i-a31-gpu + then: + properties: + clocks: + minItems: 2 + maxItems: 2 + clock-names: + minItems: 2 + maxItems: 2 + - if: + properties: + compatible: + contains: + const: ingenic,jz4780-gpu + then: + properties: + clocks: + maxItems: 1 + clock-names: + maxItems: 1 + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/soc/ti,sci_pm_domain.h> + + gpu@7000000 { + compatible = "ti,am6548-gpu", "img,powervr-sgx544"; + reg = <0x7000000 0x10000>; + interrupts = <GIC_SPI 162 IRQ_TYPE_LEVEL_HIGH>; + power-domains = <&k3_pds 65 TI_SCI_PD_EXCLUSIVE>; + }; + + - | + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + + gpu: gpu@1c40000 { + compatible = "allwinner,sun6i-a31-gpu", "img,powervr-sgx544"; + reg = <0x01c40000 0x10000>; + interrupts = <GIC_SPI 97 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&ccu 1>, <&ccu 2>; + clock-names = "core", "mem"; + }; diff --git a/Documentation/devicetree/bindings/hwmon/adi,adm1177.yaml b/Documentation/devicetree/bindings/hwmon/adi,adm1177.yaml index 2e45364d0543..be7e9e91a3a8 100644 --- a/Documentation/devicetree/bindings/hwmon/adi,adm1177.yaml +++ b/Documentation/devicetree/bindings/hwmon/adi,adm1177.yaml @@ -46,7 +46,10 @@ required: - compatible - reg -additionalProperties: false +allOf: + - $ref: hwmon-common.yaml# + +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/hwmon/adi,adm1275.yaml b/Documentation/devicetree/bindings/hwmon/adi,adm1275.yaml index ab87f51c5aef..b68061294964 100644 --- a/Documentation/devicetree/bindings/hwmon/adi,adm1275.yaml +++ b/Documentation/devicetree/bindings/hwmon/adi,adm1275.yaml @@ -33,10 +33,6 @@ properties: reg: maxItems: 1 - shunt-resistor-micro-ohms: - description: - Shunt resistor value in micro-Ohm. - adi,volt-curr-sample-average: description: | Number of samples to be used to report voltage and current values. @@ -50,6 +46,7 @@ properties: enum: [1, 2, 4, 8, 16, 32, 64, 128] allOf: + - $ref: hwmon-common.yaml# - if: properties: compatible: @@ -107,7 +104,7 @@ required: - compatible - reg -additionalProperties: false +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/hwmon/adi,ltc2945.yaml b/Documentation/devicetree/bindings/hwmon/adi,ltc2945.yaml index 5cb66e97e816..6401b0a9aff4 100644 --- a/Documentation/devicetree/bindings/hwmon/adi,ltc2945.yaml +++ b/Documentation/devicetree/bindings/hwmon/adi,ltc2945.yaml @@ -31,7 +31,10 @@ required: - compatible - reg -additionalProperties: false +allOf: + - $ref: hwmon-common.yaml# + +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/hwmon/adi,ltc4282.yaml b/Documentation/devicetree/bindings/hwmon/adi,ltc4282.yaml new file mode 100644 index 000000000000..4854b95a93e3 --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/adi,ltc4282.yaml @@ -0,0 +1,159 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/hwmon/adi,ltc4282.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Analog Devices LTC4282 I2C High Current Hot Swap Controller over I2C + +maintainers: + - Nuno Sa <nuno.sa@analog.com> + +description: | + Analog Devices LTC4282 I2C High Current Hot Swap Controller over I2C. + + https://www.analog.com/media/en/technical-documentation/data-sheets/ltc4282.pdf + +properties: + compatible: + enum: + - adi,ltc4282 + + reg: + maxItems: 1 + + vdd-supply: true + + clocks: + maxItems: 1 + + '#clock-cells': + const: 0 + + adi,rsense-nano-ohms: + description: Value of the sense resistor. + + adi,vin-mode-microvolt: + description: + Selects operating range for the Undervoltage, Overvoltage and Foldback + pins. Also for the ADC. Should be set to the nominal input voltage. + enum: [3300000, 5000000, 12000000, 24000000] + default: 12000000 + + adi,fet-bad-timeout-ms: + description: + From the moment a FET bad conditions is present, this property selects the + wait time/timeout for a FET-bad fault to be signaled. Setting this to 0, + disables FET bad faults to be reported. + default: 255 + maximum: 255 + + adi,overvoltage-dividers: + description: | + Select which dividers to use for VDD Overvoltage detection. Note that + when the internal dividers are used the threshold is referenced to VDD. + The percentages in the datasheet are misleading since the actual values + to look for are in the "Absolute Maximum Ratings" table in the + "Comparator Inputs" section. In there there's a line for each of the 5%, + 10% and 15% settings with the actual min, typical and max tolerances. + $ref: /schemas/types.yaml#/definitions/string + enum: [external, vdd_5_percent, vdd_10_percent, vdd_15_percent] + default: external + + adi,undervoltage-dividers: + description: | + Select which dividers to use for VDD Overvoltage detection. Note that + when the internal dividers are used the threshold is referenced to VDD. + The percentages in the datasheet are misleading since the actual values + to look for are in the "Absolute Maximum Ratings" table in the + "Comparator Inputs" section. In there there's a line for each of the 5%, + 10% and 15% settings with the actual min, typical and max tolerances. + $ref: /schemas/types.yaml#/definitions/string + enum: [external, vdd_5_percent, vdd_10_percent, vdd_15_percent] + default: external + + adi,current-limit-sense-microvolt: + description: + The current limit sense voltage of the chip is adjustable between + 12.5mV and 34.4mV in 3.1mV steps. This effectively limits the current + on the load. + enum: [12500, 15625, 18750, 21875, 25000, 28125, 31250, 34375] + default: 25000 + + adi,overcurrent-retry: + description: + If set, enables the chip to auto-retry 256 timer cycles after an + Overcurrent fault. + type: boolean + + adi,overvoltage-retry-disable: + description: + If set, disables the chip to auto-retry 50ms after an Overvoltage fault. + It's enabled by default. + type: boolean + + adi,undervoltage-retry-disable: + description: + If set, disables the chip to auto-retry 50ms after an Undervoltage fault. + It's enabled by default. + type: boolean + + adi,fault-log-enable: + description: + If set, enables the FAULT_LOG and ADC_ALERT_LOG registers to be written + to the EEPROM when a fault bit transitions high and hence, will be + available after a power cycle (the chip loads the contents of + the EE_FAULT_LOG register - the one in EEPROM - into FAULT_LOG at boot). + type: boolean + + adi,gpio1-mode: + description: Defines the function of the Pin. It can indicate that power is + good (PULL the pin low when power is not good) or that power is bad (Go + into high-z when power is not good). + $ref: /schemas/types.yaml#/definitions/string + enum: [power_bad, power_good] + default: power_good + + adi,gpio2-mode: + description: Defines the function of the Pin. It can be set as the input for + the ADC or indicating that the MOSFET is in stress (dissipating power). + $ref: /schemas/types.yaml#/definitions/string + enum: [adc_input, stress_fet] + default: adc_input + + adi,gpio3-monitor-enable: + description: If set, gpio3 is set as input for the ADC instead of gpio2. + type: boolean + +allOf: + - if: + required: + - adi,gpio3-monitor-enable + then: + properties: + adi,gpio2-mode: + const: stress_fet + +required: + - compatible + - reg + - adi,rsense-nano-ohms + +additionalProperties: false + +examples: + - | + i2c { + #address-cells = <1>; + #size-cells = <0>; + + hwmon@50 { + compatible = "adi,ltc4282"; + reg = <0x50>; + adi,rsense-nano-ohms = <500>; + + adi,gpio1-mode = "power_good"; + adi,gpio2-mode = "adc_input"; + }; + }; +... diff --git a/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml new file mode 100644 index 000000000000..17351fdbefce --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/amphenol,chipcap2.yaml @@ -0,0 +1,77 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/hwmon/amphenol,chipcap2.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: ChipCap 2 humidity and temperature iio sensor + +maintainers: + - Javier Carrasco <javier.carrasco.cruz@gmail.com> + +description: | + Relative humidity and temperature sensor on I2C bus. + + Datasheets: + https://www.amphenol-sensors.com/en/telaire/humidity/527-humidity-sensors/3095-chipcap-2 + +properties: + compatible: + oneOf: + - const: amphenol,cc2d23 + - items: + - enum: + - amphenol,cc2d23s + - amphenol,cc2d25 + - amphenol,cc2d25s + - amphenol,cc2d33 + - amphenol,cc2d33s + - amphenol,cc2d35 + - amphenol,cc2d35s + - const: amphenol,cc2d23 + + reg: + maxItems: 1 + + interrupts: + items: + - description: measurement ready indicator + - description: low humidity alarm + - description: high humidity alarm + + interrupt-names: + items: + - const: ready + - const: low + - const: high + + vdd-supply: + description: + Dedicated, controllable supply-regulator to reset the device and + enter in command mode. + +required: + - compatible + - reg + - vdd-supply + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + i2c { + #address-cells = <1>; + #size-cells = <0>; + + humidity@28 { + compatible = "amphenol,cc2d23s", "amphenol,cc2d23"; + reg = <0x28>; + interrupt-parent = <&gpio>; + interrupts = <4 IRQ_TYPE_EDGE_RISING>, + <5 IRQ_TYPE_EDGE_RISING>, + <6 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "ready", "low", "high"; + vdd-supply = <®_vdd>; + }; + }; diff --git a/Documentation/devicetree/bindings/hwmon/aspeed,g6-pwm-tach.yaml b/Documentation/devicetree/bindings/hwmon/aspeed,g6-pwm-tach.yaml new file mode 100644 index 000000000000..9e5ed901ae54 --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/aspeed,g6-pwm-tach.yaml @@ -0,0 +1,71 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +# Copyright (C) 2023 Aspeed, Inc. +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/hwmon/aspeed,g6-pwm-tach.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: ASPEED G6 PWM and Fan Tach controller + +maintainers: + - Billy Tsai <billy_tsai@aspeedtech.com> + +description: | + The ASPEED PWM controller can support up to 16 PWM outputs. + The ASPEED Fan Tacho controller can support up to 16 fan tach input. + They are independent hardware blocks, which are different from the + previous version of the ASPEED chip. + +properties: + compatible: + enum: + - aspeed,ast2600-pwm-tach + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + + resets: + maxItems: 1 + + "#pwm-cells": + const: 3 + +patternProperties: + "^fan-[0-9]+$": + $ref: fan-common.yaml# + unevaluatedProperties: false + required: + - tach-ch + +required: + - reg + - clocks + - resets + - "#pwm-cells" + - compatible + +additionalProperties: false + +examples: + - | + #include <dt-bindings/clock/aspeed-clock.h> + pwm_tach: pwm-tach-controller@1e610000 { + compatible = "aspeed,ast2600-pwm-tach"; + reg = <0x1e610000 0x100>; + clocks = <&syscon ASPEED_CLK_AHB>; + resets = <&syscon ASPEED_RESET_PWM>; + #pwm-cells = <3>; + + fan-0 { + tach-ch = /bits/ 8 <0x0>; + pwms = <&pwm_tach 0 40000 0>; + }; + + fan-1 { + tach-ch = /bits/ 8 <0x1 0x2>; + pwms = <&pwm_tach 1 40000 0>; + }; + }; diff --git a/Documentation/devicetree/bindings/hwmon/fan-common.yaml b/Documentation/devicetree/bindings/hwmon/fan-common.yaml new file mode 100644 index 000000000000..0fb738081699 --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/fan-common.yaml @@ -0,0 +1,79 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/hwmon/fan-common.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Common Fan Properties + +maintainers: + - Naresh Solanki <naresh.solanki@9elements.com> + - Billy Tsai <billy_tsai@aspeedtech.com> + +properties: + max-rpm: + description: + Max RPM supported by fan. + $ref: /schemas/types.yaml#/definitions/uint32 + maximum: 100000 + + min-rpm: + description: + Min RPM supported by fan. + $ref: /schemas/types.yaml#/definitions/uint32 + maximum: 1000 + + pulses-per-revolution: + description: + The number of pulse from fan sensor per revolution. + $ref: /schemas/types.yaml#/definitions/uint32 + maximum: 4 + + tach-div: + description: + Divisor for the tach sampling clock, which determines the sensitivity of the tach pin. + $ref: /schemas/types.yaml#/definitions/uint32 + + target-rpm: + description: + The default desired fan speed in RPM. + $ref: /schemas/types.yaml#/definitions/uint32 + + fan-driving-mode: + description: + Select the driving mode of the fan.(DC, PWM and so on) + $ref: /schemas/types.yaml#/definitions/string + enum: [ dc, pwm ] + + pwms: + description: + PWM provider. + maxItems: 1 + + "#cooling-cells": + const: 2 + + cooling-levels: + description: + The control value which correspond to thermal cooling states. + $ref: /schemas/types.yaml#/definitions/uint32-array + + tach-ch: + description: + The tach channel used for the fan. + $ref: /schemas/types.yaml#/definitions/uint8-array + + label: + description: + Optional fan label + + fan-supply: + description: + Power supply for fan. + + reg: + maxItems: 1 + +additionalProperties: true + +... diff --git a/Documentation/devicetree/bindings/hwmon/hwmon-common.yaml b/Documentation/devicetree/bindings/hwmon/hwmon-common.yaml new file mode 100644 index 000000000000..dc86b5c72cf2 --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/hwmon-common.yaml @@ -0,0 +1,19 @@ +# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/hwmon/hwmon-common.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Hardware Monitoring Devices Common Properties + +maintainers: + - Guenter Roeck <linux@roeck-us.net> + +properties: + label: + description: A descriptive name for this device. + + shunt-resistor-micro-ohms: + description: The value of current sense resistor. + +additionalProperties: true diff --git a/Documentation/devicetree/bindings/hwmon/lltc,ltc4151.yaml b/Documentation/devicetree/bindings/hwmon/lltc,ltc4151.yaml index e62aff670478..8f0095bb7f6e 100644 --- a/Documentation/devicetree/bindings/hwmon/lltc,ltc4151.yaml +++ b/Documentation/devicetree/bindings/hwmon/lltc,ltc4151.yaml @@ -25,7 +25,10 @@ required: - compatible - reg -additionalProperties: false +allOf: + - $ref: hwmon-common.yaml# + +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/hwmon/lltc,ltc4286.yaml b/Documentation/devicetree/bindings/hwmon/lltc,ltc4286.yaml index 98ca163d3486..853df9fef6c8 100644 --- a/Documentation/devicetree/bindings/hwmon/lltc,ltc4286.yaml +++ b/Documentation/devicetree/bindings/hwmon/lltc,ltc4286.yaml @@ -25,15 +25,14 @@ properties: The default is 102.4 volts. type: boolean - shunt-resistor-micro-ohms: - description: - Resistor value micro-ohms. - required: - compatible - reg -additionalProperties: false +allOf: + - $ref: hwmon-common.yaml# + +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/hwmon/lm75.yaml b/Documentation/devicetree/bindings/hwmon/lm75.yaml index ed269e428a3d..29bd7460cc26 100644 --- a/Documentation/devicetree/bindings/hwmon/lm75.yaml +++ b/Documentation/devicetree/bindings/hwmon/lm75.yaml @@ -57,6 +57,7 @@ required: - reg allOf: + - $ref: hwmon-common.yaml# - if: not: properties: @@ -71,7 +72,7 @@ allOf: properties: interrupts: false -additionalProperties: false +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/hwmon/nuvoton,nct6775.yaml b/Documentation/devicetree/bindings/hwmon/nuvoton,nct6775.yaml index 358b262431fc..e3db642878d4 100644 --- a/Documentation/devicetree/bindings/hwmon/nuvoton,nct6775.yaml +++ b/Documentation/devicetree/bindings/hwmon/nuvoton,nct6775.yaml @@ -25,6 +25,7 @@ properties: - nuvoton,nct6796 - nuvoton,nct6797 - nuvoton,nct6798 + - nuvoton,nct6799 reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/hwmon/pmbus/infineon,tda38640.yaml b/Documentation/devicetree/bindings/hwmon/pmbus/infineon,tda38640.yaml index ded1c115764b..5c4e52b472ad 100644 --- a/Documentation/devicetree/bindings/hwmon/pmbus/infineon,tda38640.yaml +++ b/Documentation/devicetree/bindings/hwmon/pmbus/infineon,tda38640.yaml @@ -30,6 +30,23 @@ properties: unconnected(has internal pull-down). type: boolean + interrupts: + maxItems: 1 + + regulators: + type: object + description: + list of regulators provided by this controller. + + properties: + vout: + $ref: /schemas/regulator/regulator.yaml# + type: object + + unevaluatedProperties: false + + additionalProperties: false + required: - compatible - reg @@ -38,6 +55,7 @@ additionalProperties: false examples: - | + #include <dt-bindings/interrupt-controller/irq.h> i2c { #address-cells = <1>; #size-cells = <0>; @@ -45,5 +63,15 @@ examples: tda38640@40 { compatible = "infineon,tda38640"; reg = <0x40>; + + interrupt-parent = <&smb_pex_cpu0_event>; + interrupts = <10 IRQ_TYPE_LEVEL_LOW>; + + regulators { + pvnn_main_cpu0: vout { + regulator-name = "pvnn_main_cpu0"; + regulator-enable-ramp-delay = <200>; + }; + }; }; }; diff --git a/Documentation/devicetree/bindings/hwmon/pmbus/ti,lm25066.yaml b/Documentation/devicetree/bindings/hwmon/pmbus/ti,lm25066.yaml index da8292bc32f5..a20f140dc79a 100644 --- a/Documentation/devicetree/bindings/hwmon/pmbus/ti,lm25066.yaml +++ b/Documentation/devicetree/bindings/hwmon/pmbus/ti,lm25066.yaml @@ -34,11 +34,26 @@ properties: Shunt (sense) resistor value in micro-Ohms default: 1000 + regulators: + type: object + + properties: + vout: + $ref: /schemas/regulator/regulator.yaml# + type: object + + unevaluatedProperties: false + + additionalProperties: false + required: - compatible - reg -additionalProperties: false +allOf: + - $ref: /schemas/hwmon/hwmon-common.yaml# + +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/hwmon/ti,ina2xx.yaml b/Documentation/devicetree/bindings/hwmon/ti,ina2xx.yaml index 378d1f6aeeb3..df86c2c92037 100644 --- a/Documentation/devicetree/bindings/hwmon/ti,ina2xx.yaml +++ b/Documentation/devicetree/bindings/hwmon/ti,ina2xx.yaml @@ -28,10 +28,14 @@ properties: - ti,ina231 - ti,ina237 - ti,ina238 + - ti,ina260 reg: maxItems: 1 + "#io-channel-cells": + const: 1 + shunt-resistor: description: Shunt resistor value in micro-Ohm. @@ -66,7 +70,10 @@ required: - compatible - reg -additionalProperties: false +allOf: + - $ref: hwmon-common.yaml# + +unevaluatedProperties: false examples: - | @@ -77,6 +84,8 @@ examples: power-sensor@44 { compatible = "ti,ina220"; reg = <0x44>; + #io-channel-cells = <1>; + label = "vdd_3v0"; shunt-resistor = <1000>; vs-supply = <&vdd_3v0>; }; diff --git a/Documentation/devicetree/bindings/hwmon/ti,tmp513.yaml b/Documentation/devicetree/bindings/hwmon/ti,tmp513.yaml index cdd1489e0c54..227858e76058 100644 --- a/Documentation/devicetree/bindings/hwmon/ti,tmp513.yaml +++ b/Documentation/devicetree/bindings/hwmon/ti,tmp513.yaml @@ -72,7 +72,10 @@ required: - compatible - reg -additionalProperties: false +allOf: + - $ref: hwmon-common.yaml# + +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/hwmon/ti,tps23861.yaml b/Documentation/devicetree/bindings/hwmon/ti,tps23861.yaml index ebc8d466c1aa..f58248c29e22 100644 --- a/Documentation/devicetree/bindings/hwmon/ti,tps23861.yaml +++ b/Documentation/devicetree/bindings/hwmon/ti,tps23861.yaml @@ -35,7 +35,10 @@ required: - compatible - reg -additionalProperties: false +allOf: + - $ref: hwmon-common.yaml# + +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/i2c/i2c-exynos5.yaml b/Documentation/devicetree/bindings/i2c/i2c-exynos5.yaml index df9c57bca2a8..cc8bba5537b9 100644 --- a/Documentation/devicetree/bindings/i2c/i2c-exynos5.yaml +++ b/Documentation/devicetree/bindings/i2c/i2c-exynos5.yaml @@ -33,6 +33,7 @@ properties: - const: samsung,exynos7-hsi2c - items: - enum: + - google,gs101-hsi2c - samsung,exynos850-hsi2c - const: samsung,exynosautov9-hsi2c - const: samsung,exynos5-hsi2c # Exynos5250 and Exynos5420 diff --git a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml index 3d06db98e978..a93744763787 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml +++ b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml @@ -36,6 +36,7 @@ properties: - amlogic,meson-a1-gpio-intc - amlogic,meson-s4-gpio-intc - amlogic,c3-gpio-intc + - amlogic,t7-gpio-intc - const: amlogic,meson-gpio-intc reg: diff --git a/Documentation/devicetree/bindings/interrupt-controller/starfive,jh8100-intc.yaml b/Documentation/devicetree/bindings/interrupt-controller/starfive,jh8100-intc.yaml new file mode 100644 index 000000000000..ada5788602d6 --- /dev/null +++ b/Documentation/devicetree/bindings/interrupt-controller/starfive,jh8100-intc.yaml @@ -0,0 +1,61 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/interrupt-controller/starfive,jh8100-intc.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: StarFive External Interrupt Controller + +description: + StarFive SoC JH8100 contain a external interrupt controller. It can be used + to handle high-level input interrupt signals. It also send the output + interrupt signal to RISC-V PLIC. + +maintainers: + - Changhuang Liang <changhuang.liang@starfivetech.com> + +properties: + compatible: + const: starfive,jh8100-intc + + reg: + maxItems: 1 + + clocks: + description: APB clock for the interrupt controller + maxItems: 1 + + resets: + description: APB reset for the interrupt controller + maxItems: 1 + + interrupts: + maxItems: 1 + + interrupt-controller: true + + "#interrupt-cells": + const: 1 + +required: + - compatible + - reg + - clocks + - resets + - interrupts + - interrupt-controller + - "#interrupt-cells" + +additionalProperties: false + +examples: + - | + interrupt-controller@12260000 { + compatible = "starfive,jh8100-intc"; + reg = <0x12260000 0x10000>; + clocks = <&syscrg_ne 76>; + resets = <&syscrg_ne 13>; + interrupts = <45>; + interrupt-controller; + #interrupt-cells = <1>; + }; diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index a4042ae24770..5c130cf06a21 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -83,6 +83,7 @@ properties: - description: Qcom Adreno GPUs implementing "qcom,smmu-500" and "arm,mmu-500" items: - enum: + - qcom,qcm2290-smmu-500 - qcom,sa8775p-smmu-500 - qcom,sc7280-smmu-500 - qcom,sc8280xp-smmu-500 @@ -93,6 +94,7 @@ properties: - qcom,sm8350-smmu-500 - qcom,sm8450-smmu-500 - qcom,sm8550-smmu-500 + - qcom,sm8650-smmu-500 - const: qcom,adreno-smmu - const: qcom,smmu-500 - const: arm,mmu-500 @@ -462,6 +464,7 @@ allOf: compatible: items: - enum: + - qcom,qcm2290-smmu-500 - qcom,sm6115-smmu-500 - qcom,sm6125-smmu-500 - const: qcom,adreno-smmu @@ -484,7 +487,12 @@ allOf: - if: properties: compatible: - const: qcom,sm8450-smmu-500 + items: + - const: qcom,sm8450-smmu-500 + - const: qcom,adreno-smmu + - const: qcom,smmu-500 + - const: arm,mmu-500 + then: properties: clock-names: @@ -508,7 +516,13 @@ allOf: - if: properties: compatible: - const: qcom,sm8550-smmu-500 + items: + - enum: + - qcom,sm8550-smmu-500 + - qcom,sm8650-smmu-500 + - const: qcom,adreno-smmu + - const: qcom,smmu-500 + - const: arm,mmu-500 then: properties: clock-names: @@ -534,7 +548,6 @@ allOf: - cavium,smmu-v2 - marvell,ap806-smmu-500 - nvidia,smmu-500 - - qcom,qcm2290-smmu-500 - qcom,qdu1000-smmu-500 - qcom,sc7180-smmu-500 - qcom,sc8180x-smmu-500 @@ -544,7 +557,6 @@ allOf: - qcom,sdx65-smmu-500 - qcom,sm6350-smmu-500 - qcom,sm6375-smmu-500 - - qcom,sm8650-smmu-500 - qcom,x1e80100-smmu-500 then: properties: diff --git a/Documentation/devicetree/bindings/leds/common.yaml b/Documentation/devicetree/bindings/leds/common.yaml index 55a8d1385e21..8a3c2398b10c 100644 --- a/Documentation/devicetree/bindings/leds/common.yaml +++ b/Documentation/devicetree/bindings/leds/common.yaml @@ -200,6 +200,18 @@ properties: #trigger-source-cells property in the source node. $ref: /schemas/types.yaml#/definitions/phandle-array + active-low: + type: boolean + description: + Makes LED active low. To turn the LED ON, line needs to be + set to low voltage instead of high. + + inactive-high-impedance: + type: boolean + description: + Set LED to high-impedance mode to turn the LED OFF. LED might also + describe this mode as tristate. + # Required properties for flash LED child nodes: flash-max-microamp: description: diff --git a/Documentation/devicetree/bindings/leds/leds-bcm63138.yaml b/Documentation/devicetree/bindings/leds/leds-bcm63138.yaml index 52252fb6bb32..bb20394fca5c 100644 --- a/Documentation/devicetree/bindings/leds/leds-bcm63138.yaml +++ b/Documentation/devicetree/bindings/leds/leds-bcm63138.yaml @@ -52,10 +52,6 @@ patternProperties: maxItems: 1 description: LED pin number - active-low: - type: boolean - description: Makes LED active low - required: - reg diff --git a/Documentation/devicetree/bindings/leds/leds-bcm6328.yaml b/Documentation/devicetree/bindings/leds/leds-bcm6328.yaml index 51cc0d82c12e..f3a3ef992929 100644 --- a/Documentation/devicetree/bindings/leds/leds-bcm6328.yaml +++ b/Documentation/devicetree/bindings/leds/leds-bcm6328.yaml @@ -78,10 +78,6 @@ patternProperties: - maximum: 23 description: LED pin number (only LEDs 0 to 23 are valid). - active-low: - type: boolean - description: Makes LED active low. - brcm,hardware-controlled: type: boolean description: Makes this LED hardware controlled. diff --git a/Documentation/devicetree/bindings/leds/leds-bcm6358.txt b/Documentation/devicetree/bindings/leds/leds-bcm6358.txt index 6e51c6b91ee5..211ffc3c4a20 100644 --- a/Documentation/devicetree/bindings/leds/leds-bcm6358.txt +++ b/Documentation/devicetree/bindings/leds/leds-bcm6358.txt @@ -25,8 +25,6 @@ LED sub-node required properties: LED sub-node optional properties: - label : see Documentation/devicetree/bindings/leds/common.txt - - active-low : Boolean, makes LED active low. - Default : false - default-state : see Documentation/devicetree/bindings/leds/common.txt - linux,default-trigger : see diff --git a/Documentation/devicetree/bindings/leds/leds-pwm-multicolor.yaml b/Documentation/devicetree/bindings/leds/leds-pwm-multicolor.yaml index bd6ec04a8727..a31a202afe5c 100644 --- a/Documentation/devicetree/bindings/leds/leds-pwm-multicolor.yaml +++ b/Documentation/devicetree/bindings/leds/leds-pwm-multicolor.yaml @@ -41,9 +41,7 @@ properties: pwm-names: true - active-low: - description: For PWMs where the LED is wired to supply rather than ground. - type: boolean + active-low: true color: true diff --git a/Documentation/devicetree/bindings/leds/leds-pwm.yaml b/Documentation/devicetree/bindings/leds/leds-pwm.yaml index 7de6da58be3c..113b7c218303 100644 --- a/Documentation/devicetree/bindings/leds/leds-pwm.yaml +++ b/Documentation/devicetree/bindings/leds/leds-pwm.yaml @@ -34,11 +34,6 @@ patternProperties: Maximum brightness possible for the LED $ref: /schemas/types.yaml#/definitions/uint32 - active-low: - description: - For PWMs where the LED is wired to supply rather than ground. - type: boolean - required: - pwms - max-brightness diff --git a/Documentation/devicetree/bindings/mailbox/fsl,mu.yaml b/Documentation/devicetree/bindings/mailbox/fsl,mu.yaml index 12e7a7d536a3..00631afcd51d 100644 --- a/Documentation/devicetree/bindings/mailbox/fsl,mu.yaml +++ b/Documentation/devicetree/bindings/mailbox/fsl,mu.yaml @@ -29,8 +29,11 @@ properties: - const: fsl,imx8ulp-mu - const: fsl,imx8-mu-scu - const: fsl,imx8-mu-seco - - const: fsl,imx93-mu-s4 - const: fsl,imx8ulp-mu-s4 + - const: fsl,imx93-mu-s4 + - const: fsl,imx95-mu + - const: fsl,imx95-mu-ele + - const: fsl,imx95-mu-v2x - items: - const: fsl,imx93-mu - const: fsl,imx8ulp-mu @@ -95,6 +98,19 @@ properties: power-domains: maxItems: 1 + ranges: true + + '#address-cells': + const: 1 + + '#size-cells': + const: 1 + +patternProperties: + "^sram@[a-f0-9]+": + $ref: /schemas/sram/sram.yaml# + unevaluatedProperties: false + required: - compatible - reg @@ -122,6 +138,15 @@ allOf: required: - interrupt-names + - if: + not: + properties: + compatible: + const: fsl,imx95-mu + then: + patternProperties: + "^sram@[a-f0-9]+": false + additionalProperties: false examples: @@ -134,3 +159,34 @@ examples: interrupts = <GIC_SPI 176 IRQ_TYPE_LEVEL_HIGH>; #mbox-cells = <2>; }; + + - | + #include <dt-bindings/interrupt-controller/arm-gic.h> + + mailbox@445b0000 { + compatible = "fsl,imx95-mu"; + reg = <0x445b0000 0x10000>; + ranges; + interrupts = <GIC_SPI 226 IRQ_TYPE_LEVEL_HIGH>; + #address-cells = <1>; + #size-cells = <1>; + #mbox-cells = <2>; + + sram@445b1000 { + compatible = "mmio-sram"; + reg = <0x445b1000 0x400>; + ranges = <0x0 0x445b1000 0x400>; + #address-cells = <1>; + #size-cells = <1>; + + scmi-sram-section@0 { + compatible = "arm,scmi-shmem"; + reg = <0x0 0x80>; + }; + + scmi-sram-section@80 { + compatible = "arm,scmi-shmem"; + reg = <0x80 0x80>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/media/cnm,wave521c.yaml b/Documentation/devicetree/bindings/media/cnm,wave521c.yaml index 6d5569e77b7a..6a11c1d11fb5 100644 --- a/Documentation/devicetree/bindings/media/cnm,wave521c.yaml +++ b/Documentation/devicetree/bindings/media/cnm,wave521c.yaml @@ -17,7 +17,7 @@ properties: compatible: items: - enum: - - ti,k3-j721s2-wave521c + - ti,j721s2-wave521c - const: cnm,wave521c reg: @@ -53,7 +53,7 @@ additionalProperties: false examples: - | vpu: video-codec@12345678 { - compatible = "ti,k3-j721s2-wave521c", "cnm,wave521c"; + compatible = "ti,j721s2-wave521c", "cnm,wave521c"; reg = <0x12345678 0x1000>; clocks = <&clks 42>; interrupts = <42>; diff --git a/Documentation/devicetree/bindings/media/mediatek,vcodec-encoder.yaml b/Documentation/devicetree/bindings/media/mediatek,vcodec-encoder.yaml index a2051b31fa29..b45743d0a9ec 100644 --- a/Documentation/devicetree/bindings/media/mediatek,vcodec-encoder.yaml +++ b/Documentation/devicetree/bindings/media/mediatek,vcodec-encoder.yaml @@ -16,14 +16,18 @@ description: |+ properties: compatible: - enum: - - mediatek,mt8173-vcodec-enc-vp8 - - mediatek,mt8173-vcodec-enc - - mediatek,mt8183-vcodec-enc - - mediatek,mt8188-vcodec-enc - - mediatek,mt8192-vcodec-enc - - mediatek,mt8195-vcodec-enc - + oneOf: + - items: + - enum: + - mediatek,mt8173-vcodec-enc-vp8 + - mediatek,mt8173-vcodec-enc + - mediatek,mt8183-vcodec-enc + - mediatek,mt8188-vcodec-enc + - mediatek,mt8192-vcodec-enc + - mediatek,mt8195-vcodec-enc + - items: + - const: mediatek,mt8186-vcodec-enc + - const: mediatek,mt8183-vcodec-enc reg: maxItems: 1 @@ -109,10 +113,7 @@ allOf: properties: compatible: enum: - - mediatek,mt8173-vcodec-enc - - mediatek,mt8188-vcodec-enc - - mediatek,mt8192-vcodec-enc - - mediatek,mt8195-vcodec-enc + - mediatek,mt8173-vcodec-enc-vp8 then: properties: @@ -122,8 +123,8 @@ allOf: maxItems: 1 clock-names: items: - - const: venc_sel - else: # for vp8 hw encoder + - const: venc_lt_sel + else: properties: clock: items: @@ -131,7 +132,7 @@ allOf: maxItems: 1 clock-names: items: - - const: venc_lt_sel + - const: venc_sel additionalProperties: false diff --git a/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml b/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml index 37800e1908cc..83c020a673d6 100644 --- a/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml +++ b/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml @@ -38,7 +38,8 @@ properties: maxItems: 1 iommus: - maxItems: 2 + minItems: 2 + maxItems: 4 description: | Points to the respective IOMMU block with master port as argument, see Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml for details. diff --git a/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,imx-weim-peripherals.yaml b/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,imx-weim-peripherals.yaml new file mode 100644 index 000000000000..82fc5f4a1ed6 --- /dev/null +++ b/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,imx-weim-peripherals.yaml @@ -0,0 +1,31 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/memory-controllers/fsl/fsl,imx-weim-peripherals.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: i.MX WEIM Bus Peripheral Nodes + +maintainers: + - Shawn Guo <shawnguo@kernel.org> + - Sascha Hauer <s.hauer@pengutronix.de> + +description: + This binding is meant for the child nodes of the WEIM node. The node + represents any device connected to the WEIM bus. It may be a Flash chip, + RAM chip or Ethernet controller, etc. These properties are meant for + configuring the WEIM settings/timings and will accompany the bindings + supported by the respective device. + +properties: + reg: true + + fsl,weim-cs-timing: + $ref: /schemas/types.yaml#/definitions/uint32-array + description: + Timing values for the child node. + minItems: 2 + maxItems: 6 + +# the WEIM child will have its own native properties +additionalProperties: true diff --git a/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,imx-weim.yaml b/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,imx-weim.yaml new file mode 100644 index 000000000000..3f40ca5b13f6 --- /dev/null +++ b/Documentation/devicetree/bindings/memory-controllers/fsl/fsl,imx-weim.yaml @@ -0,0 +1,204 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/memory-controllers/fsl/fsl,imx-weim.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: i.MX Wireless External Interface Module (WEIM) + +maintainers: + - Shawn Guo <shawnguo@kernel.org> + - Sascha Hauer <s.hauer@pengutronix.de> + +description: + The term "wireless" does not imply that the WEIM is literally an interface + without wires. It simply means that this module was originally designed for + wireless and mobile applications that use low-power technology. The actual + devices are instantiated from the child nodes of a WEIM node. + +properties: + $nodename: + pattern: "^memory-controller@[0-9a-f]+$" + + compatible: + oneOf: + - enum: + - fsl,imx1-weim + - fsl,imx27-weim + - fsl,imx50-weim + - fsl,imx51-weim + - fsl,imx6q-weim + - items: + - enum: + - fsl,imx31-weim + - fsl,imx35-weim + - const: fsl,imx27-weim + - items: + - enum: + - fsl,imx6sx-weim + - fsl,imx6ul-weim + - const: fsl,imx6q-weim + + "#address-cells": + const: 2 + + "#size-cells": + const: 1 + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + + interrupts: + maxItems: 1 + + ranges: true + + fsl,weim-cs-gpr: + $ref: /schemas/types.yaml#/definitions/phandle + description: | + Phandle to the system General Purpose Register controller that contains + WEIM CS GPR register, e.g. IOMUXC_GPR1 on i.MX6Q. IOMUXC_GPR1[11:0] + should be set up as one of the following 4 possible values depending on + the CS space configuration. + + IOMUXC_GPR1[11:0] CS0 CS1 CS2 CS3 + --------------------------------------------- + 05 128M 0M 0M 0M + 033 64M 64M 0M 0M + 0113 64M 32M 32M 0M + 01111 32M 32M 32M 32M + + In case that the property is absent, the reset value or what bootloader + sets up in IOMUXC_GPR1[11:0] will be used. + + fsl,burst-clk-enable: + type: boolean + description: + The presence of this property indicates that the weim bus should operate + in Burst Clock Mode. + + fsl,continuous-burst-clk: + type: boolean + description: + Make Burst Clock to output continuous clock. Without this option Burst + Clock will output clock only when necessary. + +patternProperties: + "^.*@[0-7],[0-9a-f]+$": + type: object + description: Devices attached to chip selects are represented as subnodes. + $ref: fsl,imx-weim-peripherals.yaml + additionalProperties: true + required: + - fsl,weim-cs-timing + +required: + - compatible + - reg + - clocks + - "#address-cells" + - "#size-cells" + - ranges + +allOf: + - if: + properties: + compatible: + not: + contains: + enum: + - fsl,imx50-weim + - fsl,imx6q-weim + then: + properties: + fsl,weim-cs-gpr: false + fsl,burst-clk-enable: false + - if: + not: + required: + - fsl,burst-clk-enable + then: + properties: + fsl,continuous-burst-clk: false + - if: + properties: + compatible: + contains: + const: fsl,imx1-weim + then: + patternProperties: + "^.*@[0-7],[0-9a-f]+$": + properties: + fsl,weim-cs-timing: + items: + items: + - description: CSxU + - description: CSxL + - if: + properties: + compatible: + contains: + enum: + - fsl,imx27-weim + - fsl,imx31-weim + - fsl,imx35-weim + then: + patternProperties: + "^.*@[0-7],[0-9a-f]+$": + properties: + fsl,weim-cs-timing: + items: + items: + - description: CSCRxU + - description: CSCRxL + - description: CSCRxA + - if: + properties: + compatible: + contains: + enum: + - fsl,imx50-weim + - fsl,imx51-weim + - fsl,imx6q-weim + - fsl,imx6sx-weim + - fsl,imx6ul-weim + then: + patternProperties: + "^.*@[0-7],[0-9a-f]+$": + properties: + fsl,weim-cs-timing: + items: + items: + - description: CSxGCR1 + - description: CSxGCR2 + - description: CSxRCR1 + - description: CSxRCR2 + - description: CSxWCR1 + - description: CSxWCR2 + +additionalProperties: false + +examples: + - | + memory-controller@21b8000 { + compatible = "fsl,imx6q-weim"; + reg = <0x021b8000 0x4000>; + clocks = <&clks 196>; + #address-cells = <2>; + #size-cells = <1>; + ranges = <0 0 0x08000000 0x08000000>; + fsl,weim-cs-gpr = <&gpr>; + + flash@0,0 { + compatible = "cfi-flash"; + reg = <0 0 0x02000000>; + #address-cells = <1>; + #size-cells = <1>; + bank-width = <2>; + fsl,weim-cs-timing = <0x00620081 0x00000001 0x1c022000 + 0x0000c000 0x1404a38e 0x00000000>; + }; + }; diff --git a/Documentation/devicetree/bindings/memory-controllers/mc-peripheral-props.yaml b/Documentation/devicetree/bindings/memory-controllers/mc-peripheral-props.yaml index 8d9dae15ade0..00deeb09f87d 100644 --- a/Documentation/devicetree/bindings/memory-controllers/mc-peripheral-props.yaml +++ b/Documentation/devicetree/bindings/memory-controllers/mc-peripheral-props.yaml @@ -37,5 +37,6 @@ allOf: - $ref: ingenic,nemc-peripherals.yaml# - $ref: intel,ixp4xx-expansion-peripheral-props.yaml# - $ref: ti,gpmc-child.yaml# + - $ref: fsl/fsl,imx-weim-peripherals.yaml additionalProperties: true diff --git a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.yaml b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.yaml index f54e553e6c0e..71896cb10692 100644 --- a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.yaml +++ b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-emc.yaml @@ -145,7 +145,7 @@ patternProperties: "^emc-table@[0-9]+$": $ref: "#/$defs/emc-table" - "^emc-tables@[a-z0-9-]+$": + "^emc-tables@[a-f0-9-]+$": type: object properties: reg: diff --git a/Documentation/devicetree/bindings/memory-controllers/renesas,rpc-if.yaml b/Documentation/devicetree/bindings/memory-controllers/renesas,rpc-if.yaml index 25f3bb9890ae..d7745dd53b51 100644 --- a/Documentation/devicetree/bindings/memory-controllers/renesas,rpc-if.yaml +++ b/Documentation/devicetree/bindings/memory-controllers/renesas,rpc-if.yaml @@ -45,6 +45,7 @@ properties: - items: - enum: - renesas,r8a779g0-rpc-if # R-Car V4H + - renesas,r8a779h0-rpc-if # R-Car V4M - const: renesas,rcar-gen4-rpc-if # a generic R-Car gen4 device - items: diff --git a/Documentation/devicetree/bindings/memory-controllers/st,stm32-fmc2-ebi.yaml b/Documentation/devicetree/bindings/memory-controllers/st,stm32-fmc2-ebi.yaml index 14f1833d37c9..84ac6f50a6fc 100644 --- a/Documentation/devicetree/bindings/memory-controllers/st,stm32-fmc2-ebi.yaml +++ b/Documentation/devicetree/bindings/memory-controllers/st,stm32-fmc2-ebi.yaml @@ -23,7 +23,9 @@ maintainers: properties: compatible: - const: st,stm32mp1-fmc2-ebi + enum: + - st,stm32mp1-fmc2-ebi + - st,stm32mp25-fmc2-ebi reg: maxItems: 1 @@ -34,6 +36,9 @@ properties: resets: maxItems: 1 + power-domains: + maxItems: 1 + "#address-cells": const: 2 diff --git a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml index 82eb7a24c857..82f7ee8702cb 100644 --- a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml +++ b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml @@ -55,8 +55,9 @@ properties: - enum: - fsl,imx8mn-usdhc - fsl,imx8mp-usdhc - - fsl,imx93-usdhc - fsl,imx8ulp-usdhc + - fsl,imx93-usdhc + - fsl,imx95-usdhc - const: fsl,imx8mm-usdhc - items: - enum: @@ -162,6 +163,9 @@ properties: - const: ahb - const: per + iommus: + maxItems: 1 + power-domains: maxItems: 1 @@ -173,6 +177,11 @@ properties: - const: state_100mhz - const: state_200mhz - const: sleep + - minItems: 2 + items: + - const: default + - const: state_100mhz + - const: sleep - minItems: 1 items: - const: default diff --git a/Documentation/devicetree/bindings/mmc/fsl-imx-mmc.yaml b/Documentation/devicetree/bindings/mmc/fsl-imx-mmc.yaml index 221f5bc047bd..7911316fbd6a 100644 --- a/Documentation/devicetree/bindings/mmc/fsl-imx-mmc.yaml +++ b/Documentation/devicetree/bindings/mmc/fsl-imx-mmc.yaml @@ -24,6 +24,14 @@ properties: reg: maxItems: 1 + clocks: + maxItems: 2 + + clock-names: + items: + - const: ipg + - const: per + interrupts: maxItems: 1 @@ -34,6 +42,8 @@ properties: const: rx-tx required: + - clocks + - clock-names - compatible - reg - interrupts @@ -46,6 +56,8 @@ examples: compatible = "fsl,imx27-mmc", "fsl,imx21-mmc"; reg = <0x10014000 0x1000>; interrupts = <11>; + clocks = <&clks 29>, <&clks 60>; + clock-names = "ipg", "per"; dmas = <&dma 7>; dma-names = "rx-tx"; bus-width = <4>; diff --git a/Documentation/devicetree/bindings/mmc/hi3798cv200-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/hi3798cv200-dw-mshc.txt deleted file mode 100644 index a0693b7145f2..000000000000 --- a/Documentation/devicetree/bindings/mmc/hi3798cv200-dw-mshc.txt +++ /dev/null @@ -1,40 +0,0 @@ -* Hisilicon Hi3798CV200 specific extensions to the Synopsys Designware Mobile - Storage Host Controller - -Read synopsys-dw-mshc.txt for more details - -The Synopsys designware mobile storage host controller is used to interface -a SoC with storage medium such as eMMC or SD/MMC cards. This file documents -differences between the core Synopsys dw mshc controller properties described -by synopsys-dw-mshc.txt and the properties used by the Hisilicon Hi3798CV200 -specific extensions to the Synopsys Designware Mobile Storage Host Controller. - -Required Properties: -- compatible: Should contain "hisilicon,hi3798cv200-dw-mshc". -- clocks: A list of phandle + clock-specifier pairs for the clocks listed - in clock-names. -- clock-names: Should contain the following: - "ciu" - The ciu clock described in synopsys-dw-mshc.txt. - "biu" - The biu clock described in synopsys-dw-mshc.txt. - "ciu-sample" - Hi3798CV200 extended phase clock for ciu sampling. - "ciu-drive" - Hi3798CV200 extended phase clock for ciu driving. - -Example: - - emmc: mmc@9830000 { - compatible = "hisilicon,hi3798cv200-dw-mshc"; - reg = <0x9830000 0x10000>; - interrupts = <GIC_SPI 35 IRQ_TYPE_LEVEL_HIGH>; - clocks = <&crg HISTB_MMC_CIU_CLK>, - <&crg HISTB_MMC_BIU_CLK>, - <&crg HISTB_MMC_SAMPLE_CLK>, - <&crg HISTB_MMC_DRV_CLK>; - clock-names = "ciu", "biu", "ciu-sample", "ciu-drive"; - fifo-depth = <256>; - clock-frequency = <200000000>; - cap-mmc-highspeed; - mmc-ddr-1_8v; - mmc-hs200-1_8v; - non-removable; - bus-width = <8>; - }; diff --git a/Documentation/devicetree/bindings/mmc/hisilicon,hi3798cv200-dw-mshc.yaml b/Documentation/devicetree/bindings/mmc/hisilicon,hi3798cv200-dw-mshc.yaml new file mode 100644 index 000000000000..41c9b22523e7 --- /dev/null +++ b/Documentation/devicetree/bindings/mmc/hisilicon,hi3798cv200-dw-mshc.yaml @@ -0,0 +1,97 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/mmc/hisilicon,hi3798cv200-dw-mshc.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Hisilicon HiSTB SoCs specific extensions to the Synopsys DWMMC controller + +maintainers: + - Yang Xiwen <forbidden405@outlook.com> + +properties: + compatible: + enum: + - hisilicon,hi3798cv200-dw-mshc + - hisilicon,hi3798mv200-dw-mshc + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + items: + - description: bus interface unit clock + - description: card interface unit clock + - description: card input sample phase clock + - description: controller output drive phase clock + + clock-names: + items: + - const: ciu + - const: biu + - const: ciu-sample + - const: ciu-drive + + hisilicon,sap-dll-reg: + $ref: /schemas/types.yaml#/definitions/phandle-array + description: | + DWMMC core on Hi3798MV2x SoCs has a delay-locked-loop(DLL) attached to card data input path. + It is integrated into CRG core on the SoC and has to be controlled during tuning. + items: + - description: A phandle pointed to the CRG syscon node + - description: Sample DLL register offset in CRG address space + +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + +allOf: + - $ref: synopsys-dw-mshc-common.yaml# + + - if: + properties: + compatible: + contains: + const: hisilicon,hi3798mv200-dw-mshc + then: + required: + - hisilicon,sap-dll-reg + else: + properties: + hisilicon,sap-dll-reg: false + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/clock/histb-clock.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + + mmc@9830000 { + compatible = "hisilicon,hi3798cv200-dw-mshc"; + reg = <0x9830000 0x10000>; + interrupts = <GIC_SPI 35 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&crg HISTB_MMC_CIU_CLK>, + <&crg HISTB_MMC_BIU_CLK>, + <&crg HISTB_MMC_SAMPLE_CLK>, + <&crg HISTB_MMC_DRV_CLK>; + clock-names = "ciu", "biu", "ciu-sample", "ciu-drive"; + resets = <&crg 0xa0 4>; + reset-names = "reset"; + pinctrl-names = "default"; + pinctrl-0 = <&emmc_pins_1 &emmc_pins_2 + &emmc_pins_3 &emmc_pins_4>; + fifo-depth = <256>; + clock-frequency = <200000000>; + cap-mmc-highspeed; + mmc-ddr-1_8v; + mmc-hs200-1_8v; + non-removable; + bus-width = <8>; + }; diff --git a/Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml b/Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml index f7a4c6bc70f6..29f2400247eb 100644 --- a/Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml +++ b/Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml @@ -67,6 +67,7 @@ properties: - renesas,sdhi-r8a779a0 # R-Car V3U - renesas,sdhi-r8a779f0 # R-Car S4-8 - renesas,sdhi-r8a779g0 # R-Car V4H + - renesas,sdhi-r8a779h0 # R-Car V4M - const: renesas,rcar-gen4-sdhi # R-Car Gen4 reg: diff --git a/Documentation/devicetree/bindings/mmc/snps,dwcmshc-sdhci.yaml b/Documentation/devicetree/bindings/mmc/snps,dwcmshc-sdhci.yaml index 42804d955293..4d3031d9965f 100644 --- a/Documentation/devicetree/bindings/mmc/snps,dwcmshc-sdhci.yaml +++ b/Documentation/devicetree/bindings/mmc/snps,dwcmshc-sdhci.yaml @@ -19,6 +19,8 @@ properties: - rockchip,rk3568-dwcmshc - rockchip,rk3588-dwcmshc - snps,dwcmshc-sdhci + - sophgo,cv1800b-dwcmshc + - sophgo,sg2002-dwcmshc - thead,th1520-dwcmshc reg: diff --git a/Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml b/Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml index 75d8138298fb..660e2ca42daf 100644 --- a/Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml +++ b/Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml @@ -17,6 +17,10 @@ properties: oneOf: - items: - enum: + - brcm,bcm74165b0-asp + - const: brcm,asp-v2.2 + - items: + - enum: - brcm,bcm74165-asp - const: brcm,asp-v2.1 - items: diff --git a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml index 6684810fcbf0..23dfe0838dca 100644 --- a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml +++ b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml @@ -24,6 +24,7 @@ properties: - brcm,genet-mdio-v5 - brcm,asp-v2.0-mdio - brcm,asp-v2.1-mdio + - brcm,asp-v2.2-mdio - brcm,unimac-mdio reg: diff --git a/Documentation/devicetree/bindings/net/can/tcan4x5x.txt b/Documentation/devicetree/bindings/net/can/tcan4x5x.txt index 170e23f0610d..20c0572c9853 100644 --- a/Documentation/devicetree/bindings/net/can/tcan4x5x.txt +++ b/Documentation/devicetree/bindings/net/can/tcan4x5x.txt @@ -28,6 +28,8 @@ Optional properties: available with tcan4552/4553. - device-wake-gpios: Wake up GPIO to wake up the TCAN device. Not available with tcan4552/4553. + - wakeup-source: Leave the chip running when suspended, and configure + the RX interrupt to wake up the device. Example: tcan4x5x: tcan4x5x@0 { @@ -42,4 +44,5 @@ tcan4x5x: tcan4x5x@0 { device-state-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>; device-wake-gpios = <&gpio1 15 GPIO_ACTIVE_HIGH>; reset-gpios = <&gpio1 27 GPIO_ACTIVE_HIGH>; + wakeup-source; }; diff --git a/Documentation/devicetree/bindings/net/can/xilinx,can.yaml b/Documentation/devicetree/bindings/net/can/xilinx,can.yaml index 64d57c343e6f..8d4e5af6fd6c 100644 --- a/Documentation/devicetree/bindings/net/can/xilinx,can.yaml +++ b/Documentation/devicetree/bindings/net/can/xilinx,can.yaml @@ -49,6 +49,10 @@ properties: resets: maxItems: 1 + xlnx,has-ecc: + $ref: /schemas/types.yaml#/definitions/flag + description: CAN TX_OL, TX_TL and RX FIFOs have ECC support(AXI CAN) + required: - compatible - reg @@ -137,6 +141,7 @@ examples: interrupts = <GIC_SPI 59 IRQ_TYPE_EDGE_RISING>; tx-fifo-depth = <0x40>; rx-fifo-depth = <0x40>; + xlnx,has-ecc; }; - | diff --git a/Documentation/devicetree/bindings/net/cdns,macb.yaml b/Documentation/devicetree/bindings/net/cdns,macb.yaml index bf8894a0257e..2c71e2cf3a2f 100644 --- a/Documentation/devicetree/bindings/net/cdns,macb.yaml +++ b/Documentation/devicetree/bindings/net/cdns,macb.yaml @@ -59,6 +59,11 @@ properties: - cdns,gem # Generic - cdns,macb # Generic + - items: + - enum: + - microchip,sam9x7-gem # Microchip SAM9X7 gigabit ethernet interface + - const: microchip,sama7g5-gem # Microchip SAMA7G5 gigabit ethernet interface + reg: minItems: 1 items: diff --git a/Documentation/devicetree/bindings/net/dsa/ar9331.txt b/Documentation/devicetree/bindings/net/dsa/ar9331.txt deleted file mode 100644 index f824fdae0da2..000000000000 --- a/Documentation/devicetree/bindings/net/dsa/ar9331.txt +++ /dev/null @@ -1,147 +0,0 @@ -Atheros AR9331 built-in switch -============================= - -It is a switch built-in to Atheros AR9331 WiSoC and addressable over internal -MDIO bus. All PHYs are built-in as well. - -Required properties: - - - compatible: should be: "qca,ar9331-switch" - - reg: Address on the MII bus for the switch. - - resets : Must contain an entry for each entry in reset-names. - - reset-names : Must include the following entries: "switch" - - interrupt-parent: Phandle to the parent interrupt controller - - interrupts: IRQ line for the switch - - interrupt-controller: Indicates the switch is itself an interrupt - controller. This is used for the PHY interrupts. - - #interrupt-cells: must be 1 - - mdio: Container of PHY and devices on the switches MDIO bus. - -See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of additional -required and optional properties. -Examples: - -eth0: ethernet@19000000 { - compatible = "qca,ar9330-eth"; - reg = <0x19000000 0x200>; - interrupts = <4>; - - resets = <&rst 9>, <&rst 22>; - reset-names = "mac", "mdio"; - clocks = <&pll ATH79_CLK_AHB>, <&pll ATH79_CLK_AHB>; - clock-names = "eth", "mdio"; - - phy-mode = "mii"; - phy-handle = <&phy_port4>; -}; - -eth1: ethernet@1a000000 { - compatible = "qca,ar9330-eth"; - reg = <0x1a000000 0x200>; - interrupts = <5>; - resets = <&rst 13>, <&rst 23>; - reset-names = "mac", "mdio"; - clocks = <&pll ATH79_CLK_AHB>, <&pll ATH79_CLK_AHB>; - clock-names = "eth", "mdio"; - - phy-mode = "gmii"; - - fixed-link { - speed = <1000>; - full-duplex; - }; - - mdio { - #address-cells = <1>; - #size-cells = <0>; - - switch10: switch@10 { - #address-cells = <1>; - #size-cells = <0>; - - compatible = "qca,ar9331-switch"; - reg = <0x10>; - resets = <&rst 8>; - reset-names = "switch"; - - interrupt-parent = <&miscintc>; - interrupts = <12>; - - interrupt-controller; - #interrupt-cells = <1>; - - ports { - #address-cells = <1>; - #size-cells = <0>; - - switch_port0: port@0 { - reg = <0x0>; - ethernet = <ð1>; - - phy-mode = "gmii"; - - fixed-link { - speed = <1000>; - full-duplex; - }; - }; - - switch_port1: port@1 { - reg = <0x1>; - phy-handle = <&phy_port0>; - phy-mode = "internal"; - }; - - switch_port2: port@2 { - reg = <0x2>; - phy-handle = <&phy_port1>; - phy-mode = "internal"; - }; - - switch_port3: port@3 { - reg = <0x3>; - phy-handle = <&phy_port2>; - phy-mode = "internal"; - }; - - switch_port4: port@4 { - reg = <0x4>; - phy-handle = <&phy_port3>; - phy-mode = "internal"; - }; - }; - - mdio { - #address-cells = <1>; - #size-cells = <0>; - - interrupt-parent = <&switch10>; - - phy_port0: phy@0 { - reg = <0x0>; - interrupts = <0>; - }; - - phy_port1: phy@1 { - reg = <0x1>; - interrupts = <0>; - }; - - phy_port2: phy@2 { - reg = <0x2>; - interrupts = <0>; - }; - - phy_port3: phy@3 { - reg = <0x3>; - interrupts = <0>; - }; - - phy_port4: phy@4 { - reg = <0x4>; - interrupts = <0>; - }; - }; - }; - }; -}; diff --git a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml index c963dc09e8e1..52acc15ebcbf 100644 --- a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml +++ b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml @@ -31,6 +31,7 @@ properties: - microchip,ksz9893 - microchip,ksz9563 - microchip,ksz8563 + - microchip,ksz8567 reset-gpios: description: diff --git a/Documentation/devicetree/bindings/net/dsa/qca,ar9331.yaml b/Documentation/devicetree/bindings/net/dsa/qca,ar9331.yaml new file mode 100644 index 000000000000..fd9ddc59d38c --- /dev/null +++ b/Documentation/devicetree/bindings/net/dsa/qca,ar9331.yaml @@ -0,0 +1,161 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/dsa/qca,ar9331.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm Atheros AR9331 built-in switch + +maintainers: + - Oleksij Rempel <o.rempel@pengutronix.de> + +description: + Qualcomm Atheros AR9331 is a switch built-in to Atheros AR9331 WiSoC and + addressable over internal MDIO bus. All PHYs are built-in as well. + +properties: + compatible: + const: qca,ar9331-switch + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + interrupt-controller: true + + '#interrupt-cells': + const: 1 + + mdio: + $ref: /schemas/net/mdio.yaml# + unevaluatedProperties: false + properties: + interrupt-parent: true + + patternProperties: + '(ethernet-)?phy@[0-4]+$': + type: object + unevaluatedProperties: false + + properties: + reg: true + interrupts: + maxItems: 1 + + resets: + maxItems: 1 + + reset-names: + items: + - const: switch + +required: + - compatible + - reg + - interrupts + - interrupt-controller + - '#interrupt-cells' + - mdio + - ports + - resets + - reset-names + +allOf: + - $ref: dsa.yaml#/$defs/ethernet-ports + +unevaluatedProperties: false + +examples: + - | + mdio { + #address-cells = <1>; + #size-cells = <0>; + + switch10: switch@10 { + compatible = "qca,ar9331-switch"; + reg = <0x10>; + + interrupt-parent = <&miscintc>; + interrupts = <12>; + interrupt-controller; + #interrupt-cells = <1>; + + resets = <&rst 8>; + reset-names = "switch"; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0x0>; + ethernet = <ð1>; + + phy-mode = "gmii"; + + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + + port@1 { + reg = <0x1>; + phy-handle = <&phy_port0>; + phy-mode = "internal"; + }; + + port@2 { + reg = <0x2>; + phy-handle = <&phy_port1>; + phy-mode = "internal"; + }; + + port@3 { + reg = <0x3>; + phy-handle = <&phy_port2>; + phy-mode = "internal"; + }; + + port@4 { + reg = <0x4>; + phy-handle = <&phy_port3>; + phy-mode = "internal"; + }; + }; + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + interrupt-parent = <&switch10>; + + phy_port0: ethernet-phy@0 { + reg = <0x0>; + interrupts = <0>; + }; + + phy_port1: ethernet-phy@1 { + reg = <0x1>; + interrupts = <0>; + }; + + phy_port2: ethernet-phy@2 { + reg = <0x2>; + interrupts = <0>; + }; + + phy_port3: ethernet-phy@3 { + reg = <0x3>; + interrupts = <0>; + }; + + phy_port4: ethernet-phy@4 { + reg = <0x4>; + interrupts = <0>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/dsa/realtek.yaml b/Documentation/devicetree/bindings/net/dsa/realtek.yaml index cce692f57b08..70b6bda3cf98 100644 --- a/Documentation/devicetree/bindings/net/dsa/realtek.yaml +++ b/Documentation/devicetree/bindings/net/dsa/realtek.yaml @@ -59,6 +59,9 @@ properties: description: GPIO to be used to reset the whole device maxItems: 1 + resets: + maxItems: 1 + realtek,disable-leds: type: boolean description: | @@ -127,7 +130,6 @@ else: - mdc-gpios - mdio-gpios - mdio - - reset-gpios required: - compatible diff --git a/Documentation/devicetree/bindings/net/ethernet-controller.yaml b/Documentation/devicetree/bindings/net/ethernet-controller.yaml index d14d123ad7a0..b2785b03139f 100644 --- a/Documentation/devicetree/bindings/net/ethernet-controller.yaml +++ b/Documentation/devicetree/bindings/net/ethernet-controller.yaml @@ -14,7 +14,6 @@ properties: pattern: "^ethernet(@.*)?$" label: - $ref: /schemas/types.yaml#/definitions/string description: Human readable label on a port of a box. local-mac-address: diff --git a/Documentation/devicetree/bindings/net/ethernet-phy-package.yaml b/Documentation/devicetree/bindings/net/ethernet-phy-package.yaml new file mode 100644 index 000000000000..e567101e6f38 --- /dev/null +++ b/Documentation/devicetree/bindings/net/ethernet-phy-package.yaml @@ -0,0 +1,52 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/ethernet-phy-package.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Ethernet PHY Package Common Properties + +maintainers: + - Christian Marangi <ansuelsmth@gmail.com> + +description: + PHY packages are multi-port Ethernet PHY of the same family + and each Ethernet PHY is affected by the global configuration + of the PHY package. + + Each reg of the PHYs defined in the PHY package node is + absolute and describe the real address of the Ethernet PHY on + the MDIO bus. + +properties: + $nodename: + pattern: "^ethernet-phy-package@[a-f0-9]+$" + + reg: + minimum: 0 + maximum: 31 + description: + The base ID number for the PHY package. + Commonly the ID of the first PHY in the PHY package. + + Some PHY in the PHY package might be not defined but + still occupy ID on the device (just not attached to + anything) hence the PHY package reg might correspond + to a not attached PHY (offset 0). + + '#address-cells': + const: 1 + + '#size-cells': + const: 0 + +patternProperties: + ^ethernet-phy@[a-f0-9]+$: + $ref: ethernet-phy.yaml# + +required: + - reg + - '#address-cells' + - '#size-cells' + +additionalProperties: true diff --git a/Documentation/devicetree/bindings/net/fsl,fec.yaml b/Documentation/devicetree/bindings/net/fsl,fec.yaml index 8948a11c994e..5536c06139ca 100644 --- a/Documentation/devicetree/bindings/net/fsl,fec.yaml +++ b/Documentation/devicetree/bindings/net/fsl,fec.yaml @@ -224,6 +224,9 @@ properties: Can be omitted thus no delay is observed. Delay is in range of 1ms to 1000ms. Other delays are invalid. + iommus: + maxItems: 1 + required: - compatible - reg diff --git a/Documentation/devicetree/bindings/net/marvell,prestera.yaml b/Documentation/devicetree/bindings/net/marvell,prestera.yaml index 5ea8b73663a5..16ff892f7bbd 100644 --- a/Documentation/devicetree/bindings/net/marvell,prestera.yaml +++ b/Documentation/devicetree/bindings/net/marvell,prestera.yaml @@ -78,8 +78,8 @@ examples: pcie@0 { #address-cells = <3>; #size-cells = <2>; - ranges = <0x0 0x0 0x0 0x0 0x0 0x0>; - reg = <0x0 0x0 0x0 0x0 0x0 0x0>; + ranges = <0x02000000 0x0 0x100000 0x10000000 0x0 0x0>; + reg = <0x0 0x1000>; device_type = "pci"; switch@0,0 { diff --git a/Documentation/devicetree/bindings/net/nfc/ti,trf7970a.yaml b/Documentation/devicetree/bindings/net/nfc/ti,trf7970a.yaml index 9cc236ec42f2..d0332eb76ad2 100644 --- a/Documentation/devicetree/bindings/net/nfc/ti,trf7970a.yaml +++ b/Documentation/devicetree/bindings/net/nfc/ti,trf7970a.yaml @@ -73,7 +73,7 @@ examples: #include <dt-bindings/gpio/gpio.h> #include <dt-bindings/interrupt-controller/irq.h> - i2c { + spi { #address-cells = <1>; #size-cells = <0>; diff --git a/Documentation/devicetree/bindings/net/qca,qca808x.yaml b/Documentation/devicetree/bindings/net/qca,qca808x.yaml new file mode 100644 index 000000000000..e2552655902a --- /dev/null +++ b/Documentation/devicetree/bindings/net/qca,qca808x.yaml @@ -0,0 +1,54 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/qca,qca808x.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm Atheros QCA808X PHY + +maintainers: + - Christian Marangi <ansuelsmth@gmail.com> + +description: + QCA808X PHYs can have up to 3 LEDs attached. + All 3 LEDs are disabled by default. + 2 LEDs have dedicated pins with the 3rd LED having the + double function of Interrupt LEDs/GPIO or additional LED. + + By default this special PIN is set to LED function. + +allOf: + - $ref: ethernet-phy.yaml# + +properties: + compatible: + enum: + - ethernet-phy-id004d.d101 + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/leds/common.h> + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + ethernet-phy@0 { + compatible = "ethernet-phy-id004d.d101"; + reg = <0>; + + leds { + #address-cells = <1>; + #size-cells = <0>; + + led@0 { + reg = <0>; + color = <LED_COLOR_ID_GREEN>; + function = LED_FUNCTION_WAN; + default-state = "keep"; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/qcom,ethqos.yaml b/Documentation/devicetree/bindings/net/qcom,ethqos.yaml index 7bdb412a0185..69a337c7e345 100644 --- a/Documentation/devicetree/bindings/net/qcom,ethqos.yaml +++ b/Documentation/devicetree/bindings/net/qcom,ethqos.yaml @@ -37,12 +37,14 @@ properties: items: - description: Combined signal for various interrupt events - description: The interrupt that occurs when Rx exits the LPI state + - description: The interrupt that occurs when HW safety error triggered interrupt-names: minItems: 1 items: - const: macirq - - const: eth_lpi + - enum: [eth_lpi, sfty] + - const: sfty clocks: maxItems: 4 @@ -89,8 +91,9 @@ examples: <&gcc GCC_ETH_PTP_CLK>, <&gcc GCC_ETH_RGMII_CLK>; interrupts = <GIC_SPI 56 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "macirq", "eth_lpi"; + <GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 782 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "macirq", "eth_lpi", "sfty"; rx-fifo-depth = <4096>; tx-fifo-depth = <4096>; diff --git a/Documentation/devicetree/bindings/net/qcom,ipa.yaml b/Documentation/devicetree/bindings/net/qcom,ipa.yaml index c30218684cfe..53cae71d9957 100644 --- a/Documentation/devicetree/bindings/net/qcom,ipa.yaml +++ b/Documentation/devicetree/bindings/net/qcom,ipa.yaml @@ -159,7 +159,7 @@ properties: when the AP (not the modem) performs early initialization. firmware-name: - $ref: /schemas/types.yaml#/definitions/string + maxItems: 1 description: If present, name (or relative path) of the file within the firmware search path containing the firmware image used when diff --git a/Documentation/devicetree/bindings/net/qcom,ipq4019-mdio.yaml b/Documentation/devicetree/bindings/net/qcom,ipq4019-mdio.yaml index 3407e909e8a7..0029e197a825 100644 --- a/Documentation/devicetree/bindings/net/qcom,ipq4019-mdio.yaml +++ b/Documentation/devicetree/bindings/net/qcom,ipq4019-mdio.yaml @@ -44,6 +44,21 @@ properties: items: - const: gcc_mdio_ahb_clk + clock-frequency: + description: + The MDIO bus clock that must be output by the MDIO bus hardware, if + absent, the default hardware values are used. + + MDC rate is feed by an external clock (fixed 100MHz) and is divider + internally. The default divider is /256 resulting in the default rate + applied of 390KHz. + + To follow 802.3 standard that instruct up to 2.5MHz by default, if + this property is not declared and the divider is set to /256, by + default 1.5625Mhz is select. + enum: [ 390625, 781250, 1562500, 3125000, 6250000, 12500000 ] + default: 1562500 + required: - compatible - reg diff --git a/Documentation/devicetree/bindings/net/qcom,qca807x.yaml b/Documentation/devicetree/bindings/net/qcom,qca807x.yaml new file mode 100644 index 000000000000..7290024024f5 --- /dev/null +++ b/Documentation/devicetree/bindings/net/qcom,qca807x.yaml @@ -0,0 +1,184 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/qcom,qca807x.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm QCA807x Ethernet PHY + +maintainers: + - Christian Marangi <ansuelsmth@gmail.com> + - Robert Marko <robert.marko@sartura.hr> + +description: | + Qualcomm QCA8072/5 Ethernet PHY is PHY package of 2 or 5 + IEEE 802.3 clause 22 compliant 10BASE-Te, 100BASE-TX and + 1000BASE-T PHY-s. + + They feature 2 SerDes, one for PSGMII or QSGMII connection with + MAC, while second one is SGMII for connection to MAC or fiber. + + Both models have a combo port that supports 1000BASE-X and + 100BASE-FX fiber. + + Each PHY inside of QCA807x series has 4 digitally controlled + output only pins that natively drive LED-s for up to 2 attached + LEDs. Some vendor also use these 4 output for GPIO usage without + attaching LEDs. + + Note that output pins can be set to drive LEDs OR GPIO, mixed + definition are not accepted. + +$ref: ethernet-phy-package.yaml# + +properties: + compatible: + enum: + - qcom,qca8072-package + - qcom,qca8075-package + + qcom,package-mode: + description: | + PHY package can be configured in 3 mode following this table: + + First Serdes mode Second Serdes mode + Option 1 PSGMII for copper Disabled + ports 0-4 + Option 2 PSGMII for copper 1000BASE-X / 100BASE-FX + ports 0-4 + Option 3 QSGMII for copper SGMII for + ports 0-3 copper port 4 + + PSGMII mode (option 1 or 2) is configured dynamically based on + the presence of a connected SFP device. + $ref: /schemas/types.yaml#/definitions/string + enum: + - qsgmii + - psgmii + default: psgmii + + qcom,tx-drive-strength-milliwatt: + description: set the TX Amplifier value in mv. + $ref: /schemas/types.yaml#/definitions/uint32 + enum: [140, 160, 180, 200, 220, + 240, 260, 280, 300, 320, + 400, 500, 600] + default: 600 + +patternProperties: + ^ethernet-phy@[a-f0-9]+$: + $ref: ethernet-phy.yaml# + + properties: + qcom,dac-full-amplitude: + description: + Set Analog MDI driver amplitude to FULL. + + With this not defined, amplitude is set to DSP. + (amplitude is adjusted based on cable length) + + With this enabled and qcom,dac-full-bias-current + and qcom,dac-disable-bias-current-tweak disabled, + bias current is half. + type: boolean + + qcom,dac-full-bias-current: + description: + Set Analog MDI driver bias current to FULL. + + With this not defined, bias current is set to DSP. + (bias current is adjusted based on cable length) + + Actual bias current might be different with + qcom,dac-disable-bias-current-tweak disabled. + type: boolean + + qcom,dac-disable-bias-current-tweak: + description: | + Set Analog MDI driver bias current to disable tweak + to bias current. + + With this not defined, bias current tweak are enabled + by default. + + With this enabled the following tweak are NOT applied: + - With both FULL amplitude and FULL bias current: bias current + is set to half. + - With only DSP amplitude: bias current is set to half and + is set to 1/4 with cable < 10m. + - With DSP bias current (included both DSP amplitude and + DSP bias current): bias current is half the detected current + with cable < 10m. + type: boolean + + gpio-controller: true + + '#gpio-cells': + const: 2 + + if: + required: + - gpio-controller + then: + properties: + leds: false + + unevaluatedProperties: false + +required: + - compatible + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/leds/common.h> + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + ethernet-phy-package@0 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "qcom,qca8075-package"; + reg = <0>; + + qcom,package-mode = "qsgmii"; + + ethernet-phy@0 { + reg = <0>; + + leds { + #address-cells = <1>; + #size-cells = <0>; + + led@0 { + reg = <0>; + color = <LED_COLOR_ID_GREEN>; + function = LED_FUNCTION_LAN; + default-state = "keep"; + }; + }; + }; + + ethernet-phy@1 { + reg = <1>; + }; + + ethernet-phy@2 { + reg = <2>; + + gpio-controller; + #gpio-cells = <2>; + }; + + ethernet-phy@3 { + reg = <3>; + }; + + ethernet-phy@4 { + reg = <4>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/renesas,etheravb.yaml b/Documentation/devicetree/bindings/net/renesas,etheravb.yaml index 890f7858d0dc..de7ba7f345a9 100644 --- a/Documentation/devicetree/bindings/net/renesas,etheravb.yaml +++ b/Documentation/devicetree/bindings/net/renesas,etheravb.yaml @@ -46,6 +46,7 @@ properties: - enum: - renesas,etheravb-r8a779a0 # R-Car V3U - renesas,etheravb-r8a779g0 # R-Car V4H + - renesas,etheravb-r8a779h0 # R-Car V4M - const: renesas,etheravb-rcar-gen4 # R-Car Gen4 - items: diff --git a/Documentation/devicetree/bindings/net/renesas,ethertsn.yaml b/Documentation/devicetree/bindings/net/renesas,ethertsn.yaml index 475aff7714d6..ea35d19be829 100644 --- a/Documentation/devicetree/bindings/net/renesas,ethertsn.yaml +++ b/Documentation/devicetree/bindings/net/renesas,ethertsn.yaml @@ -65,9 +65,11 @@ properties: rx-internal-delay-ps: enum: [0, 1800] + default: 0 tx-internal-delay-ps: enum: [0, 2000] + default: 0 '#address-cells': const: 1 diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml b/Documentation/devicetree/bindings/net/snps,dwmac.yaml index 5c2769dc689a..6b0341a8e0ea 100644 --- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml +++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml @@ -95,6 +95,7 @@ properties: - snps,dwmac-5.20 - snps,dwxgmac - snps,dwxgmac-2.10 + - starfive,jh7100-dwmac - starfive,jh7110-dwmac reg: @@ -107,13 +108,15 @@ properties: - description: Combined signal for various interrupt events - description: The interrupt to manage the remote wake-up packet detection - description: The interrupt that occurs when Rx exits the LPI state + - description: The interrupt that occurs when HW safety error triggered interrupt-names: minItems: 1 items: - const: macirq - - enum: [eth_wake_irq, eth_lpi] - - const: eth_lpi + - enum: [eth_wake_irq, eth_lpi, sfty] + - enum: [eth_wake_irq, eth_lpi, sfty] + - enum: [eth_wake_irq, eth_lpi, sfty] clocks: minItems: 1 @@ -144,10 +147,12 @@ properties: - description: AHB reset reset-names: - minItems: 1 - items: - - const: stmmaceth - - const: ahb + oneOf: + - items: + - enum: [stmmaceth, ahb] + - items: + - const: stmmaceth + - const: ahb power-domains: maxItems: 1 diff --git a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml index 5e7cfbbebce6..0d1962980f57 100644 --- a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml +++ b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml @@ -16,16 +16,20 @@ select: compatible: contains: enum: + - starfive,jh7100-dwmac - starfive,jh7110-dwmac required: - compatible properties: compatible: - items: - - enum: - - starfive,jh7110-dwmac - - const: snps,dwmac-5.20 + oneOf: + - items: + - const: starfive,jh7100-dwmac + - const: snps,dwmac + - items: + - const: starfive,jh7110-dwmac + - const: snps,dwmac-5.20 reg: maxItems: 1 @@ -46,24 +50,6 @@ properties: - const: tx - const: gtx - interrupts: - minItems: 3 - maxItems: 3 - - interrupt-names: - minItems: 3 - maxItems: 3 - - resets: - items: - - description: MAC Reset signal. - - description: AHB Reset signal. - - reset-names: - items: - - const: stmmaceth - - const: ahb - starfive,tx-use-rgmii-clk: description: Tx clock is provided by external rgmii clock. @@ -94,6 +80,48 @@ required: allOf: - $ref: snps,dwmac.yaml# + - if: + properties: + compatible: + contains: + const: starfive,jh7100-dwmac + then: + properties: + interrupts: + minItems: 2 + maxItems: 2 + + interrupt-names: + minItems: 2 + maxItems: 2 + + resets: + maxItems: 1 + + reset-names: + const: ahb + + - if: + properties: + compatible: + contains: + const: starfive,jh7110-dwmac + then: + properties: + interrupts: + minItems: 3 + maxItems: 3 + + interrupt-names: + minItems: 3 + maxItems: 3 + + resets: + minItems: 2 + + reset-names: + minItems: 2 + unevaluatedProperties: false examples: diff --git a/Documentation/devicetree/bindings/net/ti,cpsw-switch.yaml b/Documentation/devicetree/bindings/net/ti,cpsw-switch.yaml index f07ae3173b03..d5bd93ee4dbb 100644 --- a/Documentation/devicetree/bindings/net/ti,cpsw-switch.yaml +++ b/Documentation/devicetree/bindings/net/ti,cpsw-switch.yaml @@ -7,8 +7,9 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: TI SoC Ethernet Switch Controller (CPSW) maintainers: - - Grygorii Strashko <grygorii.strashko@ti.com> - - Sekhar Nori <nsekhar@ti.com> + - Siddharth Vadapalli <s-vadapalli@ti.com> + - Ravi Gunasekaran <r-gunasekaran@ti.com> + - Roger Quadros <rogerq@kernel.org> description: The 3-port switch gigabit ethernet subsystem provides ethernet packet diff --git a/Documentation/devicetree/bindings/net/ti,dp83822.yaml b/Documentation/devicetree/bindings/net/ti,dp83822.yaml index db74474207ed..784866ea392b 100644 --- a/Documentation/devicetree/bindings/net/ti,dp83822.yaml +++ b/Documentation/devicetree/bindings/net/ti,dp83822.yaml @@ -62,6 +62,40 @@ properties: for the PHY. The internal delay for the PHY is fixed to 3.5ns relative to transmit data. + ti,cfg-dac-minus-one-bp: + description: | + DP83826 PHY only. + Sets the voltage ratio (with respect to the nominal value) + of the logical level -1 for the MLT-3 encoded TX data. + enum: [5000, 5625, 6250, 6875, 7500, 8125, 8750, 9375, 10000, + 10625, 11250, 11875, 12500, 13125, 13750, 14375, 15000] + default: 10000 + + ti,cfg-dac-plus-one-bp: + description: | + DP83826 PHY only. + Sets the voltage ratio (with respect to the nominal value) + of the logical level +1 for the MLT-3 encoded TX data. + enum: [5000, 5625, 6250, 6875, 7500, 8125, 8750, 9375, 10000, + 10625, 11250, 11875, 12500, 13125, 13750, 14375, 15000] + default: 10000 + + ti,rmii-mode: + description: | + If present, select the RMII operation mode. Two modes are + available: + - RMII master, where the PHY outputs a 50MHz reference clock which can + be connected to the MAC. + - RMII slave, where the PHY expects a 50MHz reference clock input + shared with the MAC. + The RMII operation mode can also be configured by its straps. + If the strap pin is not set correctly or not set at all, then this can be + used to configure it. + $ref: /schemas/types.yaml#/definitions/string + enum: + - master + - slave + required: - reg diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml index c9c25132d154..73ed5951d296 100644 --- a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml +++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml @@ -7,8 +7,9 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: The TI AM654x/J721E/AM642x SoC Gigabit Ethernet MAC (Media Access Controller) maintainers: - - Grygorii Strashko <grygorii.strashko@ti.com> - - Sekhar Nori <nsekhar@ti.com> + - Siddharth Vadapalli <s-vadapalli@ti.com> + - Ravi Gunasekaran <r-gunasekaran@ti.com> + - Roger Quadros <rogerq@kernel.org> description: The TI AM654x/J721E SoC Gigabit Ethernet MAC (CPSW2G NUSS) has two ports diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml index 3e910d3b24a0..b1c875325776 100644 --- a/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml +++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml @@ -7,8 +7,9 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: The TI AM654x/J721E Common Platform Time Sync (CPTS) module maintainers: - - Grygorii Strashko <grygorii.strashko@ti.com> - - Sekhar Nori <nsekhar@ti.com> + - Siddharth Vadapalli <s-vadapalli@ti.com> + - Ravi Gunasekaran <r-gunasekaran@ti.com> + - Roger Quadros <rogerq@kernel.org> description: |+ The TI AM654x/J721E CPTS module is used to facilitate host control of time diff --git a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml index 252207adbc54..eabceb849537 100644 --- a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml +++ b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml @@ -19,9 +19,6 @@ description: | Alternatively, it can specify the wireless part of the MT7628/MT7688 or MT7622/MT7986 SoC. -allOf: - - $ref: ieee80211.yaml# - properties: compatible: enum: @@ -38,7 +35,12 @@ properties: MT7986 should contain 3 regions consys, dcm, and sku, in this order. interrupts: - maxItems: 1 + minItems: 1 + items: + - description: major interrupt for rings + - description: additional interrupt for ring 19 + - description: additional interrupt for ring 4 + - description: additional interrupt for ring 5 power-domains: maxItems: 1 @@ -217,6 +219,24 @@ required: - compatible - reg +allOf: + - $ref: ieee80211.yaml# + - if: + properties: + compatible: + contains: + enum: + - mediatek,mt7981-wmac + - mediatek,mt7986-wmac + then: + properties: + interrupts: + minItems: 4 + else: + properties: + interrupts: + maxItems: 1 + unevaluatedProperties: false examples: @@ -293,7 +313,10 @@ examples: reg = <0x18000000 0x1000000>, <0x10003000 0x1000>, <0x11d10000 0x1000>; - interrupts = <GIC_SPI 213 IRQ_TYPE_LEVEL_HIGH>; + interrupts = <GIC_SPI 213 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 214 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 215 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 216 IRQ_TYPE_LEVEL_HIGH>; clocks = <&topckgen 50>, <&topckgen 62>; clock-names = "mcu", "ap2conn"; diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml index 7758a55dd328..9b3ef4bc3732 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml @@ -8,6 +8,7 @@ title: Qualcomm Technologies ath10k wireless devices maintainers: - Kalle Valo <kvalo@kernel.org> + - Jeff Johnson <jjohnson@kernel.org> description: Qualcomm Technologies, Inc. IEEE 802.11ac devices. diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k-pci.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k-pci.yaml index 817f02a8b481..41d023797d7d 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k-pci.yaml +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k-pci.yaml @@ -9,6 +9,7 @@ title: Qualcomm Technologies ath11k wireless devices (PCIe) maintainers: - Kalle Valo <kvalo@kernel.org> + - Jeff Johnson <jjohnson@kernel.org> description: | Qualcomm Technologies IEEE 802.11ax PCIe devices diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml index 7d5f982a3d09..672282cdfc2f 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml @@ -9,6 +9,7 @@ title: Qualcomm Technologies ath11k wireless devices maintainers: - Kalle Valo <kvalo@kernel.org> + - Jeff Johnson <jjohnson@kernel.org> description: | These are dt entries for Qualcomm Technologies, Inc. IEEE 802.11ax diff --git a/Documentation/devicetree/bindings/opp/opp-v2-base.yaml b/Documentation/devicetree/bindings/opp/opp-v2-base.yaml index e2f8f7af3cf4..b1bb87c865ed 100644 --- a/Documentation/devicetree/bindings/opp/opp-v2-base.yaml +++ b/Documentation/devicetree/bindings/opp/opp-v2-base.yaml @@ -57,8 +57,6 @@ patternProperties: specific binding. minItems: 1 maxItems: 32 - items: - maxItems: 1 opp-microvolt: description: | diff --git a/Documentation/devicetree/bindings/power/qcom,rpmpd.yaml b/Documentation/devicetree/bindings/power/qcom,rpmpd.yaml index 2ff246cf8b81..929b7ef9c1bc 100644 --- a/Documentation/devicetree/bindings/power/qcom,rpmpd.yaml +++ b/Documentation/devicetree/bindings/power/qcom,rpmpd.yaml @@ -24,6 +24,8 @@ properties: - qcom,msm8917-rpmpd - qcom,msm8939-rpmpd - qcom,msm8953-rpmpd + - qcom,msm8974-rpmpd + - qcom,msm8974pro-pma8084-rpmpd - qcom,msm8976-rpmpd - qcom,msm8994-rpmpd - qcom,msm8996-rpmpd diff --git a/Documentation/devicetree/bindings/power/renesas,rcar-sysc.yaml b/Documentation/devicetree/bindings/power/renesas,rcar-sysc.yaml index 0720b54881c2..e76fb273490f 100644 --- a/Documentation/devicetree/bindings/power/renesas,rcar-sysc.yaml +++ b/Documentation/devicetree/bindings/power/renesas,rcar-sysc.yaml @@ -45,6 +45,7 @@ properties: - renesas,r8a779a0-sysc # R-Car V3U - renesas,r8a779f0-sysc # R-Car S4-8 - renesas,r8a779g0-sysc # R-Car V4H + - renesas,r8a779h0-sysc # R-Car V4M reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/pwm/atmel,hlcdc-pwm.yaml b/Documentation/devicetree/bindings/pwm/atmel,hlcdc-pwm.yaml new file mode 100644 index 000000000000..0e92868a2b68 --- /dev/null +++ b/Documentation/devicetree/bindings/pwm/atmel,hlcdc-pwm.yaml @@ -0,0 +1,35 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/pwm/atmel,hlcdc-pwm.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Atmel's HLCDC's PWM controller + +maintainers: + - Nicolas Ferre <nicolas.ferre@microchip.com> + - Alexandre Belloni <alexandre.belloni@bootlin.com> + - Claudiu Beznea <claudiu.beznea@tuxon.dev> + +description: + The LCDC integrates a Pulse Width Modulation (PWM) Controller. This block + generates the LCD contrast control signal (LCD_PWM) that controls the + display's contrast by software. LCDC_PWM is an 8-bit PWM signal that can be + converted to an analog voltage with a simple passive filter. LCD display + panels have different backlight specifications in terms of minimum/maximum + values for PWM frequency. If the LCDC PWM frequency range does not match the + LCD display panel, it is possible to use the standalone PWM Controller to + drive the backlight. + +properties: + compatible: + const: atmel,hlcdc-pwm + + "#pwm-cells": + const: 3 + +required: + - compatible + - "#pwm-cells" + +additionalProperties: false diff --git a/Documentation/devicetree/bindings/pwm/atmel-hlcdc-pwm.txt b/Documentation/devicetree/bindings/pwm/atmel-hlcdc-pwm.txt deleted file mode 100644 index afa501bf7f94..000000000000 --- a/Documentation/devicetree/bindings/pwm/atmel-hlcdc-pwm.txt +++ /dev/null @@ -1,29 +0,0 @@ -Device-Tree bindings for Atmel's HLCDC (High-end LCD Controller) PWM driver - -The Atmel HLCDC PWM is subdevice of the HLCDC MFD device. -See ../mfd/atmel-hlcdc.txt for more details. - -Required properties: - - compatible: value should be one of the following: - "atmel,hlcdc-pwm" - - pinctr-names: the pin control state names. Should contain "default". - - pinctrl-0: should contain the pinctrl states described by pinctrl - default. - - #pwm-cells: should be set to 3. This PWM chip use the default 3 cells - bindings defined in pwm.yaml in this directory. - -Example: - - hlcdc: hlcdc@f0030000 { - compatible = "atmel,sama5d3-hlcdc"; - reg = <0xf0030000 0x2000>; - clocks = <&lcdc_clk>, <&lcdck>, <&clk32k>; - clock-names = "periph_clk","sys_clk", "slow_clk"; - - hlcdc_pwm: hlcdc-pwm { - compatible = "atmel,hlcdc-pwm"; - pinctrl-names = "default"; - pinctrl-0 = <&pinctrl_lcd_pwm>; - #pwm-cells = <3>; - }; - }; diff --git a/Documentation/devicetree/bindings/pwm/marvell,pxa-pwm.yaml b/Documentation/devicetree/bindings/pwm/marvell,pxa-pwm.yaml new file mode 100644 index 000000000000..ba6325575ea0 --- /dev/null +++ b/Documentation/devicetree/bindings/pwm/marvell,pxa-pwm.yaml @@ -0,0 +1,51 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/pwm/marvell,pxa-pwm.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Marvell PXA PWM + +maintainers: + - Duje Mihanović <duje.mihanovic@skole.hr> + +allOf: + - $ref: pwm.yaml# + +properties: + compatible: + enum: + - marvell,pxa250-pwm + - marvell,pxa270-pwm + - marvell,pxa168-pwm + - marvell,pxa910-pwm + + reg: + # Length should be 0x10 + maxItems: 1 + + "#pwm-cells": + # Used for specifying the period length in nanoseconds + const: 1 + + clocks: + maxItems: 1 + +required: + - compatible + - reg + - "#pwm-cells" + - clocks + +additionalProperties: false + +examples: + - | + #include <dt-bindings/clock/pxa-clock.h> + + pwm0: pwm@40b00000 { + compatible = "marvell,pxa250-pwm"; + reg = <0x40b00000 0x10>; + #pwm-cells = <1>; + clocks = <&clks CLK_PWM0>; + }; diff --git a/Documentation/devicetree/bindings/pwm/mediatek,mt2712-pwm.yaml b/Documentation/devicetree/bindings/pwm/mediatek,mt2712-pwm.yaml index 0fbe8a6469eb..a5c308801619 100644 --- a/Documentation/devicetree/bindings/pwm/mediatek,mt2712-pwm.yaml +++ b/Documentation/devicetree/bindings/pwm/mediatek,mt2712-pwm.yaml @@ -24,6 +24,7 @@ properties: - mediatek,mt7629-pwm - mediatek,mt7981-pwm - mediatek,mt7986-pwm + - mediatek,mt7988-pwm - mediatek,mt8183-pwm - mediatek,mt8365-pwm - mediatek,mt8516-pwm diff --git a/Documentation/devicetree/bindings/pwm/pwm-amlogic.yaml b/Documentation/devicetree/bindings/pwm/pwm-amlogic.yaml index 527864a4d855..1d71d4f8f328 100644 --- a/Documentation/devicetree/bindings/pwm/pwm-amlogic.yaml +++ b/Documentation/devicetree/bindings/pwm/pwm-amlogic.yaml @@ -9,9 +9,6 @@ title: Amlogic PWM maintainers: - Heiner Kallweit <hkallweit1@gmail.com> -allOf: - - $ref: pwm.yaml# - properties: compatible: oneOf: @@ -24,31 +21,40 @@ properties: - amlogic,meson-g12a-ee-pwm - amlogic,meson-g12a-ao-pwm-ab - amlogic,meson-g12a-ao-pwm-cd - - amlogic,meson-s4-pwm + deprecated: true - items: - const: amlogic,meson-gx-pwm - const: amlogic,meson-gxbb-pwm + deprecated: true - items: - const: amlogic,meson-gx-ao-pwm - const: amlogic,meson-gxbb-ao-pwm + deprecated: true - items: - const: amlogic,meson8-pwm - const: amlogic,meson8b-pwm + deprecated: true + - enum: + - amlogic,meson8-pwm-v2 + - amlogic,meson-s4-pwm + - items: + - enum: + - amlogic,meson8b-pwm-v2 + - amlogic,meson-gxbb-pwm-v2 + - amlogic,meson-axg-pwm-v2 + - amlogic,meson-g12-pwm-v2 + - const: amlogic,meson8-pwm-v2 reg: maxItems: 1 clocks: minItems: 1 - maxItems: 2 + maxItems: 4 clock-names: - oneOf: - - items: - - enum: [clkin0, clkin1] - - items: - - const: clkin0 - - const: clkin1 + minItems: 1 + maxItems: 2 "#pwm-cells": const: 3 @@ -57,6 +63,79 @@ required: - compatible - reg +allOf: + - $ref: pwm.yaml# + + - if: + properties: + compatible: + contains: + enum: + - amlogic,meson8-pwm + - amlogic,meson8b-pwm + - amlogic,meson-gxbb-pwm + - amlogic,meson-gxbb-ao-pwm + - amlogic,meson-axg-ee-pwm + - amlogic,meson-axg-ao-pwm + - amlogic,meson-g12a-ee-pwm + - amlogic,meson-g12a-ao-pwm-ab + - amlogic,meson-g12a-ao-pwm-cd + then: + # Obsolete historic bindings tied to the driver implementation + # The clocks provided here are meant to be matched with the input + # known (hard-coded) in the driver and used to select pwm clock + # source. Currently, the linux driver ignores this. + # This is kept to maintain ABI backward compatibility. + properties: + clocks: + maxItems: 2 + clock-names: + oneOf: + - items: + - enum: [clkin0, clkin1] + - items: + - const: clkin0 + - const: clkin1 + + # Newer binding where clock describe the actual clock inputs of the pwm + # block. These are necessary but some inputs may be grounded. + - if: + properties: + compatible: + contains: + enum: + - amlogic,meson8-pwm-v2 + then: + properties: + clocks: + minItems: 1 + items: + - description: input clock 0 of the pwm block + - description: input clock 1 of the pwm block + - description: input clock 2 of the pwm block + - description: input clock 3 of the pwm block + clock-names: false + required: + - clocks + + # Newer IP block take a single input per channel, instead of 4 inputs + # for both channels + - if: + properties: + compatible: + contains: + enum: + - amlogic,meson-s4-pwm + then: + properties: + clocks: + items: + - description: input clock of PWM channel A + - description: input clock of PWM channel B + clock-names: false + required: + - clocks + additionalProperties: false examples: @@ -68,3 +147,17 @@ examples: clock-names = "clkin0", "clkin1"; #pwm-cells = <3>; }; + - | + pwm@2000 { + compatible = "amlogic,meson8-pwm-v2"; + reg = <0x1000 0x10>; + clocks = <&xtal>, <0>, <&fdiv4>, <&fdiv5>; + #pwm-cells = <3>; + }; + - | + pwm@1000 { + compatible = "amlogic,meson-s4-pwm"; + reg = <0x1000 0x10>; + clocks = <&pwm_src_a>, <&pwm_src_b>; + #pwm-cells = <3>; + }; diff --git a/Documentation/devicetree/bindings/pwm/pxa-pwm.txt b/Documentation/devicetree/bindings/pwm/pxa-pwm.txt deleted file mode 100644 index 5ae9f1e3c338..000000000000 --- a/Documentation/devicetree/bindings/pwm/pxa-pwm.txt +++ /dev/null @@ -1,30 +0,0 @@ -Marvell PWM controller - -Required properties: -- compatible: should be one or more of: - - "marvell,pxa250-pwm" - - "marvell,pxa270-pwm" - - "marvell,pxa168-pwm" - - "marvell,pxa910-pwm" -- reg: Physical base address and length of the registers used by the PWM channel - Note that one device instance must be created for each PWM that is used, so the - length covers only the register window for one PWM output, not that of the - entire PWM controller. Currently length is 0x10 for all supported devices. -- #pwm-cells: Should be 1. This cell is used to specify the period in - nanoseconds. - -Example PWM device node: - -pwm0: pwm@40b00000 { - compatible = "marvell,pxa250-pwm"; - reg = <0x40b00000 0x10>; - #pwm-cells = <1>; -}; - -Example PWM client node: - -backlight { - compatible = "pwm-backlight"; - pwms = <&pwm0 5000000>; - ... -} diff --git a/Documentation/devicetree/bindings/regulator/gpio-regulator.yaml b/Documentation/devicetree/bindings/regulator/gpio-regulator.yaml index f4c1f36e52e9..a34e85754658 100644 --- a/Documentation/devicetree/bindings/regulator/gpio-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/gpio-regulator.yaml @@ -47,6 +47,7 @@ properties: 1: HIGH Default is LOW if nothing else is specified. $ref: /schemas/types.yaml#/definitions/uint32-array + minItems: 1 maxItems: 8 items: enum: [0, 1] @@ -57,7 +58,8 @@ properties: regulator and matching GPIO configurations to achieve them. If there are no states in the "states" array, use a fixed regulator instead. $ref: /schemas/types.yaml#/definitions/uint32-matrix - maxItems: 8 + minItems: 2 + maxItems: 256 items: items: - description: Voltage in microvolts diff --git a/Documentation/devicetree/bindings/regulator/infineon,ir38060.yaml b/Documentation/devicetree/bindings/regulator/infineon,ir38060.yaml new file mode 100644 index 000000000000..e6ffbc2a2298 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/infineon,ir38060.yaml @@ -0,0 +1,45 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/infineon,ir38060.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Infineon Buck Regulators with PMBUS interfaces + +maintainers: + - Not Me. + +allOf: + - $ref: regulator.yaml# + +properties: + compatible: + enum: + - infineon,ir38060 + - infineon,ir38064 + - infineon,ir38164 + - infineon,ir38263 + + reg: + maxItems: 1 + +required: + - compatible + - reg + +unevaluatedProperties: false + +examples: + - | + i2c { + #address-cells = <1>; + #size-cells = <0>; + + regulator@34 { + compatible = "infineon,ir38060"; + reg = <0x34>; + + regulator-min-microvolt = <437500>; + regulator-max-microvolt = <1387500>; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/mcp16502-regulator.txt b/Documentation/devicetree/bindings/regulator/mcp16502-regulator.txt deleted file mode 100644 index 451cc4e86b01..000000000000 --- a/Documentation/devicetree/bindings/regulator/mcp16502-regulator.txt +++ /dev/null @@ -1,144 +0,0 @@ -MCP16502 PMIC - -Required properties: -- compatible: "microchip,mcp16502" -- reg: I2C slave address -- lpm-gpios: GPIO for LPM pin. Note that this GPIO *must* remain high during - suspend-to-ram, keeping the PMIC into HIBERNATE mode; this - property is optional; -- regulators: A node that houses a sub-node for each regulator within - the device. Each sub-node is identified using the node's - name. The content of each sub-node is defined by the - standard binding for regulators; see regulator.txt. - -Regulators of MCP16502 PMIC: -1) VDD_IO - Buck (1.2 - 3.7 V) -2) VDD_DDR - Buck (0.6 - 1.85 V) -3) VDD_CORE - Buck (0.6 - 1.85 V) -4) VDD_OTHER - BUCK (0.6 - 1.85 V) -5) LDO1 - LDO (1.2 - 3.7 V) -6) LDO2 - LDO (1.2 - 3.7 V) - -Regulator modes: -2 - FPWM: higher precision, higher consumption -4 - AutoPFM: lower precision, lower consumption - -Each regulator is defined using the standard binding for regulators. - -Example: - -mcp16502@5b { - compatible = "microchip,mcp16502"; - reg = <0x5b>; - status = "okay"; - lpm-gpios = <&pioBU 7 GPIO_ACTIVE_HIGH>; - - regulators { - VDD_IO { - regulator-name = "VDD_IO"; - regulator-min-microvolt = <1200000>; - regulator-max-microvolt = <3700000>; - regulator-initial-mode = <2>; - regulator-allowed-modes = <2>, <4>; - regulator-always-on; - - regulator-state-standby { - regulator-on-in-suspend; - regulator-mode = <4>; - }; - - regulator-state-mem { - regulator-off-in-suspend; - regulator-mode = <4>; - }; - }; - - VDD_DDR { - regulator-name = "VDD_DDR"; - regulator-min-microvolt = <600000>; - regulator-max-microvolt = <1850000>; - regulator-initial-mode = <2>; - regulator-allowed-modes = <2>, <4>; - regulator-always-on; - - regulator-state-standby { - regulator-on-in-suspend; - regulator-mode = <4>; - }; - - regulator-state-mem { - regulator-on-in-suspend; - regulator-mode = <4>; - }; - }; - - VDD_CORE { - regulator-name = "VDD_CORE"; - regulator-min-microvolt = <600000>; - regulator-max-microvolt = <1850000>; - regulator-initial-mode = <2>; - regulator-allowed-modes = <2>, <4>; - regulator-always-on; - - regulator-state-standby { - regulator-on-in-suspend; - regulator-mode = <4>; - }; - - regulator-state-mem { - regulator-off-in-suspend; - regulator-mode = <4>; - }; - }; - - VDD_OTHER { - regulator-name = "VDD_OTHER"; - regulator-min-microvolt = <600000>; - regulator-max-microvolt = <1850000>; - regulator-initial-mode = <2>; - regulator-allowed-modes = <2>, <4>; - regulator-always-on; - - regulator-state-standby { - regulator-on-in-suspend; - regulator-mode = <4>; - }; - - regulator-state-mem { - regulator-off-in-suspend; - regulator-mode = <4>; - }; - }; - - LDO1 { - regulator-name = "LDO1"; - regulator-min-microvolt = <1200000>; - regulator-max-microvolt = <3700000>; - regulator-always-on; - - regulator-state-standby { - regulator-on-in-suspend; - }; - - regulator-state-mem { - regulator-off-in-suspend; - }; - }; - - LDO2 { - regulator-name = "LDO2"; - regulator-min-microvolt = <1200000>; - regulator-max-microvolt = <3700000>; - regulator-always-on; - - regulator-state-standby { - regulator-on-in-suspend; - }; - - regulator-state-mem { - regulator-off-in-suspend; - }; - }; - - }; -}; diff --git a/Documentation/devicetree/bindings/regulator/microchip,mcp16502.yaml b/Documentation/devicetree/bindings/regulator/microchip,mcp16502.yaml new file mode 100644 index 000000000000..1aca3646789e --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/microchip,mcp16502.yaml @@ -0,0 +1,180 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/microchip,mcp16502.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MCP16502 - High-Performance PMIC + +maintainers: + - Andrei Simion <andrei.simion@microchip.com> + +description: + The MCP16502 is an optimally integrated PMIC compatible + with Microchip's eMPUs(Embedded Microprocessor Units), + requiring Dynamic Voltage Scaling (DVS) with the use + of High-Performance mode (HPM). + +properties: + compatible: + const: microchip,mcp16502 + + lpm-gpios: + maxItems: 1 + description: GPIO for LPM pin. + Note that this GPIO must remain high during + suspend-to-ram, keeping the PMIC into HIBERNATE mode. + + reg: + maxItems: 1 + + regulators: + type: object + additionalProperties: false + description: List of regulators and its properties. + + patternProperties: + "^(VDD_(IO|CORE|DDR|OTHER)|LDO[1-2])$": + type: object + $ref: regulator.yaml# + unevaluatedProperties: false + + properties: + regulator-initial-mode: + enum: [2, 4] + default: 2 + description: Initial operating mode + + regulator-allowed-modes: + items: + enum: [2, 4] + description: Supported modes + 2 - FPWM higher precision, higher consumption + 4 - AutoPFM lower precision, lower consumption + +required: + - compatible + - reg + - regulators + +additionalProperties: false + +examples: + - | + i2c { + #address-cells = <1>; + #size-cells = <0>; + + pmic@5b { + compatible = "microchip,mcp16502"; + reg = <0x5b>; + + regulators { + VDD_IO { + regulator-name = "VDD_IO"; + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + regulator-initial-mode = <2>; + regulator-allowed-modes = <2>, <4>; + regulator-always-on; + + regulator-state-standby { + regulator-on-in-suspend; + regulator-mode = <4>; + }; + + regulator-state-mem { + regulator-off-in-suspend; + regulator-mode = <4>; + }; + }; + + VDD_DDR { + regulator-name = "VDD_DDR"; + regulator-min-microvolt = <1350000>; + regulator-max-microvolt = <1350000>; + regulator-initial-mode = <2>; + regulator-allowed-modes = <2>, <4>; + regulator-always-on; + + regulator-state-standby { + regulator-on-in-suspend; + regulator-mode = <4>; + }; + + regulator-state-mem { + regulator-on-in-suspend; + regulator-mode = <4>; + }; + }; + + VDD_CORE { + regulator-name = "VDD_CORE"; + regulator-min-microvolt = <1150000>; + regulator-max-microvolt = <1150000>; + regulator-initial-mode = <2>; + regulator-allowed-modes = <2>, <4>; + regulator-always-on; + + regulator-state-standby { + regulator-on-in-suspend; + regulator-mode = <4>; + }; + + regulator-state-mem { + regulator-off-in-suspend; + regulator-mode = <4>; + }; + }; + + VDD_OTHER { + regulator-name = "VDD_OTHER"; + regulator-min-microvolt = <1050000>; + regulator-max-microvolt = <1250000>; + regulator-initial-mode = <2>; + regulator-allowed-modes = <2>, <4>; + regulator-always-on; + + regulator-state-standby { + regulator-on-in-suspend; + regulator-mode = <4>; + }; + + regulator-state-mem { + regulator-off-in-suspend; + regulator-mode = <4>; + }; + }; + + LDO1 { + regulator-name = "LDO1"; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + regulator-always-on; + + regulator-state-standby { + regulator-on-in-suspend; + }; + + regulator-state-mem { + regulator-off-in-suspend; + }; + }; + + LDO2 { + regulator-name = "LDO2"; + regulator-min-microvolt = <1200000>; + regulator-max-microvolt = <3700000>; + regulator-always-on; + + regulator-state-standby { + regulator-on-in-suspend; + }; + + regulator-state-mem { + regulator-off-in-suspend; + }; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/qcom,usb-vbus-regulator.yaml b/Documentation/devicetree/bindings/regulator/qcom,usb-vbus-regulator.yaml index 534f87e98716..8afb40c67af3 100644 --- a/Documentation/devicetree/bindings/regulator/qcom,usb-vbus-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/qcom,usb-vbus-regulator.yaml @@ -19,8 +19,14 @@ allOf: properties: compatible: - enum: - - qcom,pm8150b-vbus-reg + oneOf: + - enum: + - qcom,pm8150b-vbus-reg + - items: + - enum: + - qcom,pm4125-vbus-reg + - qcom,pm6150-vbus-reg + - const: qcom,pm8150b-vbus-reg reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/regulator/ti,tps65132.yaml b/Documentation/devicetree/bindings/regulator/ti,tps65132.yaml new file mode 100644 index 000000000000..6a6d1a3d6fa7 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/ti,tps65132.yaml @@ -0,0 +1,84 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/ti,tps65132.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: TI TPS65132 Dual Output Power Regulators + +maintainers: + - devicetree@vger.kernel.org + +description: | + The TPS65132 is designed to supply positive/negative driven applications. + + Datasheet is available at: + https://www.ti.com/lit/gpn/tps65132 + +properties: + compatible: + enum: + - ti,tps65132 + + reg: + maxItems: 1 + +patternProperties: + "^out[pn]$": + type: object + $ref: regulator.yaml# + unevaluatedProperties: false + description: + Properties for single regulator. + + properties: + enable-gpios: + maxItems: 1 + description: + GPIO specifier to enable the GPIO control (on/off) for regulator. + + active-discharge-gpios: + maxItems: 1 + description: + GPIO specifier to actively discharge the delay mechanism. + + ti,active-discharge-time-us: + description: Regulator active discharge time in microseconds. + + dependencies: + active-discharge-gpios: [ 'ti,active-discharge-time-us' ] + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + + i2c { + #address-cells = <1>; + #size-cells = <0>; + + regulator@3e { + compatible = "ti,tps65132"; + reg = <0x3e>; + + outp { + regulator-name = "outp"; + regulator-boot-on; + regulator-always-on; + enable-gpios = <&gpio 23 GPIO_ACTIVE_HIGH>; + }; + + outn { + regulator-name = "outn"; + regulator-boot-on; + regulator-always-on; + regulator-active-discharge = <0>; + enable-gpios = <&gpio 40 GPIO_ACTIVE_HIGH>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/tps65132-regulator.txt b/Documentation/devicetree/bindings/regulator/tps65132-regulator.txt deleted file mode 100644 index 3a3505520c69..000000000000 --- a/Documentation/devicetree/bindings/regulator/tps65132-regulator.txt +++ /dev/null @@ -1,46 +0,0 @@ -TPS65132 regulators - -Required properties: -- compatible: "ti,tps65132" -- reg: I2C slave address - -Optional Subnode: -Device supports two regulators OUTP and OUTN. A sub node within the - device node describe the properties of these regulators. The sub-node - names must be as follows: - -For regulator outp, the sub node name should be "outp". - -For regulator outn, the sub node name should be "outn". - --enable-gpios:(active high, output) Regulators are controlled by the input pins. - If it is connected to GPIO through host system then provide the - gpio number as per gpio.txt. --active-discharge-gpios: (active high, output) Some configurations use delay mechanisms - on the enable pin, to keep the regulator enabled for some time after - the enable signal goes low. This GPIO is used to actively discharge - the delay mechanism. Requires specification of ti,active-discharge-time-us --ti,active-discharge-time-us: how long the active discharge gpio should be - asserted for during active discharge, in microseconds. - -Each regulator is defined using the standard binding for regulators. - -Example: - - tps65132@3e { - compatible = "ti,tps65132"; - reg = <0x3e>; - - outp { - regulator-name = "outp"; - regulator-boot-on; - regulator-always-on; - enable-gpios = <&gpio 23 0>; - }; - - outn { - regulator-name = "outn"; - regulator-boot-on; - regulator-always-on; - regulator-active-discharge = <0>; - enable-gpios = <&gpio 40 0>; - }; - }; diff --git a/Documentation/devicetree/bindings/reset/renesas,rst.yaml b/Documentation/devicetree/bindings/reset/renesas,rst.yaml index e7e487247751..58b4a45d3380 100644 --- a/Documentation/devicetree/bindings/reset/renesas,rst.yaml +++ b/Documentation/devicetree/bindings/reset/renesas,rst.yaml @@ -50,6 +50,7 @@ properties: - renesas,r8a779a0-rst # R-Car V3U - renesas,r8a779f0-rst # R-Car S4-8 - renesas,r8a779g0-rst # R-Car V4H + - renesas,r8a779h0-rst # R-Car V4M reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/reset/xlnx,zynqmp-reset.yaml b/Documentation/devicetree/bindings/reset/xlnx,zynqmp-reset.yaml index 49db66801429..1f1b42dde94d 100644 --- a/Documentation/devicetree/bindings/reset/xlnx,zynqmp-reset.yaml +++ b/Documentation/devicetree/bindings/reset/xlnx,zynqmp-reset.yaml @@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: Zynq UltraScale+ MPSoC and Versal reset maintainers: - - Piyush Mehta <piyush.mehta@amd.com> + - Mubin Sayyed <mubin.sayyed@amd.com> + - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> description: | The Zynq UltraScale+ MPSoC and Versal has several different resets. diff --git a/Documentation/devicetree/bindings/soc/imx/fsl,imx8mp-hdmi-blk-ctrl.yaml b/Documentation/devicetree/bindings/soc/imx/fsl,imx8mp-hdmi-blk-ctrl.yaml index 1be4ce2a45e8..bd1cdaa4f54b 100644 --- a/Documentation/devicetree/bindings/soc/imx/fsl,imx8mp-hdmi-blk-ctrl.yaml +++ b/Documentation/devicetree/bindings/soc/imx/fsl,imx8mp-hdmi-blk-ctrl.yaml @@ -27,8 +27,8 @@ properties: const: 1 power-domains: - minItems: 8 - maxItems: 8 + minItems: 10 + maxItems: 10 power-domain-names: items: @@ -40,10 +40,12 @@ properties: - const: trng - const: hdmi-tx - const: hdmi-tx-phy + - const: hdcp + - const: hrv clocks: - minItems: 4 - maxItems: 4 + minItems: 5 + maxItems: 5 clock-names: items: @@ -51,6 +53,7 @@ properties: - const: axi - const: ref_266m - const: ref_24m + - const: fdcc interconnects: maxItems: 3 @@ -82,12 +85,15 @@ examples: clocks = <&clk IMX8MP_CLK_HDMI_APB>, <&clk IMX8MP_CLK_HDMI_ROOT>, <&clk IMX8MP_CLK_HDMI_REF_266M>, - <&clk IMX8MP_CLK_HDMI_24M>; - clock-names = "apb", "axi", "ref_266m", "ref_24m"; + <&clk IMX8MP_CLK_HDMI_24M>, + <&clk IMX8MP_CLK_HDMI_FDCC_TST>; + clock-names = "apb", "axi", "ref_266m", "ref_24m", "fdcc"; power-domains = <&pgc_hdmimix>, <&pgc_hdmimix>, <&pgc_hdmimix>, <&pgc_hdmimix>, <&pgc_hdmimix>, <&pgc_hdmimix>, - <&pgc_hdmimix>, <&pgc_hdmi_phy>; + <&pgc_hdmimix>, <&pgc_hdmi_phy>, + <&pgc_hdmimix>, <&pgc_hdmimix>; power-domain-names = "bus", "irqsteer", "lcdif", "pai", "pvi", "trng", - "hdmi-tx", "hdmi-tx-phy"; + "hdmi-tx", "hdmi-tx-phy", + "hdcp", "hrv"; #power-domain-cells = <1>; }; diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,pbs.yaml b/Documentation/devicetree/bindings/soc/qcom/qcom,pbs.yaml new file mode 100644 index 000000000000..b502ca72266a --- /dev/null +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,pbs.yaml @@ -0,0 +1,46 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/soc/qcom/qcom,pbs.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm Technologies, Inc. Programmable Boot Sequencer + +maintainers: + - Anjelique Melendez <quic_amelende@quicinc.com> + +description: | + The Qualcomm Technologies, Inc. Programmable Boot Sequencer (PBS) + supports triggering power up and power down sequences for clients + upon request. + +properties: + compatible: + items: + - enum: + - qcom,pmi632-pbs + - const: qcom,pbs + + reg: + maxItems: 1 + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - | + #include <dt-bindings/spmi/spmi.h> + + pmic@0 { + reg = <0x0 SPMI_USID>; + #address-cells = <1>; + #size-cells = <0>; + + pbs@7400 { + compatible = "qcom,pmi632-pbs", "qcom,pbs"; + reg = <0x7400>; + }; + }; diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,pmic-glink.yaml b/Documentation/devicetree/bindings/soc/qcom/qcom,pmic-glink.yaml index 61df97ffe1e4..d3f3259ef77d 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,pmic-glink.yaml +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,pmic-glink.yaml @@ -32,6 +32,7 @@ properties: - items: - enum: - qcom,sm8650-pmic-glink + - qcom,x1e80100-pmic-glink - const: qcom,sm8550-pmic-glink - const: qcom,pmic-glink @@ -65,6 +66,7 @@ allOf: enum: - qcom,sm8450-pmic-glink - qcom,sm8550-pmic-glink + - qcom,x1e80100-pmic-glink then: properties: orientation-gpios: false diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,rpm-master-stats.yaml b/Documentation/devicetree/bindings/soc/qcom/qcom,rpm-master-stats.yaml index 031800985b5e..9410404f87f1 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,rpm-master-stats.yaml +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,rpm-master-stats.yaml @@ -35,6 +35,8 @@ properties: description: Phandle to an RPM MSG RAM slice containing the master stats minItems: 1 maxItems: 5 + items: + maxItems: 1 qcom,master-names: $ref: /schemas/types.yaml#/definitions/string-array diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,spm.yaml b/Documentation/devicetree/bindings/soc/qcom/qcom,saw2.yaml index 20c8cd38ff0d..ca4bce817273 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,spm.yaml +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,saw2.yaml @@ -1,23 +1,33 @@ # SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) %YAML 1.2 --- -$id: http://devicetree.org/schemas/soc/qcom/qcom,spm.yaml# +$id: http://devicetree.org/schemas/soc/qcom/qcom,saw2.yaml# $schema: http://devicetree.org/meta-schemas/core.yaml# -title: Qualcomm Subsystem Power Manager +title: Qualcomm Subsystem Power Manager / SPM AVS Wrapper 2 (SAW2) maintainers: - Andy Gross <agross@kernel.org> - Bjorn Andersson <bjorn.andersson@linaro.org> description: | - This binding describes the Qualcomm Subsystem Power Manager, used to control - the peripheral logic surrounding the application cores in Qualcomm platforms. + The Qualcomm Subsystem Power Manager is used to control the peripheral logic + surrounding the application cores in Qualcomm platforms. + + The SAW2 is a wrapper around the Subsystem Power Manager (SPM) and the + Adaptive Voltage Scaling (AVS) hardware. The SPM is a programmable + power-controller that transitions a piece of hardware (like a processor or + subsystem) into and out of low power modes via a direct connection to + the PMIC. It can also be wired up to interact with other processors in the + system, notifying them when a low power state is entered or exited. properties: compatible: items: - enum: + - qcom,ipq4019-saw2-cpu + - qcom,ipq4019-saw2-l2 + - qcom,ipq8064-saw2-cpu - qcom,sdm660-gold-saw2-v4.1-l2 - qcom,sdm660-silver-saw2-v4.1-l2 - qcom,msm8998-gold-saw2-v4.1-l2 @@ -26,16 +36,27 @@ properties: - qcom,msm8916-saw2-v3.0-cpu - qcom,msm8939-saw2-v3.0-cpu - qcom,msm8226-saw2-v2.1-cpu + - qcom,msm8226-saw2-v2.1-l2 + - qcom,msm8960-saw2-cpu - qcom,msm8974-saw2-v2.1-cpu + - qcom,msm8974-saw2-v2.1-l2 - qcom,msm8976-gold-saw2-v2.3-l2 - qcom,msm8976-silver-saw2-v2.3-l2 - qcom,apq8084-saw2-v2.1-cpu + - qcom,apq8084-saw2-v2.1-l2 - qcom,apq8064-saw2-v1.1-cpu - const: qcom,saw2 reg: - description: Base address and size of the SPM register region - maxItems: 1 + items: + - description: Base address and size of the SPM register region + - description: Base address and size of the alias register region + minItems: 1 + + regulator: + $ref: /schemas/regulator/regulator.yaml# + description: Indicates that this SPM device acts as a regulator device + device for the core (CPU or Cache) the SPM is attached to. required: - compatible @@ -82,4 +103,17 @@ examples: reg = <0x17912000 0x1000>; }; + - | + /* + * Example 3: SAW2 with the bundled regulator definition. + */ + power-manager@2089000 { + compatible = "qcom,apq8064-saw2-v1.1-cpu", "qcom,saw2"; + reg = <0x02089000 0x1000>, <0x02009000 0x1000>; + + regulator { + regulator-min-microvolt = <850000>; + regulator-max-microvolt = <1300000>; + }; + }; ... diff --git a/Documentation/devicetree/bindings/soc/renesas/renesas-soc.yaml b/Documentation/devicetree/bindings/soc/renesas/renesas-soc.yaml new file mode 100644 index 000000000000..5ddd31f30f26 --- /dev/null +++ b/Documentation/devicetree/bindings/soc/renesas/renesas-soc.yaml @@ -0,0 +1,73 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/soc/renesas/renesas-soc.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Renesas SoC compatibles naming convention + +maintainers: + - Geert Uytterhoeven <geert+renesas@glider.be> + - Niklas Söderlund <niklas.soderlund@ragnatech.se> + +description: | + Guidelines for new compatibles for SoC blocks/components. + When adding new compatibles in new bindings, use the format:: + renesas,SoC-IP + + For example:: + renesas,r8a77965-csi2 + + When adding new compatibles to existing bindings, use the format in the + existing binding, even if it contradicts the above. + +select: + properties: + compatible: + contains: + pattern: "^renesas,.+-.+$" + required: + - compatible + +properties: + compatible: + minItems: 1 + maxItems: 4 + items: + anyOf: + # Preferred naming style for compatibles of SoC components + - pattern: "^renesas,(emev2|r(7s|8a|9a)[a-z0-9]+|rcar|rmobile|rz[a-z0-9]*|sh(7[a-z0-9]+)?|mobile)-[a-z0-9-]+$" + - pattern: "^renesas,(condor|falcon|gr-peach|gray-hawk|salvator|sk-rz|smar(c(2)?)?|spider|white-hawk)(.*)?$" + + # Legacy compatibles + # + # New compatibles are not allowed. + - pattern: "^renesas,(can|cpg|dmac|du|(g)?ether(avb)?|gpio|hscif|(r)?i[i2]c|imr|intc|ipmmu|irqc|jpu|mmcif|msiof|mtu2|pci(e)?|pfc|pwm|[rq]spi|rcar_sound|sata|scif[ab]*|sdhi|thermal|tmu|tpu|usb(2|hs)?|vin|xhci)-[a-z0-9-]+$" + - pattern: "^renesas,(d|s)?bsc(3)?-(r8a73a4|r8a7740|sh73a0)$" + - pattern: "^renesas,em-(gio|sti|uart)$" + - pattern: "^renesas,fsi2-(r8a7740|sh73a0)$" + - pattern: "^renesas,hspi-r8a777[89]$" + - pattern: "^renesas,sysc-(r8a73a4|r8a7740|rmobile|sh73a0)$" + - enum: + - renesas,imr-lx4 + - renesas,mtu2-r7s72100 + + # None SoC component compatibles + # + # Compatibles with the Renesas vendor prefix that do not relate to any SoC + # component are OK. New compatibles are allowed. + - enum: + - renesas,smp-sram + + # Do not fail compatibles not matching the select pattern + # + # Some SoC components in addition to a Renesas compatible list + # compatibles not related to Renesas. The select pattern for this + # schema hits all compatibles that have at lest one Renesas compatible + # and try to validate all values in that compatible array, allow all + # that don't match the schema select pattern. For example, + # + # compatible = "renesas,r9a07g044-mali", "arm,mali-bifrost"; + - pattern: "^(?!renesas,.+-.+).+$" + +additionalProperties: true diff --git a/Documentation/devicetree/bindings/soc/renesas/renesas.yaml b/Documentation/devicetree/bindings/soc/renesas/renesas.yaml index 16ca3ff7b1ae..c1ce4da2dc32 100644 --- a/Documentation/devicetree/bindings/soc/renesas/renesas.yaml +++ b/Documentation/devicetree/bindings/soc/renesas/renesas.yaml @@ -348,12 +348,25 @@ properties: - renesas,white-hawk-cpu # White Hawk CPU board (RTP8A779G0ASKB0FC0SA000) - const: renesas,r8a779g0 + - description: R-Car V4H (R8A779G2) + items: + - enum: + - renesas,white-hawk-single # White Hawk Single board (RTP8A779G2ASKB0F10SA001) + - const: renesas,r8a779g2 + - const: renesas,r8a779g0 + - items: - enum: - renesas,white-hawk-breakout # White Hawk BreakOut board (RTP8A779G0ASKB0SB0SA000) - const: renesas,white-hawk-cpu - const: renesas,r8a779g0 + - description: R-Car V4M (R8A779H0) + items: + - enum: + - renesas,gray-hawk-single # Gray Hawk Single board (RTP8A779H0ASKB0F10S) + - const: renesas,r8a779h0 + - description: R-Car H3e (R8A779M0) items: - enum: @@ -475,12 +488,6 @@ properties: - renesas,r9a07g054l2 # Dual Cortex-A55 RZ/V2L - const: renesas,r9a07g054 - - description: RZ/V2M (R9A09G011) - items: - - enum: - - renesas,rzv2mevk2 # RZ/V2M Eval Board v2.0 - - const: renesas,r9a09g011 - - description: RZ/G3S (R9A08G045) items: - enum: @@ -500,6 +507,12 @@ properties: - const: renesas,r9a08g045s33 # PCIe support - const: renesas,r9a08g045 + - description: RZ/V2M (R9A09G011) + items: + - enum: + - renesas,rzv2mevk2 # RZ/V2M Eval Board v2.0 + - const: renesas,r9a09g011 + additionalProperties: true ... diff --git a/Documentation/devicetree/bindings/soc/rockchip/grf.yaml b/Documentation/devicetree/bindings/soc/rockchip/grf.yaml index 9793ea6f0fe6..0b87c266760c 100644 --- a/Documentation/devicetree/bindings/soc/rockchip/grf.yaml +++ b/Documentation/devicetree/bindings/soc/rockchip/grf.yaml @@ -22,12 +22,15 @@ properties: - rockchip,rk3568-usb2phy-grf - rockchip,rk3588-bigcore0-grf - rockchip,rk3588-bigcore1-grf + - rockchip,rk3588-hdptxphy-grf - rockchip,rk3588-ioc - rockchip,rk3588-php-grf - rockchip,rk3588-pipe-phy-grf - rockchip,rk3588-sys-grf - rockchip,rk3588-pcie3-phy-grf - rockchip,rk3588-pcie3-pipe-grf + - rockchip,rk3588-usb-grf + - rockchip,rk3588-usbdpphy-grf - rockchip,rk3588-vo-grf - rockchip,rk3588-vop-grf - rockchip,rv1108-usbgrf @@ -66,6 +69,9 @@ properties: reg: maxItems: 1 + clocks: + maxItems: 1 + "#address-cells": const: 1 @@ -248,6 +254,22 @@ allOf: unevaluatedProperties: false + - if: + properties: + compatible: + contains: + enum: + - rockchip,rk3588-vo-grf + + then: + required: + - clocks + + else: + properties: + clocks: false + + examples: - | #include <dt-bindings/clock/rk3399-cru.h> diff --git a/Documentation/devicetree/bindings/soc/samsung/samsung,exynos-sysreg.yaml b/Documentation/devicetree/bindings/soc/samsung/samsung,exynos-sysreg.yaml index 1794e3799f21..c0c6ce8fc786 100644 --- a/Documentation/devicetree/bindings/soc/samsung/samsung,exynos-sysreg.yaml +++ b/Documentation/devicetree/bindings/soc/samsung/samsung,exynos-sysreg.yaml @@ -72,6 +72,8 @@ allOf: compatible: contains: enum: + - google,gs101-peric0-sysreg + - google,gs101-peric1-sysreg - samsung,exynos850-cmgp-sysreg - samsung,exynos850-peri-sysreg - samsung,exynos850-sysreg diff --git a/Documentation/devicetree/bindings/soc/xilinx/xilinx.yaml b/Documentation/devicetree/bindings/soc/xilinx/xilinx.yaml index d4c0fe1fe435..131aba5ed9f4 100644 --- a/Documentation/devicetree/bindings/soc/xilinx/xilinx.yaml +++ b/Documentation/devicetree/bindings/soc/xilinx/xilinx.yaml @@ -117,20 +117,70 @@ properties: - const: xlnx,zynqmp - description: Xilinx Kria SOMs + minItems: 3 items: - - const: xlnx,zynqmp-sm-k26-rev1 - - const: xlnx,zynqmp-sm-k26-revB - - const: xlnx,zynqmp-sm-k26-revA - - const: xlnx,zynqmp-sm-k26 - - const: xlnx,zynqmp + enum: + - xlnx,zynqmp-sm-k26-rev2 + - xlnx,zynqmp-sm-k26-rev1 + - xlnx,zynqmp-sm-k26-revB + - xlnx,zynqmp-sm-k26-revA + - xlnx,zynqmp-sm-k26 + - xlnx,zynqmp + allOf: + - contains: + const: xlnx,zynqmp + - contains: + const: xlnx,zynqmp-sm-k26 - description: Xilinx Kria SOMs (starter) + minItems: 3 items: - - const: xlnx,zynqmp-smk-k26-rev1 - - const: xlnx,zynqmp-smk-k26-revB - - const: xlnx,zynqmp-smk-k26-revA - - const: xlnx,zynqmp-smk-k26 - - const: xlnx,zynqmp + enum: + - xlnx,zynqmp-smk-k26-rev2 + - xlnx,zynqmp-smk-k26-rev1 + - xlnx,zynqmp-smk-k26-revB + - xlnx,zynqmp-smk-k26-revA + - xlnx,zynqmp-smk-k26 + - xlnx,zynqmp + allOf: + - contains: + const: xlnx,zynqmp + - contains: + const: xlnx,zynqmp-smk-k26 + + - description: Xilinx Kria SOM KV260 revA/Y/Z + minItems: 3 + items: + enum: + - xlnx,zynqmp-sk-kv260-revA + - xlnx,zynqmp-sk-kv260-revY + - xlnx,zynqmp-sk-kv260-revZ + - xlnx,zynqmp-sk-kv260 + - xlnx,zynqmp + allOf: + - contains: + const: xlnx,zynqmp-sk-kv260-revA + - contains: + const: xlnx,zynqmp-sk-kv260 + - contains: + const: xlnx,zynqmp + + - description: Xilinx Kria SOM KV260 rev2/1/B + minItems: 3 + items: + enum: + - xlnx,zynqmp-sk-kv260-rev2 + - xlnx,zynqmp-sk-kv260-rev1 + - xlnx,zynqmp-sk-kv260-revB + - xlnx,zynqmp-sk-kv260 + - xlnx,zynqmp + allOf: + - contains: + const: xlnx,zynqmp-sk-kv260-revB + - contains: + const: xlnx,zynqmp-sk-kv260 + - contains: + const: xlnx,zynqmp - description: AMD MicroBlaze V (QEMU) items: diff --git a/Documentation/devicetree/bindings/sound/allwinner,sun4i-a10-spdif.yaml b/Documentation/devicetree/bindings/sound/allwinner,sun4i-a10-spdif.yaml index 8108c564dd78..aa32dc950e72 100644 --- a/Documentation/devicetree/bindings/sound/allwinner,sun4i-a10-spdif.yaml +++ b/Documentation/devicetree/bindings/sound/allwinner,sun4i-a10-spdif.yaml @@ -22,6 +22,7 @@ properties: - const: allwinner,sun6i-a31-spdif - const: allwinner,sun8i-h3-spdif - const: allwinner,sun50i-h6-spdif + - const: allwinner,sun50i-h616-spdif - items: - const: allwinner,sun8i-a83t-spdif - const: allwinner,sun8i-h3-spdif @@ -62,6 +63,8 @@ allOf: enum: - allwinner,sun6i-a31-spdif - allwinner,sun8i-h3-spdif + - allwinner,sun50i-h6-spdif + - allwinner,sun50i-h616-spdif then: required: @@ -73,7 +76,7 @@ allOf: contains: enum: - allwinner,sun8i-h3-spdif - - allwinner,sun50i-h6-spdif + - allwinner,sun50i-h616-spdif then: properties: diff --git a/Documentation/devicetree/bindings/sound/google,sc7280-herobrine.yaml b/Documentation/devicetree/bindings/sound/google,sc7280-herobrine.yaml index ec4b6e547ca6..cdcd7c6f21eb 100644 --- a/Documentation/devicetree/bindings/sound/google,sc7280-herobrine.yaml +++ b/Documentation/devicetree/bindings/sound/google,sc7280-herobrine.yaml @@ -7,7 +7,6 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: Google SC7280-Herobrine ASoC sound card driver maintainers: - - Srinivasa Rao Mandadapu <srivasam@codeaurora.org> - Judy Hsiao <judyhsiao@chromium.org> description: diff --git a/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-max9808x.yaml b/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-max9808x.yaml index c29d7942915c..241d20f3aad0 100644 --- a/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-max9808x.yaml +++ b/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-max9808x.yaml @@ -64,7 +64,7 @@ examples: #include <dt-bindings/clock/tegra30-car.h> #include <dt-bindings/soc/tegra-pmc.h> sound { - compatible = "lge,tegra-audio-max98089-p895", + compatible = "lg,tegra-audio-max98089-p895", "nvidia,tegra-audio-max98089"; nvidia,model = "LG Optimus Vu MAX98089"; diff --git a/Documentation/devicetree/bindings/spi/atmel,at91rm9200-spi.yaml b/Documentation/devicetree/bindings/spi/atmel,at91rm9200-spi.yaml index 58367587bfbc..32e7c14033c2 100644 --- a/Documentation/devicetree/bindings/spi/atmel,at91rm9200-spi.yaml +++ b/Documentation/devicetree/bindings/spi/atmel,at91rm9200-spi.yaml @@ -22,7 +22,6 @@ properties: - const: atmel,at91rm9200-spi - items: - const: microchip,sam9x7-spi - - const: microchip,sam9x60-spi - const: atmel,at91rm9200-spi reg: diff --git a/Documentation/devicetree/bindings/spi/samsung,spi.yaml b/Documentation/devicetree/bindings/spi/samsung,spi.yaml index 79da99ca0e53..f681372da81f 100644 --- a/Documentation/devicetree/bindings/spi/samsung,spi.yaml +++ b/Documentation/devicetree/bindings/spi/samsung,spi.yaml @@ -17,11 +17,13 @@ properties: compatible: oneOf: - enum: + - google,gs101-spi - samsung,s3c2443-spi # for S3C2443, S3C2416 and S3C2450 - samsung,s3c6410-spi - samsung,s5pv210-spi # for S5PV210 and S5PC110 - samsung,exynos4210-spi - samsung,exynos5433-spi + - samsung,exynos850-spi - samsung,exynosautov9-spi - tesla,fsd-spi - const: samsung,exynos7-spi @@ -74,8 +76,6 @@ required: - compatible - clocks - clock-names - - dmas - - dma-names - interrupts - reg diff --git a/Documentation/devicetree/bindings/spi/spi-controller.yaml b/Documentation/devicetree/bindings/spi/spi-controller.yaml index 524f6fe8c27b..093150c0cb87 100644 --- a/Documentation/devicetree/bindings/spi/spi-controller.yaml +++ b/Documentation/devicetree/bindings/spi/spi-controller.yaml @@ -69,6 +69,21 @@ properties: Should be generally avoided and be replaced by spi-cs-high + ACTIVE_HIGH. + fifo-depth: + $ref: /schemas/types.yaml#/definitions/uint32 + description: + Size of the RX and TX data FIFOs in bytes. + + rx-fifo-depth: + $ref: /schemas/types.yaml#/definitions/uint32 + description: + Size of the RX data FIFO in bytes. + + tx-fifo-depth: + $ref: /schemas/types.yaml#/definitions/uint32 + description: + Size of the TX data FIFO in bytes. + num-cs: $ref: /schemas/types.yaml#/definitions/uint32 description: @@ -116,6 +131,10 @@ patternProperties: - compatible - reg +dependencies: + rx-fifo-depth: [ tx-fifo-depth ] + tx-fifo-depth: [ rx-fifo-depth ] + allOf: - if: not: @@ -129,6 +148,14 @@ allOf: properties: "#address-cells": const: 0 + - not: + required: + - fifo-depth + - rx-fifo-depth + - not: + required: + - fifo-depth + - tx-fifo-depth additionalProperties: true diff --git a/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.yaml b/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.yaml index 727c5346b8ce..2ff174244795 100644 --- a/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.yaml +++ b/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.yaml @@ -22,6 +22,7 @@ properties: - enum: - fsl,imx8ulp-spi - fsl,imx93-spi + - fsl,imx95-spi - const: fsl,imx7ulp-spi reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/spi/spi-nxp-fspi.yaml b/Documentation/devicetree/bindings/spi/spi-nxp-fspi.yaml index 7fd591145480..4a5f41bde00f 100644 --- a/Documentation/devicetree/bindings/spi/spi-nxp-fspi.yaml +++ b/Documentation/devicetree/bindings/spi/spi-nxp-fspi.yaml @@ -15,12 +15,18 @@ allOf: properties: compatible: - enum: - - nxp,imx8dxl-fspi - - nxp,imx8mm-fspi - - nxp,imx8mp-fspi - - nxp,imx8qxp-fspi - - nxp,lx2160a-fspi + oneOf: + - enum: + - nxp,imx8dxl-fspi + - nxp,imx8mm-fspi + - nxp,imx8mp-fspi + - nxp,imx8qxp-fspi + - nxp,lx2160a-fspi + - items: + - enum: + - nxp,imx93-fspi + - nxp,imx95-fspi + - const: nxp,imx8mm-fspi reg: items: diff --git a/Documentation/devicetree/bindings/sram/allwinner,sun4i-a10-system-control.yaml b/Documentation/devicetree/bindings/sram/allwinner,sun4i-a10-system-control.yaml index a1c96985951f..cf07b8f787a6 100644 --- a/Documentation/devicetree/bindings/sram/allwinner,sun4i-a10-system-control.yaml +++ b/Documentation/devicetree/bindings/sram/allwinner,sun4i-a10-system-control.yaml @@ -56,7 +56,7 @@ properties: ranges: true patternProperties: - "^sram@[a-z0-9]+": + "^sram@[a-f0-9]+": $ref: /schemas/sram/sram.yaml# unevaluatedProperties: false diff --git a/Documentation/devicetree/bindings/tpm/tcg,tpm_tis-spi.yaml b/Documentation/devicetree/bindings/tpm/tcg,tpm_tis-spi.yaml index c3413b47ac3d..6cb2de7cb568 100644 --- a/Documentation/devicetree/bindings/tpm/tcg,tpm_tis-spi.yaml +++ b/Documentation/devicetree/bindings/tpm/tcg,tpm_tis-spi.yaml @@ -20,6 +20,7 @@ properties: compatible: items: - enum: + - atmel,attpm20p - infineon,slb9670 - st,st33htpm-spi - st,st33zp24-spi diff --git a/Documentation/devicetree/bindings/tpm/tpm-common.yaml b/Documentation/devicetree/bindings/tpm/tpm-common.yaml index 90390624a8be..3c1241b2a43f 100644 --- a/Documentation/devicetree/bindings/tpm/tpm-common.yaml +++ b/Documentation/devicetree/bindings/tpm/tpm-common.yaml @@ -42,7 +42,7 @@ properties: resets: description: Reset controller to reset the TPM - $ref: /schemas/types.yaml#/definitions/phandle + maxItems: 1 reset-gpios: description: Output GPIO pin to reset the TPM diff --git a/Documentation/devicetree/bindings/trivial-devices.yaml b/Documentation/devicetree/bindings/trivial-devices.yaml index 79dcd92c4a43..2210964faaf6 100644 --- a/Documentation/devicetree/bindings/trivial-devices.yaml +++ b/Documentation/devicetree/bindings/trivial-devices.yaml @@ -47,6 +47,8 @@ properties: - adi,lt7182s # AMS iAQ-Core VOC Sensor - ams,iaq-core + # Temperature monitoring of Astera Labs PT5161L PCIe retimer + - asteralabs,pt5161l # i2c serial eeprom (24cxx) - at,24c08 # ATSHA204 - i2c h/w symmetric crypto module @@ -129,6 +131,8 @@ properties: - mps,mp2975 # Monolithic Power Systems Inc. multi-phase hot-swap controller mp5990 - mps,mp5990 + # Monolithic Power Systems Inc. synchronous step-down converter mpq8785 + - mps,mpq8785 # Honeywell Humidicon HIH-6130 humidity/temperature sensor - honeywell,hi6130 # IBM Common Form Factor Power Supply Versions (all versions) @@ -139,14 +143,6 @@ properties: - ibm,cffps2 # Infineon IR36021 digital POL buck controller - infineon,ir36021 - # Infineon IR38060 Voltage Regulator - - infineon,ir38060 - # Infineon IR38064 Voltage Regulator - - infineon,ir38064 - # Infineon IR38164 Voltage Regulator - - infineon,ir38164 - # Infineon IR38263 Voltage Regulator - - infineon,ir38263 # Infineon IRPS5401 Voltage Regulator (PMIC) - infineon,irps5401 # Infineon TLV493D-A1B6 I2C 3D Magnetic Sensor diff --git a/Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml b/Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml index 88cc1e3a0c88..b2b509b3944d 100644 --- a/Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml +++ b/Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml @@ -55,9 +55,12 @@ properties: samsung,sysreg: $ref: /schemas/types.yaml#/definitions/phandle-array - description: Should be phandle/offset pair. The phandle to the syscon node - which indicates the FSYSx sysreg interface and the offset of - the control register for UFS io coherency setting. + items: + - items: + - description: phandle to FSYSx sysreg node + - description: offset of the control register for UFS io coherency setting + description: + Phandle and offset to the FSYSx sysreg for UFS io coherency setting. dma-coherent: true diff --git a/Documentation/devicetree/bindings/usb/dwc3-xilinx.yaml b/Documentation/devicetree/bindings/usb/dwc3-xilinx.yaml index bb373eb025a5..00f87a558c7d 100644 --- a/Documentation/devicetree/bindings/usb/dwc3-xilinx.yaml +++ b/Documentation/devicetree/bindings/usb/dwc3-xilinx.yaml @@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: Xilinx SuperSpeed DWC3 USB SoC controller maintainers: - - Piyush Mehta <piyush.mehta@amd.com> + - Mubin Sayyed <mubin.sayyed@amd.com> + - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> properties: compatible: diff --git a/Documentation/devicetree/bindings/usb/microchip,usb5744.yaml b/Documentation/devicetree/bindings/usb/microchip,usb5744.yaml index 6d4cfd943f58..445183d9d6db 100644 --- a/Documentation/devicetree/bindings/usb/microchip,usb5744.yaml +++ b/Documentation/devicetree/bindings/usb/microchip,usb5744.yaml @@ -16,8 +16,9 @@ description: USB 2.0 traffic. maintainers: - - Piyush Mehta <piyush.mehta@amd.com> - Michal Simek <michal.simek@amd.com> + - Mubin Sayyed <mubin.sayyed@amd.com> + - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> properties: compatible: diff --git a/Documentation/devicetree/bindings/usb/xlnx,usb2.yaml b/Documentation/devicetree/bindings/usb/xlnx,usb2.yaml index 868dffe314bc..a7f75fe36665 100644 --- a/Documentation/devicetree/bindings/usb/xlnx,usb2.yaml +++ b/Documentation/devicetree/bindings/usb/xlnx,usb2.yaml @@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: Xilinx udc controller maintainers: - - Piyush Mehta <piyush.mehta@amd.com> + - Mubin Sayyed <mubin.sayyed@amd.com> + - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> properties: compatible: diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml b/Documentation/devicetree/bindings/vendor-prefixes.yaml index 1a0dc04f1db4..ace572665cf0 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.yaml +++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml @@ -39,6 +39,8 @@ patternProperties: description: ShenZhen Asia Better Technology Ltd. "^acbel,.*": description: Acbel Polytech Inc. + "^acelink,.*": + description: Acelink Technology Co., Ltd. "^acer,.*": description: Acer Inc. "^acme,.*": @@ -61,6 +63,8 @@ patternProperties: description: Analog Devices, Inc. "^adieng,.*": description: ADI Engineering, Inc. + "^admatec,.*": + description: admatec GmbH "^advantech,.*": description: Advantech Corporation "^aeroflexgaisler,.*": @@ -107,6 +111,8 @@ patternProperties: description: Amlogic, Inc. "^ampere,.*": description: Ampere Computing LLC + "^amphenol,.*": + description: Amphenol Advanced Sensors "^ampire,.*": description: Ampire Co., Ltd. "^ams,.*": @@ -159,6 +165,8 @@ patternProperties: description: ASPEED Technology Inc. "^asrock,.*": description: ASRock Inc. + "^asteralabs,.*": + description: Astera Labs, Inc. "^asus,.*": description: AsusTek Computer Inc. "^atheros,.*": @@ -500,6 +508,8 @@ patternProperties: description: FocalTech Systems Co.,Ltd "^forlinx,.*": description: Baoding Forlinx Embedded Technology Co., Ltd. + "^freebox,.*": + description: Freebox SAS "^freecom,.*": description: Freecom Gmbh "^frida,.*": @@ -719,6 +729,8 @@ patternProperties: description: JetHome (IP Sokolov P.A.) "^jianda,.*": description: Jiandangjing Technology Co., Ltd. + "^jide,.*": + description: Jide Tech "^joz,.*": description: JOZ BV "^kam,.*": @@ -1484,6 +1496,8 @@ patternProperties: description: Ufi Space Co., Ltd. "^ugoos,.*": description: Ugoos Industrial Co., Ltd. + "^uni-t,.*": + description: Uni-Trend Technology (China) Co., Ltd. "^uniwest,.*": description: United Western Technologies Corp (UniWest) "^upisemi,.*": diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst index 6ad72ac6861b..d6f7efefea42 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst @@ -341,6 +341,51 @@ Typedefs with function prototypes can also be documented:: */ typedef void (*type_name)(struct v4l2_ctrl *arg1, void *arg2); +Object-like macro documentation +------------------------------- + +Object-like macros are distinct from function-like macros. They are +differentiated by whether the macro name is immediately followed by a +left parenthesis ('(') for function-like macros or not followed by one +for object-like macros. + +Function-like macros are handled like functions by ``scripts/kernel-doc``. +They may have a parameter list. Object-like macros have do not have a +parameter list. + +The general format of an object-like macro kernel-doc comment is:: + + /** + * define object_name - Brief description. + * + * Description of the object. + */ + +Example:: + + /** + * define MAX_ERRNO - maximum errno value that is supported + * + * Kernel pointers have redundant information, so we can use a + * scheme where we can return either an error code or a normal + * pointer with the same return value. + */ + #define MAX_ERRNO 4095 + +Example:: + + /** + * define DRM_GEM_VRAM_PLANE_HELPER_FUNCS - \ + * Initializes struct drm_plane_helper_funcs for VRAM handling + * + * This macro initializes struct drm_plane_helper_funcs to use the + * respective helper functions. + */ + #define DRM_GEM_VRAM_PLANE_HELPER_FUNCS \ + .prepare_fb = drm_gem_vram_plane_helper_prepare_fb, \ + .cleanup_fb = drm_gem_vram_plane_helper_cleanup_fb + + Highlights and cross-references ------------------------------- diff --git a/Documentation/doc-guide/maintainer-profile.rst b/Documentation/doc-guide/maintainer-profile.rst index 755d39f0d407..db3636d0d71d 100644 --- a/Documentation/doc-guide/maintainer-profile.rst +++ b/Documentation/doc-guide/maintainer-profile.rst @@ -27,6 +27,13 @@ documentation and ensure that no new errors or warnings have been introduced. Generating HTML documents and looking at the result will help to avoid unsightly misunderstandings about how things will be rendered. +All new documentation (including additions to existing documents) should +ideally justify who the intended target audience is somewhere in the +changelog; this way, we ensure that the documentation ends up in the correct +place. Some possible categories are: kernel developers (experts or +beginners), userspace programmers, end users and/or system administrators, +and distributors. + Key cycle dates --------------- diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst index 3d125fb4139d..8081ebfe48bc 100644 --- a/Documentation/doc-guide/sphinx.rst +++ b/Documentation/doc-guide/sphinx.rst @@ -48,13 +48,14 @@ or ``virtualenv``, depending on how your distribution packaged Python 3. on the Sphinx version, it should be installed separately, with ``pip install sphinx_rtd_theme``. -In summary, if you want to install Sphinx version 2.4.4, you should do:: +In summary, if you want to install the latest version of Sphinx, you +should do:: - $ virtualenv sphinx_2.4.4 - $ . sphinx_2.4.4/bin/activate - (sphinx_2.4.4) $ pip install -r Documentation/sphinx/requirements.txt + $ virtualenv sphinx_latest + $ . sphinx_latest/bin/activate + (sphinx_latest) $ pip install -r Documentation/sphinx/requirements.txt -After running ``. sphinx_2.4.4/bin/activate``, the prompt will change, +After running ``. sphinx_latest/bin/activate``, the prompt will change, in order to indicate that you're using the new environment. If you open a new shell, you need to rerun this command to enter again at the virtual environment before building the documentation. @@ -63,8 +64,7 @@ Image output ------------ The kernel documentation build system contains an extension that -handles images on both GraphViz and SVG formats (see -:ref:`sphinx_kfigure`). +handles images in both GraphViz and SVG formats (see :ref:`sphinx_kfigure`). For it to work, you need to install both GraphViz and ImageMagick packages. If those packages are not installed, the build system will @@ -108,7 +108,7 @@ further info. Checking for Sphinx dependencies -------------------------------- -There's a script that automatically check for Sphinx dependencies. If it can +There's a script that automatically checks for Sphinx dependencies. If it can recognize your distribution, it will also give a hint about the install command line options for your distro:: @@ -283,7 +283,7 @@ Here are some specific guidelines for the kernel documentation: from highlighting. For a short snippet of code embedded in the text, use \`\`. -the C domain +The C domain ------------ The **Sphinx C Domain** (name c) is suited for documentation of C API. E.g. a diff --git a/Documentation/driver-api/dpll.rst b/Documentation/driver-api/dpll.rst index e3d593841aa7..ea8d16600e16 100644 --- a/Documentation/driver-api/dpll.rst +++ b/Documentation/driver-api/dpll.rst @@ -545,7 +545,7 @@ In such scenario, dpll device input signal shall be also configurable to drive dpll with signal recovered from the PHY netdevice. This is done by exposing a pin to the netdevice - attaching pin to the netdevice itself with -``netdev_dpll_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)``. +``dpll_netdev_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)``. Exposed pin id handle ``DPLL_A_PIN_ID`` is then identifiable by the user as it is attached to rtnetlink respond to get ``RTM_NEWLINK`` command in nested attribute ``IFLA_DPLL_PIN``. diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst index c5f99d834ec5..7be8b8dd5f00 100644 --- a/Documentation/driver-api/driver-model/devres.rst +++ b/Documentation/driver-api/driver-model/devres.rst @@ -420,6 +420,7 @@ POWER devm_reboot_mode_unregister() PWM + devm_pwmchip_alloc() devm_pwmchip_add() devm_pwm_get() devm_fwnode_pwm_get() @@ -462,7 +463,7 @@ SLAVE DMA ENGINE SPI devm_spi_alloc_master() devm_spi_alloc_slave() - devm_spi_register_master() + devm_spi_register_controller() WATCHDOG devm_watchdog_register_device() diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst index 3e588b9d678c..ab56ab0dd7a6 100644 --- a/Documentation/driver-api/gpio/consumer.rst +++ b/Documentation/driver-api/gpio/consumer.rst @@ -222,9 +222,9 @@ Use the following calls to access GPIOs from an atomic context:: int gpiod_get_value(const struct gpio_desc *desc); void gpiod_set_value(struct gpio_desc *desc, int value); -The values are boolean, zero for low, nonzero for high. When reading the value -of an output pin, the value returned should be what's seen on the pin. That -won't always match the specified output value, because of issues including +The values are boolean, zero for inactive, nonzero for active. When reading the +value of an output pin, the value returned should be what's seen on the pin. +That won't always match the specified output value, because of issues including open-drain signaling and output latencies. The get/set calls do not return errors because "invalid GPIO" should have been @@ -277,11 +277,11 @@ switch their output to a high impedance value. The consumer should not need to care. (For details read about open drain in driver.rst.) With this, all the gpiod_set_(array)_value_xxx() functions interpret the -parameter "value" as "asserted" ("1") or "de-asserted" ("0"). The physical line +parameter "value" as "active" ("1") or "inactive" ("0"). The physical line level will be driven accordingly. As an example, if the active low property for a dedicated GPIO is set, and the -gpiod_set_(array)_value_xxx() passes "asserted" ("1"), the physical line level +gpiod_set_(array)_value_xxx() passes "active" ("1"), the physical line level will be driven low. To summarize:: diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index eba851605388..f10decc2c14b 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -9,110 +9,141 @@ of device drivers. This document is an only somewhat organized collection of some of those interfaces — it will hopefully get better over time! The available subsections can be seen below. + +General information for driver authors +====================================== + +This section contains documentation that should, at some point or other, be +of interest to most developers working on device drivers. + .. toctree:: - :caption: Table of contents - :maxdepth: 2 + :maxdepth: 1 - driver-model/index basics + driver-model/index + device_link infrastructure ioctl - early-userspace/index pm/index - clk + +Useful support libraries +======================== + +This section contains documentation that should, at some point or other, be +of interest to most developers working on device drivers. + +.. toctree:: + :maxdepth: 1 + + early-userspace/index + connector device-io + devfreq dma-buf - device_link component - message-based - infiniband - aperture - frame-buffer - regulator - reset - iio/index - input - usb/index - firewire - pci/index + io-mapping + io_ordering + uio-howto + vfio-mediated-device + vfio + vfio-pci-device-specific-driver-acceptance + +Bus-level documentation +======================= + +.. toctree:: + :maxdepth: 1 + + auxiliary_bus cxl/index - spi - i2c - ipmb - ipmi + eisa + firewire i3c/index - interconnect - devfreq - hsi - edac - scsi - libata - target - mailbox - mtdnand - miscellaneous - mei/index - mtd/index - mmc/index - nvdimm/index - w1 + isa + men-chameleon-bus + pci/index rapidio/index - s390-drivers + slimbus + usb/index + virtio/index vme + w1 + xillybus + + +Subsystem-specific APIs +======================= + +.. toctree:: + :maxdepth: 1 + 80211/index - uio-howto + acpi/index + backlight/lp855x-driver.rst + clk + console + crypto/index + dmaengine/index + dpll + edac firmware/index - pin-control + fpga/index + frame-buffer + aperture + generic-counter gpio/index + hsi + hte/index + i2c + iio/index + infiniband + input + interconnect + ipmb + ipmi + libata + mailbox md/index media/index + mei/index + memory-devices/index + message-based misc_devices + miscellaneous + mmc/index + mtd/index + mtdnand nfc/index - dmaengine/index - slimbus - soundwire/index - thermal/index - fpga/index - acpi/index - auxiliary_bus - backlight/lp855x-driver.rst - connector - console - eisa - isa - io-mapping - io_ordering - generic-counter - memory-devices/index - men-chameleon-bus ntb + nvdimm/index nvmem parport-lowlevel + phy/index + pin-control + pldmfw/index pps ptp - phy/index pwm - pldmfw/index + regulator + reset rfkill + s390-drivers + scsi serial/index sm501 + soundwire/index + spi surface_aggregator/index switchtec sync_file + target + tee + thermal/index tty/index - vfio-mediated-device - vfio - vfio-pci-device-specific-driver-acceptance - virtio/index + wbrf + wmi xilinx/index - xillybus zorro - hte/index - wmi - dpll - wbrf - crypto/index - tee .. only:: subproject and html diff --git a/Documentation/driver-api/pwm.rst b/Documentation/driver-api/pwm.rst index 3c28ccc4b611..b41b1c56477f 100644 --- a/Documentation/driver-api/pwm.rst +++ b/Documentation/driver-api/pwm.rst @@ -143,11 +143,12 @@ to implement the pwm_*() functions itself. This means that it's impossible to have multiple PWM drivers in the system. For this reason it's mandatory for new drivers to use the generic PWM framework. -A new PWM controller/chip can be added using pwmchip_add() and removed -again with pwmchip_remove(). pwmchip_add() takes a filled in struct -pwm_chip as argument which provides a description of the PWM chip, the -number of PWM devices provided by the chip and the chip-specific -implementation of the supported PWM operations to the framework. +A new PWM controller/chip can be allocated using pwmchip_alloc(), then +registered using pwmchip_add() and removed again with pwmchip_remove(). To undo +pwmchip_alloc() use pwmchip_put(). pwmchip_add() takes a filled in struct +pwm_chip as argument which provides a description of the PWM chip, the number +of PWM devices provided by the chip and the chip-specific implementation of the +supported PWM operations to the framework. When implementing polarity support in a PWM driver, make sure to respect the signal conventions in the PWM framework. By definition, normal polarity diff --git a/Documentation/fault-injection/index.rst b/Documentation/fault-injection/index.rst index 8408a8a91b34..a6ea1d190222 100644 --- a/Documentation/fault-injection/index.rst +++ b/Documentation/fault-injection/index.rst @@ -1,7 +1,7 @@ .. SPDX-License-Identifier: GPL-2.0 =============== -fault-injection +Fault-injection =============== .. toctree:: diff --git a/Documentation/filesystems/files.rst b/Documentation/filesystems/files.rst index 9e38e4c221ca..eb770f891b27 100644 --- a/Documentation/filesystems/files.rst +++ b/Documentation/filesystems/files.rst @@ -116,7 +116,7 @@ before and after the reference count increment. This pattern can be seen in get_file_rcu() and __files_get_rcu(). In addition, it isn't possible to access or check fields in struct file -without first aqcuiring a reference on it under rcu lookup. Not doing +without first acquiring a reference on it under rcu lookup. Not doing that was always very dodgy and it was only usable for non-pointer data in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers either first acquire a reference or they must hold the files_lock of the diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst index e86b886b64d0..04eaab01314b 100644 --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -338,11 +338,14 @@ Supported modes Currently, the following pairs of encryption modes are supported: -- AES-256-XTS for contents and AES-256-CTS-CBC for filenames +- AES-256-XTS for contents and AES-256-CBC-CTS for filenames - AES-256-XTS for contents and AES-256-HCTR2 for filenames - Adiantum for both contents and filenames -- AES-128-CBC-ESSIV for contents and AES-128-CTS-CBC for filenames -- SM4-XTS for contents and SM4-CTS-CBC for filenames +- AES-128-CBC-ESSIV for contents and AES-128-CBC-CTS for filenames +- SM4-XTS for contents and SM4-CBC-CTS for filenames + +Note: in the API, "CBC" means CBC-ESSIV, and "CTS" means CBC-CTS. +So, for example, FSCRYPT_MODE_AES_256_CTS means AES-256-CBC-CTS. Authenticated encryption modes are not currently supported because of the difficulty of dealing with ciphertext expansion. Therefore, @@ -351,11 +354,11 @@ contents encryption uses a block cipher in `XTS mode `CBC-ESSIV mode <https://en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_(ESSIV)>`_, or a wide-block cipher. Filenames encryption uses a -block cipher in `CTS-CBC mode +block cipher in `CBC-CTS mode <https://en.wikipedia.org/wiki/Ciphertext_stealing>`_ or a wide-block cipher. -The (AES-256-XTS, AES-256-CTS-CBC) pair is the recommended default. +The (AES-256-XTS, AES-256-CBC-CTS) pair is the recommended default. It is also the only option that is *guaranteed* to always be supported if the kernel supports fscrypt at all; see `Kernel config options`_. @@ -364,7 +367,7 @@ upgrades the filenames encryption to use a wide-block cipher. (A *wide-block cipher*, also called a tweakable super-pseudorandom permutation, has the property that changing one bit scrambles the entire result.) As described in `Filenames encryption`_, a wide-block -cipher is the ideal mode for the problem domain, though CTS-CBC is the +cipher is the ideal mode for the problem domain, though CBC-CTS is the "least bad" choice among the alternatives. For more information about HCTR2, see `the HCTR2 paper <https://eprint.iacr.org/2021/1441.pdf>`_. @@ -375,13 +378,13 @@ the work is done by XChaCha12, which is much faster than AES when AES acceleration is unavailable. For more information about Adiantum, see `the Adiantum paper <https://eprint.iacr.org/2018/720.pdf>`_. -The (AES-128-CBC-ESSIV, AES-128-CTS-CBC) pair exists only to support +The (AES-128-CBC-ESSIV, AES-128-CBC-CTS) pair exists only to support systems whose only form of AES acceleration is an off-CPU crypto accelerator such as CAAM or CESA that does not support XTS. The remaining mode pairs are the "national pride ciphers": -- (SM4-XTS, SM4-CTS-CBC) +- (SM4-XTS, SM4-CBC-CTS) Generally speaking, these ciphers aren't "bad" per se, but they receive limited security review compared to the usual choices such as @@ -393,7 +396,7 @@ Kernel config options Enabling fscrypt support (CONFIG_FS_ENCRYPTION) automatically pulls in only the basic support from the crypto API needed to use AES-256-XTS -and AES-256-CTS-CBC encryption. For optimal performance, it is +and AES-256-CBC-CTS encryption. For optimal performance, it is strongly recommended to also enable any available platform-specific kconfig options that provide acceleration for the algorithm(s) you wish to use. Support for any "non-default" encryption modes typically @@ -407,7 +410,7 @@ kernel crypto API (see `Inline encryption support`_); in that case, the file contents mode doesn't need to supported in the kernel crypto API, but the filenames mode still does. -- AES-256-XTS and AES-256-CTS-CBC +- AES-256-XTS and AES-256-CBC-CTS - Recommended: - arm64: CONFIG_CRYPTO_AES_ARM64_CE_BLK - x86: CONFIG_CRYPTO_AES_NI_INTEL @@ -433,7 +436,7 @@ API, but the filenames mode still does. - x86: CONFIG_CRYPTO_NHPOLY1305_SSE2 - x86: CONFIG_CRYPTO_NHPOLY1305_AVX2 -- AES-128-CBC-ESSIV and AES-128-CTS-CBC: +- AES-128-CBC-ESSIV and AES-128-CBC-CTS: - Mandatory: - CONFIG_CRYPTO_ESSIV - CONFIG_CRYPTO_SHA256 or another SHA-256 implementation @@ -521,7 +524,7 @@ alternatively has the file's nonce (for `DIRECT_KEY policies`_) or inode number (for `IV_INO_LBLK_64 policies`_) included in the IVs. Thus, IV reuse is limited to within a single directory. -With CTS-CBC, the IV reuse means that when the plaintext filenames share a +With CBC-CTS, the IV reuse means that when the plaintext filenames share a common prefix at least as long as the cipher block size (16 bytes for AES), the corresponding encrypted filenames will also share a common prefix. This is undesirable. Adiantum and HCTR2 do not have this weakness, as they are diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index e18bc5ae3b35..0ea1e44fa028 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -98,7 +98,6 @@ Documentation for filesystem implementations. isofs nilfs2 nfs/index - ntfs ntfs3 ocfs2 ocfs2-online-filecheck diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index d5bf4b6b7509..e664061ed55d 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -29,7 +29,7 @@ prototypes:: char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); struct vfsmount *(*d_automount)(struct path *path); int (*d_manage)(const struct path *, bool); - struct dentry *(*d_real)(struct dentry *, const struct inode *); + struct dentry *(*d_real)(struct dentry *, enum d_real_type type); locking rules: diff --git a/Documentation/filesystems/ntfs.rst b/Documentation/filesystems/ntfs.rst deleted file mode 100644 index 5bb093a26485..000000000000 --- a/Documentation/filesystems/ntfs.rst +++ /dev/null @@ -1,466 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -================================ -The Linux NTFS filesystem driver -================================ - - -.. Table of contents - - - Overview - - Web site - - Features - - Supported mount options - - Known bugs and (mis-)features - - Using NTFS volume and stripe sets - - The Device-Mapper driver - - The Software RAID / MD driver - - Limitations when using the MD driver - - -Overview -======== - -Linux-NTFS comes with a number of user-space programs known as ntfsprogs. -These include mkntfs, a full-featured ntfs filesystem format utility, -ntfsundelete used for recovering files that were unintentionally deleted -from an NTFS volume and ntfsresize which is used to resize an NTFS partition. -See the web site for more information. - -To mount an NTFS 1.2/3.x (Windows NT4/2000/XP/2003) volume, use the file -system type 'ntfs'. The driver currently supports read-only mode (with no -fault-tolerance, encryption or journalling) and very limited, but safe, write -support. - -For fault tolerance and raid support (i.e. volume and stripe sets), you can -use the kernel's Software RAID / MD driver. See section "Using Software RAID -with NTFS" for details. - - -Web site -======== - -There is plenty of additional information on the linux-ntfs web site -at http://www.linux-ntfs.org/ - -The web site has a lot of additional information, such as a comprehensive -FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS -userspace utilities, etc. - - -Features -======== - -- This is a complete rewrite of the NTFS driver that used to be in the 2.4 and - earlier kernels. This new driver implements NTFS read support and is - functionally equivalent to the old ntfs driver and it also implements limited - write support. The biggest limitation at present is that files/directories - cannot be created or deleted. See below for the list of write features that - are so far supported. Another limitation is that writing to compressed files - is not implemented at all. Also, neither read nor write access to encrypted - files is so far implemented. -- The new driver has full support for sparse files on NTFS 3.x volumes which - the old driver isn't happy with. -- The new driver supports execution of binaries due to mmap() now being - supported. -- The new driver supports loopback mounting of files on NTFS which is used by - some Linux distributions to enable the user to run Linux from an NTFS - partition by creating a large file while in Windows and then loopback - mounting the file while in Linux and creating a Linux filesystem on it that - is used to install Linux on it. -- A comparison of the two drivers using:: - - time find . -type f -exec md5sum "{}" \; - - run three times in sequence with each driver (after a reboot) on a 1.4GiB - NTFS partition, showed the new driver to be 20% faster in total time elapsed - (from 9:43 minutes on average down to 7:53). The time spent in user space - was unchanged but the time spent in the kernel was decreased by a factor of - 2.5 (from 85 CPU seconds down to 33). -- The driver does not support short file names in general. For backwards - compatibility, we implement access to files using their short file names if - they exist. The driver will not create short file names however, and a - rename will discard any existing short file name. -- The new driver supports exporting of mounted NTFS volumes via NFS. -- The new driver supports async io (aio). -- The new driver supports fsync(2), fdatasync(2), and msync(2). -- The new driver supports readv(2) and writev(2). -- The new driver supports access time updates (including mtime and ctime). -- The new driver supports truncate(2) and open(2) with O_TRUNC. But at present - only very limited support for highly fragmented files, i.e. ones which have - their data attribute split across multiple extents, is included. Another - limitation is that at present truncate(2) will never create sparse files, - since to mark a file sparse we need to modify the directory entry for the - file and we do not implement directory modifications yet. -- The new driver supports write(2) which can both overwrite existing data and - extend the file size so that you can write beyond the existing data. Also, - writing into sparse regions is supported and the holes are filled in with - clusters. But at present only limited support for highly fragmented files, - i.e. ones which have their data attribute split across multiple extents, is - included. Another limitation is that write(2) will never create sparse - files, since to mark a file sparse we need to modify the directory entry for - the file and we do not implement directory modifications yet. - -Supported mount options -======================= - -In addition to the generic mount options described by the manual page for the -mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the -following mount options: - -======================= ======================================================= -iocharset=name Deprecated option. Still supported but please use - nls=name in the future. See description for nls=name. - -nls=name Character set to use when returning file names. - Unlike VFAT, NTFS suppresses names that contain - unconvertible characters. Note that most character - sets contain insufficient characters to represent all - possible Unicode characters that can exist on NTFS. - To be sure you are not missing any files, you are - advised to use nls=utf8 which is capable of - representing all Unicode characters. - -utf8=<bool> Option no longer supported. Currently mapped to - nls=utf8 but please use nls=utf8 in the future and - make sure utf8 is compiled either as module or into - the kernel. See description for nls=name. - -uid= -gid= -umask= Provide default owner, group, and access mode mask. - These options work as documented in mount(8). By - default, the files/directories are owned by root and - he/she has read and write permissions, as well as - browse permission for directories. No one else has any - access permissions. I.e. the mode on all files is by - default rw------- and for directories rwx------, a - consequence of the default fmask=0177 and dmask=0077. - Using a umask of zero will grant all permissions to - everyone, i.e. all files and directories will have mode - rwxrwxrwx. - -fmask= -dmask= Instead of specifying umask which applies both to - files and directories, fmask applies only to files and - dmask only to directories. - -sloppy=<BOOL> If sloppy is specified, ignore unknown mount options. - Otherwise the default behaviour is to abort mount if - any unknown options are found. - -show_sys_files=<BOOL> If show_sys_files is specified, show the system files - in directory listings. Otherwise the default behaviour - is to hide the system files. - Note that even when show_sys_files is specified, "$MFT" - will not be visible due to bugs/mis-features in glibc. - Further, note that irrespective of show_sys_files, all - files are accessible by name, i.e. you can always do - "ls -l \$UpCase" for example to specifically show the - system file containing the Unicode upcase table. - -case_sensitive=<BOOL> If case_sensitive is specified, treat all file names as - case sensitive and create file names in the POSIX - namespace. Otherwise the default behaviour is to treat - file names as case insensitive and to create file names - in the WIN32/LONG name space. Note, the Linux NTFS - driver will never create short file names and will - remove them on rename/delete of the corresponding long - file name. - Note that files remain accessible via their short file - name, if it exists. If case_sensitive, you will need - to provide the correct case of the short file name. - -disable_sparse=<BOOL> If disable_sparse is specified, creation of sparse - regions, i.e. holes, inside files is disabled for the - volume (for the duration of this mount only). By - default, creation of sparse regions is enabled, which - is consistent with the behaviour of traditional Unix - filesystems. - -errors=opt What to do when critical filesystem errors are found. - Following values can be used for "opt": - - ======== ========================================= - continue DEFAULT, try to clean-up as much as - possible, e.g. marking a corrupt inode as - bad so it is no longer accessed, and then - continue. - recover At present only supported is recovery of - the boot sector from the backup copy. - If read-only mount, the recovery is done - in memory only and not written to disk. - ======== ========================================= - - Note that the options are additive, i.e. specifying:: - - errors=continue,errors=recover - - means the driver will attempt to recover and if that - fails it will clean-up as much as possible and - continue. - -mft_zone_multiplier= Set the MFT zone multiplier for the volume (this - setting is not persistent across mounts and can be - changed from mount to mount but cannot be changed on - remount). Values of 1 to 4 are allowed, 1 being the - default. The MFT zone multiplier determines how much - space is reserved for the MFT on the volume. If all - other space is used up, then the MFT zone will be - shrunk dynamically, so this has no impact on the - amount of free space. However, it can have an impact - on performance by affecting fragmentation of the MFT. - In general use the default. If you have a lot of small - files then use a higher value. The values have the - following meaning: - - ===== ================================= - Value MFT zone size (% of volume size) - ===== ================================= - 1 12.5% - 2 25% - 3 37.5% - 4 50% - ===== ================================= - - Note this option is irrelevant for read-only mounts. -======================= ======================================================= - - -Known bugs and (mis-)features -============================= - -- The link count on each directory inode entry is set to 1, due to Linux not - supporting directory hard links. This may well confuse some user space - applications, since the directory names will have the same inode numbers. - This also speeds up ntfs_read_inode() immensely. And we haven't found any - problems with this approach so far. If you find a problem with this, please - let us know. - - -Please send bug reports/comments/feedback/abuse to the Linux-NTFS development -list at sourceforge: linux-ntfs-dev@lists.sourceforge.net - - -Using NTFS volume and stripe sets -================================= - -For support of volume and stripe sets, you can either use the kernel's -Device-Mapper driver or the kernel's Software RAID / MD driver. The former is -the recommended one to use for linear raid. But the latter is required for -raid level 5. For striping and mirroring, either driver should work fine. - - -The Device-Mapper driver ------------------------- - -You will need to create a table of the components of the volume/stripe set and -how they fit together and load this into the kernel using the dmsetup utility -(see man 8 dmsetup). - -Linear volume sets, i.e. linear raid, has been tested and works fine. Even -though untested, there is no reason why stripe sets, i.e. raid level 0, and -mirrors, i.e. raid level 1 should not work, too. Stripes with parity, i.e. -raid level 5, unfortunately cannot work yet because the current version of the -Device-Mapper driver does not support raid level 5. You may be able to use the -Software RAID / MD driver for raid level 5, see the next section for details. - -To create the table describing your volume you will need to know each of its -components and their sizes in sectors, i.e. multiples of 512-byte blocks. - -For NT4 fault tolerant volumes you can obtain the sizes using fdisk. So for -example if one of your partitions is /dev/hda2 you would do:: - - $ fdisk -ul /dev/hda - - Disk /dev/hda: 81.9 GB, 81964302336 bytes - 255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors - Units = sectors of 1 * 512 = 512 bytes - - Device Boot Start End Blocks Id System - /dev/hda1 * 63 4209029 2104483+ 83 Linux - /dev/hda2 4209030 37768814 16779892+ 86 NTFS - /dev/hda3 37768815 46170809 4200997+ 83 Linux - -And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 = -33559785 sectors. - -For Win2k and later dynamic disks, you can for example use the ldminfo utility -which is part of the Linux LDM tools (the latest version at the time of -writing is linux-ldm-0.0.8.tar.bz2). You can download it from: - - http://www.linux-ntfs.org/ - -Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go -into it (cd linux-ldm-0.0.8) and change to the test directory (cd test). You -will find the precompiled (i386) ldminfo utility there. NOTE: You will not be -able to compile this yourself easily so use the binary version! - -Then you would use ldminfo in dump mode to obtain the necessary information:: - - $ ./ldminfo --dump /dev/hda - -This would dump the LDM database found on /dev/hda which describes all of your -dynamic disks and all the volumes on them. At the bottom you will see the -VOLUME DEFINITIONS section which is all you really need. You may need to look -further above to determine which of the disks in the volume definitions is -which device in Linux. Hint: Run ldminfo on each of your dynamic disks and -look at the Disk Id close to the top of the output for each (the PRIVATE HEADER -section). You can then find these Disk Ids in the VBLK DATABASE section in the -<Disk> components where you will get the LDM Name for the disk that is found in -the VOLUME DEFINITIONS section. - -Note you will also need to enable the LDM driver in the Linux kernel. If your -distribution did not enable it, you will need to recompile the kernel with it -enabled. This will create the LDM partitions on each device at boot time. You -would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3, etc) -in the Device-Mapper table. - -You can also bypass using the LDM driver by using the main device (e.g. -/dev/hda) and then using the offsets of the LDM partitions into this device as -the "Start sector of device" when creating the table. Once again ldminfo would -give you the correct information to do this. - -Assuming you know all your devices and their sizes things are easy. - -For a linear raid the table would look like this (note all values are in -512-byte sectors):: - - # Offset into Size of this Raid type Device Start sector - # volume device of device - 0 1028161 linear /dev/hda1 0 - 1028161 3903762 linear /dev/hdb2 0 - 4931923 2103211 linear /dev/hdc1 0 - -For a striped volume, i.e. raid level 0, you will need to know the chunk size -you used when creating the volume. Windows uses 64kiB as the default, so it -will probably be this unless you changes the defaults when creating the array. - -For a raid level 0 the table would look like this (note all values are in -512-byte sectors):: - - # Offset Size Raid Number Chunk 1st Start 2nd Start - # into of the type of size Device in Device in - # volume volume stripes device device - 0 2056320 striped 2 128 /dev/hda1 0 /dev/hdb1 0 - -If there are more than two devices, just add each of them to the end of the -line. - -Finally, for a mirrored volume, i.e. raid level 1, the table would look like -this (note all values are in 512-byte sectors):: - - # Ofs Size Raid Log Number Region Should Number Source Start Target Start - # in of the type type of log size sync? of Device in Device in - # vol volume params mirrors Device Device - 0 2056320 mirror core 2 16 nosync 2 /dev/hda1 0 /dev/hdb1 0 - -If you are mirroring to multiple devices you can specify further targets at the -end of the line. - -Note the "Should sync?" parameter "nosync" means that the two mirrors are -already in sync which will be the case on a clean shutdown of Windows. If the -mirrors are not clean, you can specify the "sync" option instead of "nosync" -and the Device-Mapper driver will then copy the entirety of the "Source Device" -to the "Target Device" or if you specified multiple target devices to all of -them. - -Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1), -and hand it over to dmsetup to work with, like so:: - - $ dmsetup create myvolume1 /etc/ntfsvolume1 - -You can obviously replace "myvolume1" with whatever name you like. - -If it all worked, you will now have the device /dev/device-mapper/myvolume1 -which you can then just use as an argument to the mount command as usual to -mount the ntfs volume. For example:: - - $ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1 - -(You need to create the directory /mnt/myvol1 first and of course you can use -anything you like instead of /mnt/myvol1 as long as it is an existing -directory.) - -It is advisable to do the mount read-only to see if the volume has been setup -correctly to avoid the possibility of causing damage to the data on the ntfs -volume. - - -The Software RAID / MD driver ------------------------------ - -An alternative to using the Device-Mapper driver is to use the kernel's -Software RAID / MD driver. For which you need to set up your /etc/raidtab -appropriately (see man 5 raidtab). - -Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. raid level -0, have been tested and work fine (though see section "Limitations when using -the MD driver with NTFS volumes" especially if you want to use linear raid). -Even though untested, there is no reason why mirrors, i.e. raid level 1, and -stripes with parity, i.e. raid level 5, should not work, too. - -You have to use the "persistent-superblock 0" option for each raid-disk in the -NTFS volume/stripe you are configuring in /etc/raidtab as the persistent -superblock used by the MD driver would damage the NTFS volume. - -Windows by default uses a stripe chunk size of 64k, so you probably want the -"chunk-size 64k" option for each raid-disk, too. - -For example, if you have a stripe set consisting of two partitions /dev/hda5 -and /dev/hdb1 your /etc/raidtab would look like this:: - - raiddev /dev/md0 - raid-level 0 - nr-raid-disks 2 - nr-spare-disks 0 - persistent-superblock 0 - chunk-size 64k - device /dev/hda5 - raid-disk 0 - device /dev/hdb1 - raid-disk 1 - -For linear raid, just change the raid-level above to "raid-level linear", for -mirrors, change it to "raid-level 1", and for stripe sets with parity, change -it to "raid-level 5". - -Note for stripe sets with parity you will also need to tell the MD driver -which parity algorithm to use by specifying the option "parity-algorithm -which", where you need to replace "which" with the name of the algorithm to -use (see man 5 raidtab for available algorithms) and you will have to try the -different available algorithms until you find one that works. Make sure you -are working read-only when playing with this as you may damage your data -otherwise. If you find which algorithm works please let us know (email the -linux-ntfs developers list linux-ntfs-dev@lists.sourceforge.net or drop in on -IRC in channel #ntfs on the irc.freenode.net network) so we can update this -documentation. - -Once the raidtab is setup, run for example raid0run -a to start all devices or -raid0run /dev/md0 to start a particular md device, in this case /dev/md0. - -Then just use the mount command as usual to mount the ntfs volume using for -example:: - - mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume - -It is advisable to do the mount read-only to see if the md volume has been -setup correctly to avoid the possibility of causing damage to the data on the -ntfs volume. - - -Limitations when using the Software RAID / MD driver ------------------------------------------------------ - -Using the md driver will not work properly if any of your NTFS partitions have -an odd number of sectors. This is especially important for linear raid as all -data after the first partition with an odd number of sectors will be offset by -one or more sectors so if you mount such a partition with write support you -will cause massive damage to the data on the volume which will only become -apparent when you try to use the volume again under Windows. - -So when using linear raid, make sure that all your partitions have an even -number of sectors BEFORE attempting to use it. You have been warned! - -Even better is to simply use the Device-Mapper for linear raid and then you do -not have this problem with odd numbers of sectors. diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst index 1c244866041a..165514401441 100644 --- a/Documentation/filesystems/overlayfs.rst +++ b/Documentation/filesystems/overlayfs.rst @@ -145,7 +145,9 @@ filesystem, an overlay filesystem needs to record in the upper filesystem that files have been removed. This is done using whiteouts and opaque directories (non-directories are always opaque). -A whiteout is created as a character device with 0/0 device number. +A whiteout is created as a character device with 0/0 device number or +as a zero-size regular file with the xattr "trusted.overlay.whiteout". + When a whiteout is found in the upper level of a merged directory, any matching name in the lower level is ignored, and the whiteout itself is also hidden. @@ -154,6 +156,13 @@ A directory is made opaque by setting the xattr "trusted.overlay.opaque" to "y". Where the upper filesystem contains an opaque directory, any directory in the lower filesystem with the same name is ignored. +An opaque directory should not conntain any whiteouts, because they do not +serve any purpose. A merge directory containing regular files with the xattr +"trusted.overlay.whiteout", should be additionally marked by setting the xattr +"trusted.overlay.opaque" to "x" on the merge directory itself. +This is needed to avoid the overhead of checking the "trusted.overlay.whiteout" +on all entries during readdir in the common case. + readdir ------- @@ -534,8 +543,9 @@ A lower dir with a regular whiteout will always be handled by the overlayfs mount, so to support storing an effective whiteout file in an overlayfs mount an alternative form of whiteout is supported. This form is a regular, zero-size file with the "overlay.whiteout" xattr set, inside a directory with the -"overlay.whiteouts" xattr set. Such whiteouts are never created by overlayfs, -but can be used by userspace tools (like containers) that generate lower layers. +"overlay.opaque" xattr set to "x" (see `whiteouts and opaque directories`_). +These alternative whiteouts are never created by overlayfs, but can be used by +userspace tools (like containers) that generate lower layers. These alternative whiteouts can be escaped using the standard xattr escape mechanism in order to properly nest to any depth. diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 104c6d047d9b..c6a6b9df2104 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -1899,8 +1899,8 @@ For more information on mount propagation see: These files provide a method to access a task's comm value. It also allows for a task to set its own or one of its thread siblings comm value. The comm value is limited in size compared to the cmdline value, so writing anything longer -then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated -comm value. +then the kernel's TASK_COMM_LEN (currently 16 chars, including the NUL +terminator) will result in a truncated comm value. 3.7 /proc/<pid>/task/<tid>/children - Information about task children diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index eebcc0f9e2bc..6e903a903f8f 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -1264,7 +1264,7 @@ defined: char *(*d_dname)(struct dentry *, char *, int); struct vfsmount *(*d_automount)(struct path *); int (*d_manage)(const struct path *, bool); - struct dentry *(*d_real)(struct dentry *, const struct inode *); + struct dentry *(*d_real)(struct dentry *, enum d_real_type type); }; ``d_revalidate`` @@ -1419,16 +1419,14 @@ defined: the dentry being transited from. ``d_real`` - overlay/union type filesystems implement this method to return - one of the underlying dentries hidden by the overlay. It is - used in two different modes: + overlay/union type filesystems implement this method to return one + of the underlying dentries of a regular file hidden by the overlay. - Called from file_dentry() it returns the real dentry matching - the inode argument. The real dentry may be from a lower layer - already copied up, but still referenced from the file. This - mode is selected with a non-NULL inode argument. + The 'type' argument takes the values D_REAL_DATA or D_REAL_METADATA + for returning the real underlying dentry that refers to the inode + hosting the file's data or metadata respectively. - With NULL inode the topmost real underlying dentry is returned. + For non-regular files, the 'dentry' argument is returned. Each dentry has a pointer to its parent dentry, as well as a hash list of child dentries. Child dentries are basically like files in a diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst index 352516feef6f..6333697ba3e8 100644 --- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst @@ -1915,19 +1915,13 @@ four of those five higher level data structures. The fifth use case is discussed in the :ref:`realtime summary <rtsummary>` case study. -The most general storage interface supported by the xfile enables the reading -and writing of arbitrary quantities of data at arbitrary offsets in the xfile. -This capability is provided by ``xfile_pread`` and ``xfile_pwrite`` functions, -which behave similarly to their userspace counterparts. XFS is very record-based, which suggests that the ability to load and store complete records is important. -To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store`` -functions are provided to read and persist objects into an xfile. -They are internally the same as pread and pwrite, except that they treat any -error as an out of memory error. -For online repair, squashing error conditions in this manner is an acceptable -behavior because the only reaction is to abort the operation back to userspace. -All five xfile usecases can be serviced by these four functions. +To support these cases, a pair of ``xfile_load`` and ``xfile_store`` +functions are provided to read and persist objects into an xfile that treat any +error as an out of memory error. For online repair, squashing error conditions +in this manner is an acceptable behavior because the only reaction is to abort +the operation back to userspace. However, no discussion of file access idioms is complete without answering the question, "But what about mmap?" @@ -1939,15 +1933,14 @@ tmpfs can only push a pagecache folio to the swap cache if the folio is neither pinned nor locked, which means the xfile must not pin too many folios. Short term direct access to xfile contents is done by locking the pagecache -folio and mapping it into kernel address space. -Programmatic access (e.g. pread and pwrite) uses this mechanism. -Folio locks are not supposed to be held for long periods of time, so long -term direct access to xfile contents is done by bumping the folio refcount, +folio and mapping it into kernel address space. Object load and store uses this +mechanism. Folio locks are not supposed to be held for long periods of time, so +long term direct access to xfile contents is done by bumping the folio refcount, mapping it into kernel address space, and dropping the folio lock. These long term users *must* be responsive to memory reclaim by hooking into the shrinker infrastructure to know when to release folios. -The ``xfile_get_page`` and ``xfile_put_page`` functions are provided to +The ``xfile_get_folio`` and ``xfile_put_folio`` functions are provided to retrieve the (locked) folio that backs part of an xfile and to release it. The only code to use these folio lease functions are the xfarray :ref:`sorting<xfarray_sort>` algorithms and the :ref:`in-memory @@ -2277,13 +2270,12 @@ follows: pointing to the xfile. 3. Pass the buffer cache target, buffer ops, and other information to - ``xfbtree_create`` to write an initial tree header and root block to the - xfile. + ``xfbtree_init`` to initialize the passed in ``struct xfbtree`` and write an + initial root block to the xfile. Each btree type should define a wrapper that passes necessary arguments to the creation function. For example, rmap btrees define ``xfs_rmapbt_mem_create`` to take care of all the necessary details for callers. - A ``struct xfbtree`` object will be returned. 4. Pass the xfbtree object to the btree cursor creation function for the btree type. diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst index b6a42f4ffe03..b246902f523f 100644 --- a/Documentation/firmware-guide/acpi/index.rst +++ b/Documentation/firmware-guide/acpi/index.rst @@ -14,7 +14,6 @@ ACPI Support dsd/phy enumeration osi - method-customizing method-tracing DSD-properties-rules debug diff --git a/Documentation/firmware-guide/acpi/method-customizing.rst b/Documentation/firmware-guide/acpi/method-customizing.rst deleted file mode 100644 index de3ebcaed4cf..000000000000 --- a/Documentation/firmware-guide/acpi/method-customizing.rst +++ /dev/null @@ -1,89 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -======================================= -Linux ACPI Custom Control Method How To -======================================= - -:Author: Zhang Rui <rui.zhang@intel.com> - - -Linux supports customizing ACPI control methods at runtime. - -Users can use this to: - -1. override an existing method which may not work correctly, - or just for debugging purposes. -2. insert a completely new method in order to create a missing - method such as _OFF, _ON, _STA, _INI, etc. - -For these cases, it is far simpler to dynamically install a single -control method rather than override the entire DSDT, because kernel -rebuild/reboot is not needed and test result can be got in minutes. - -.. note:: - - - Only ACPI METHOD can be overridden, any other object types like - "Device", "OperationRegion", are not recognized. Methods - declared inside scope operators are also not supported. - - - The same ACPI control method can be overridden for many times, - and it's always the latest one that used by Linux/kernel. - - - To get the ACPI debug object output (Store (AAAA, Debug)), - please run:: - - echo 1 > /sys/module/acpi/parameters/aml_debug_output - - -1. override an existing method -============================== -a) get the ACPI table via ACPI sysfs I/F. e.g. to get the DSDT, - just run "cat /sys/firmware/acpi/tables/DSDT > /tmp/dsdt.dat" -b) disassemble the table by running "iasl -d dsdt.dat". -c) rewrite the ASL code of the method and save it in a new file, -d) package the new file (psr.asl) to an ACPI table format. - Here is an example of a customized \_SB._AC._PSR method:: - - DefinitionBlock ("", "SSDT", 1, "", "", 0x20080715) - { - Method (\_SB_.AC._PSR, 0, NotSerialized) - { - Store ("In AC _PSR", Debug) - Return (ACON) - } - } - - Note that the full pathname of the method in ACPI namespace - should be used. -e) assemble the file to generate the AML code of the method. - e.g. "iasl -vw 6084 psr.asl" (psr.aml is generated as a result) - If parameter "-vw 6084" is not supported by your iASL compiler, - please try a newer version. -f) mount debugfs by "mount -t debugfs none /sys/kernel/debug" -g) override the old method via the debugfs by running - "cat /tmp/psr.aml > /sys/kernel/debug/acpi/custom_method" - -2. insert a new method -====================== -This is easier than overriding an existing method. -We just need to create the ASL code of the method we want to -insert and then follow the step c) ~ g) in section 1. - -3. undo your changes -==================== -The "undo" operation is not supported for a new inserted method -right now, i.e. we can not remove a method currently. -For an overridden method, in order to undo your changes, please -save a copy of the method original ASL code in step c) section 1, -and redo step c) ~ g) to override the method with the original one. - - -.. note:: We can use a kernel with multiple custom ACPI method running, - But each individual write to debugfs can implement a SINGLE - method override. i.e. if we want to insert/override multiple - ACPI methods, we need to redo step c) ~ g) for multiple times. - -.. note:: Be aware that root can mis-use this driver to modify arbitrary - memory and gain additional rights, if root's privileges got - restricted (for example if root is not allowed to load additional - modules after boot). diff --git a/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv b/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv index 882d2518f8ed..3825f00ca9fe 100644 --- a/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv +++ b/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv @@ -16,6 +16,7 @@ Radeon (RX|TM) (PRO|WX) Vega /MI25 /V320 /V340L /8200 /9100 /SSG MxGPU, VEGA10, AMD Radeon (Pro) VII /MI50 /MI60, VEGA20, DCE 12, 9.4.0, VCE 4.1.0 / UVD 7.2.0, 4.2.0 MI100, ARCTURUS, *, 9.4.1, VCN 2.5.0, 4.2.2 MI200, ALDEBARAN, *, 9.4.2, VCN 2.6.0, 4.4.0 +MI300, AQUA_VANGARAM, *, 9.4.3, VCN 4.0.3, 4.4.2 AMD Radeon (RX|Pro) 5600(M|XT) /5700 (M|XT|XTB) /W5700, NAVI10, DCN 2.0.0, 10.1.10, VCN 2.0.0, 5.0.0 AMD Radeon (Pro) 5300 /5500XTB/5500(XT|M) /W5500M /W5500, NAVI14, DCN 2.0.0, 10.1.1, VCN 2.0.2, 5.0.2 AMD Radeon RX 6800(XT) /6900(XT) /W6800, SIENNA_CICHLID, DCN 3.0.0, 10.3.0, VCN 3.0.0, 5.2.0 @@ -23,4 +24,5 @@ AMD Radeon RX 6700 XT / 6800M / 6700M, NAVY_FLOUNDER, DCN 3.0.0, 10.3.2, VCN 3.0 AMD Radeon RX 6600(XT) /6600M /W6600 /W6600M, DIMGREY_CAVEFISH, DCN 3.0.2, 10.3.4, VCN 3.0.16, 5.2.4 AMD Radeon RX 6500M /6300M /W6500M /W6300M, BEIGE_GOBY, DCN 3.0.3, 10.3.5, VCN 3.0.33, 5.2.5 AMD Radeon RX 7900 XT /XTX, , DCN 3.2.0, 11.0.0, VCN 4.0.0, 6.0.0 +AMD Radeon RX 7800 XT, , DCN 3.2.0, 11.0.3, VCN 4.0.0, 6.0.3 AMD Radeon RX 7600M (XT) /7700S /7600S, , DCN 3.2.1, 11.0.2, VCN 4.0.4, 6.0.2 diff --git a/Documentation/gpu/amdgpu/display/dcn-blocks.rst b/Documentation/gpu/amdgpu/display/dcn-blocks.rst new file mode 100644 index 000000000000..a3fbd3ea028b --- /dev/null +++ b/Documentation/gpu/amdgpu/display/dcn-blocks.rst @@ -0,0 +1,78 @@ +========== +DCN Blocks +========== + +In this section, you will find some extra details about some of the DCN blocks +and the code documentation when it is automatically generated. + +DCHUBBUB +-------- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :internal: + +HUBP +---- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :internal: + +DPP +--- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h + :internal: + +MPC +--- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h + :internal: + +OPP +--- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/opp.h + :internal: + +DIO +--- + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h + :doc: overview + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h + :export: + +.. kernel-doc:: drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h + :internal: diff --git a/Documentation/gpu/amdgpu/display/display-contributing.rst b/Documentation/gpu/amdgpu/display/display-contributing.rst new file mode 100644 index 000000000000..fdb2bea01d53 --- /dev/null +++ b/Documentation/gpu/amdgpu/display/display-contributing.rst @@ -0,0 +1,168 @@ +.. _display_todos: + +============================== +AMDGPU - Display Contributions +============================== + +First of all, if you are here, you probably want to give some technical +contribution to the display code, and for that, we say thank you :) + +This page summarizes some of the issues you can help with; keep in mind that +this is a static page, and it is always a good idea to try to reach developers +in the amdgfx or some of the maintainers. Finally, this page follows the DRM +way of creating a TODO list; for more information, check +'Documentation/gpu/todo.rst'. + +Gitlab issues +============= + +Users can report issues associated with AMD GPUs at: + +- https://gitlab.freedesktop.org/drm/amd + +Usually, we try to add a proper label to all new tickets to make it easy to +filter issues. If you can reproduce any problem, you could help by adding more +information or fixing the issue. + +Level: diverse + +IGT +=== + +`IGT`_ provides many integration tests that can be run on your GPU. We always +want to pass a large set of tests to increase the test coverage in our CI. If +you wish to contribute to the display code but are unsure where a good place +is, we recommend you run all IGT tests and try to fix any failure you see in +your hardware. Keep in mind that this failure can be an IGT problem or a kernel +issue; it is necessary to analyze case-by-case. + +Level: diverse + +.. _IGT: https://gitlab.freedesktop.org/drm/igt-gpu-tools + +Compilation +=========== + +Fix compilation warnings +------------------------ + +Enable the W1 or W2 warning level in the kernel compilation and try to fix the +issues on the display side. + +Level: Starter + +Fix compilation issues when using um architecture +------------------------------------------------- + +Linux has a User-mode Linux (UML) feature, and the kernel can be compiled to +the **um** architecture. Compiling for **um** can bring multiple advantages +from the test perspective. We currently have some compilation issues in this +area that we need to fix. + +Level: Intermediate + +Code Refactor +============= + +Add prefix to DC functions to improve the debug with ftrace +----------------------------------------------------------- + +The Ftrace debug feature (check 'Documentation/trace/ftrace.rst') is a +fantastic way to check the code path when developers try to make sense of a +bug. Ftrace provides a filter mechanism that can be useful when the developer +has some hunch of which part of the code can cause the issue; for this reason, +if a set of functions has a proper prefix, it becomes easy to create a good +filter. Additionally, prefixes can improve stack trace readability. + +The DC code does not follow some prefix rules, which makes the Ftrace filter +more complicated and reduces the readability of the stack trace. If you want +something simple to start contributing to the display, you can make patches for +adding prefixes to DC functions. To create those prefixes, use part of the file +name as a prefix for all functions in the target file. Check the +'amdgpu_dm_crtc.c` and `amdgpu_dm_plane.c` for some references. However, we +strongly advise not to send huge patches changing these prefixes; otherwise, it +will be hard to review and test, which can generate second thoughts from +maintainers. Try small steps; in case of double, you can ask before you put in +effort. We recommend first looking at folders like dceXYZ, dcnXYZ, basics, +bios, core, clk_mgr, hwss, resource, and irq. + +Level: Starter + +Reduce code duplication +----------------------- + +AMD has an extensive portfolio with various dGPUs and APUs that amdgpu +supports. To maintain the new hardware release cadence, DCE/DCN was designed in +a modular design, making the bring-up for new hardware fast. Over the years, +amdgpu accumulated some technical debt in the code duplication area. For this +task, it would be a good idea to find a tool that can discover code duplication +(including patterns) and use it as guidance to reduce duplications. + +Level: Intermediate + +Make atomic_commit_[check|tail] more readable +--------------------------------------------- + +The functions responsible for atomic commit and tail are intricate and +extensive. In particular `amdgpu_dm_atomic_commit_tail` is a long function and +could benefit from being split into smaller helpers. Improvements in this area +are more than welcome, but keep in mind that changes in this area will affect +all ASICs, meaning that refactoring requires a comprehensive verification; in +other words, this effort can take some time for validation. + +Level: Advanced + +Documentation +============= + +Expand kernel-doc +----------------- + +Many DC functions do not have a proper kernel-doc; understanding a function and +adding documentation is a great way to learn more about the amdgpu driver and +also leave an outstanding contribution to the entire community. + +Level: Starter + +Beyond AMDGPU +============= + +AMDGPU provides features that are not yet enabled in the userspace. This +section highlights some of the coolest display features, which could be enabled +with the userspace developer helper. + +Enable underlay +--------------- + +AMD display has this feature called underlay (which you can read more about at +'Documentation/GPU/amdgpu/display/mpo-overview.rst') which is intended to +save power when playing a video. The basic idea is to put a video in the +underlay plane at the bottom and the desktop in the plane above it with a hole +in the video area. This feature is enabled in ChromeOS, and from our data +measurement, it can save power. + +Level: Unknown + +Adaptive Backlight Modulation (ABM) +----------------------------------- + +ABM is a feature that adjusts the display panel's backlight level and pixel +values depending on the displayed image. This power-saving feature can be very +useful when the system starts to run off battery; since this will impact the +display output fidelity, it would be good if this option was something that +users could turn on or off. + +Level: Unknown + + +HDR & Color management & VRR +---------------------------- + +HDR, Color Management, and VRR are huge topics and it's hard to put these into +concise ToDos. If you are interested in this topic, we recommend checking some +blog posts from the community developers to better understand some of the +specific challenges and people working on the subject. If anyone wants to work +on some particular part, we can try to help with some basic guidance. Finally, +keep in mind that we already have some kernel-doc in place for those areas. + +Level: Unknown diff --git a/Documentation/gpu/amdgpu/display/display-manager.rst b/Documentation/gpu/amdgpu/display/display-manager.rst index be2651ecdd7f..67a811e6891f 100644 --- a/Documentation/gpu/amdgpu/display/display-manager.rst +++ b/Documentation/gpu/amdgpu/display/display-manager.rst @@ -132,9 +132,6 @@ The DRM blend mode and its elements are then mapped by AMDGPU display manager (MPC), as follows: .. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h - :doc: mpc-overview - -.. kernel-doc:: drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h :functions: mpcc_blnd_cfg Therefore, the blending configuration for a single MPCC instance on the MPC diff --git a/Documentation/gpu/amdgpu/display/index.rst b/Documentation/gpu/amdgpu/display/index.rst index f8a4f53d70d8..f0c342e00a39 100644 --- a/Documentation/gpu/amdgpu/display/index.rst +++ b/Documentation/gpu/amdgpu/display/index.rst @@ -7,18 +7,80 @@ drm/amd/display - Display Core (DC) AMD display engine is partially shared with other operating systems; for this reason, our Display Core Driver is divided into two pieces: -1. **Display Core (DC)** contains the OS-agnostic components. Things like +#. **Display Core (DC)** contains the OS-agnostic components. Things like hardware programming and resource management are handled here. -2. **Display Manager (DM)** contains the OS-dependent components. Hooks to the - amdgpu base driver and DRM are implemented here. +#. **Display Manager (DM)** contains the OS-dependent components. Hooks to the + amdgpu base driver and DRM are implemented here. For example, you can check + display/amdgpu_dm/ folder. + +------------------ +DC Code validation +------------------ + +Maintaining the same code base across multiple OSes requires a lot of +synchronization effort between repositories and exhaustive validation. In the +DC case, we maintain a tree to centralize code from different parts. The shared +repository has integration tests with our Internal Linux CI farm, and we run a +comprehensive set of IGT tests in various AMD GPUs/APUs (mostly recent dGPUs +and APUs). Our CI also checks ARM64/32, PPC64/32, and x86_64/32 compilation +with DCN enabled and disabled. + +When we upstream a new feature or some patches, we pack them in a patchset with +the prefix **DC Patches for <DATE>**, which is created based on the latest +`amd-staging-drm-next <https://gitlab.freedesktop.org/agd5f/linux>`_. All of +those patches are under a DC version tested as follows: + +* Ensure that every patch compiles and the entire series pass our set of IGT + test in different hardware. +* Prepare a branch with those patches for our validation team. If there is an + error, a developer will debug as fast as possible; usually, a simple bisect + in the series is enough to point to a bad change, and two possible actions + emerge: fix the issue or drop the patch. If it is not an easy fix, the bad + patch is dropped. +* Finally, developers wait a few days for community feedback before we merge + the series. + +It is good to stress that the test phase is something that we take extremely +seriously, and we never merge anything that fails our validation. Follows an +overview of our test set: + +#. Manual test + * Multiple Hotplugs with DP and HDMI. + * Stress test with multiple display configuration changes via the user interface. + * Validate VRR behaviour. + * Check PSR. + * Validate MPO when playing video. + * Test more than two displays connected at the same time. + * Check suspend/resume. + * Validate FPO. + * Check MST. +#. Automated test + * IGT tests in a farm with GPUs and APUs that support DCN and DCE. + * Compilation validation with the latest GCC and Clang from LTS distro. + * Cross-compilation for PowerPC 64/32, ARM 64/32, and x86 32. + +In terms of test setup for CI and manual tests, we usually use: + +#. The latest Ubuntu LTS. +#. In terms of userspace, we only use fully updated open-source components + provided by the distribution official package manager. +#. Regarding IGT, we use the latest code from the upstream. +#. Most of the manual tests are conducted in the GNome but we also use KDE. + +Notice that someone from our test team will always reply to the cover letter +with the test report. + +-------------- +DC Information +-------------- The display pipe is responsible for "scanning out" a rendered frame from the GPU memory (also called VRAM, FrameBuffer, etc.) to a display. In other words, it would: -1. Read frame information from memory; -2. Perform required transformation; -3. Send pixel data to sink devices. +#. Read frame information from memory; +#. Perform required transformation; +#. Send pixel data to sink devices. If you want to learn more about our driver details, take a look at the below table of content: @@ -26,7 +88,9 @@ table of content: .. toctree:: display-manager.rst - dc-debug.rst dcn-overview.rst + dcn-blocks.rst mpo-overview.rst + dc-debug.rst + display-contributing.rst dc-glossary.rst diff --git a/Documentation/gpu/drm-internals.rst b/Documentation/gpu/drm-internals.rst index 5fd20a306718..335de7fcddee 100644 --- a/Documentation/gpu/drm-internals.rst +++ b/Documentation/gpu/drm-internals.rst @@ -153,18 +153,6 @@ Managed Resources .. kernel-doc:: include/drm/drm_managed.h :internal: -Bus-specific Device Registration and PCI Support ------------------------------------------------- - -A number of functions are provided to help with device registration. The -functions deal with PCI and platform devices respectively and are only -provided for historical reasons. These are all deprecated and shouldn't -be used in new drivers. Besides that there's a few helpers for pci -drivers. - -.. kernel-doc:: drivers/gpu/drm/drm_pci.c - :export: - Open/Close, File Operations and IOCTLs ====================================== diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 7aca5c7a7b1d..6dc299343b48 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -138,7 +138,7 @@ indicating kibi- or mebi-bytes. - drm-shared-<region>: <uint> [KiB|MiB] -The total size of buffers that are shared with another file (ie. have more +The total size of buffers that are shared with another file (e.g., have more than a single handle). - drm-total-<region>: <uint> [KiB|MiB] diff --git a/Documentation/gpu/introduction.rst b/Documentation/gpu/introduction.rst index f05eccd2c07c..b7c0baf97dbe 100644 --- a/Documentation/gpu/introduction.rst +++ b/Documentation/gpu/introduction.rst @@ -164,6 +164,8 @@ Conference talks Slides and articles ------------------- +* `The Linux graphics stack in a nutshell, part 1 <https://lwn.net/Articles/955376/>`_ - Thomas Zimmermann (2023) +* `The Linux graphics stack in a nutshell, part 2 <https://lwn.net/Articles/955708/>`_ - Thomas Zimmermann (2023) * `Understanding the Linux Graphics Stack <https://bootlin.com/doc/training/graphics/graphics-slides.pdf>`_ - Bootlin (2022) * `DRM KMS overview <https://wiki.st.com/stm32mpu/wiki/DRM_KMS_overview>`_ - STMicroelectronics (2021) * `Linux graphic stack <https://studiopixl.com/2017-05-13/linux-graphic-stack-an-overview>`_ - Nathan Gauër (2017) diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst index e4f7b005138d..476719771eef 100644 --- a/Documentation/gpu/rfc/index.rst +++ b/Documentation/gpu/rfc/index.rst @@ -31,7 +31,3 @@ host such documentation: .. toctree:: i915_vm_bind.rst - -.. toctree:: - - xe.rst diff --git a/Documentation/gpu/rfc/xe.rst b/Documentation/gpu/rfc/xe.rst deleted file mode 100644 index 97cf87578f97..000000000000 --- a/Documentation/gpu/rfc/xe.rst +++ /dev/null @@ -1,234 +0,0 @@ -========================== -Xe – Merge Acceptance Plan -========================== -Xe is a new driver for Intel GPUs that supports both integrated and -discrete platforms starting with Tiger Lake (first Intel Xe Architecture). - -This document aims to establish a merge plan for the Xe, by writing down clear -pre-merge goals, in order to avoid unnecessary delays. - -Xe – Overview -============= -The main motivation of Xe is to have a fresh base to work from that is -unencumbered by older platforms, whilst also taking the opportunity to -rearchitect our driver to increase sharing across the drm subsystem, both -leveraging and allowing us to contribute more towards other shared components -like TTM and drm/scheduler. - -This is also an opportunity to start from the beginning with a clean uAPI that is -extensible by design and already aligned with the modern userspace needs. For -this reason, the memory model is solely based on GPU Virtual Address space -bind/unbind (‘VM_BIND’) of GEM buffer objects (BOs) and execution only supporting -explicit synchronization. With persistent mapping across the execution, the -userspace does not need to provide a list of all required mappings during each -submission. - -The new driver leverages a lot from i915. As for display, the intent is to share -the display code with the i915 driver so that there is maximum reuse there. - -As for the power management area, the goal is to have a much-simplified support -for the system suspend states (S-states), PCI device suspend states (D-states), -GPU/Render suspend states (R-states) and frequency management. It should leverage -as much as possible all the existent PCI-subsystem infrastructure (pm and -runtime_pm) and underlying firmware components such PCODE and GuC for the power -states and frequency decisions. - -Repository: - -https://gitlab.freedesktop.org/drm/xe/kernel (branch drm-xe-next) - -Xe – Platforms -============== -Currently, Xe is already functional and has experimental support for multiple -platforms starting from Tiger Lake, with initial support in userspace implemented -in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as well as in NEO -(for OpenCL and Level0). - -During a transition period, platforms will be supported by both Xe and i915. -However, the force_probe mechanism existent in both drivers will allow only one -official and by-default probe at a given time. - -For instance, in order to probe a DG2 which PCI ID is 0x5690 by Xe instead of -i915, the following set of parameters need to be used: - -``` -i915.force_probe=!5690 xe.force_probe=5690 -``` - -In both drivers, the ‘.require_force_probe’ protection forces the user to use the -force_probe parameter while the driver is under development. This protection is -only removed when the support for the platform and the uAPI are stable. Stability -which needs to be demonstrated by CI results. - -In order to avoid user space regressions, i915 will continue to support all the -current platforms that are already out of this protection. Xe support will be -forever experimental and dependent on the usage of force_probe for these -platforms. - -When the time comes for Xe, the protection will be lifted on Xe and kept in i915. - -Xe – Pre-Merge Goals - Work-in-Progress -======================================= - -Display integration with i915 ------------------------------ -In order to share the display code with the i915 driver so that there is maximum -reuse, the i915/display/ code is built twice, once for i915.ko and then for -xe.ko. Currently, the i915/display code in Xe tree is polluted with many 'ifdefs' -depending on the build target. The goal is to refactor both Xe and i915/display -code simultaneously in order to get a clean result before they land upstream, so -that display can already be part of the initial pull request towards drm-next. - -However, display code should not gate the acceptance of Xe in upstream. Xe -patches will be refactored in a way that display code can be removed, if needed, -from the first pull request of Xe towards drm-next. The expectation is that when -both drivers are part of the drm-tip, the introduction of cleaner patches will be -easier and speed up. - -Xe – uAPI high level overview -============================= - -...Warning: To be done in follow up patches after/when/where the main consensus in various items are individually reached. - -Xe – Pre-Merge Goals - Completed -================================ - -Drm_exec --------- -Helper to make dma_resv locking for a big number of buffers is getting removed in -the drm_exec series proposed in https://patchwork.freedesktop.org/patch/524376/ -If that happens, Xe needs to change and incorporate the changes in the driver. -The goal is to engage with the Community to understand if the best approach is to -move that to the drivers that are using it or if we should keep the helpers in -place waiting for Xe to get merged. - -This item ties into the GPUVA, VM_BIND, and even long-running compute support. - -As a key measurable result, we need to have a community consensus documented in -this document and the Xe driver prepared for the changes, if necessary. - -Userptr integration and vm_bind -------------------------------- -Different drivers implement different ways of dealing with execution of userptr. -With multiple drivers currently introducing support to VM_BIND, the goal is to -aim for a DRM consensus on what’s the best way to have that support. To some -extent this is already getting addressed itself with the GPUVA where likely the -userptr will be a GPUVA with a NULL GEM call VM bind directly on the userptr. -However, there are more aspects around the rules for that and the usage of -mmu_notifiers, locking and other aspects. - -This task here has the goal of introducing a documentation of the basic rules. - -The documentation *needs* to first live in this document (API session below) and -then moved to another more specific document or at Xe level or at DRM level. - -Documentation should include: - - * The userptr part of the VM_BIND api. - - * Locking, including the page-faulting case. - - * O(1) complexity under VM_BIND. - -The document is now included in the drm documentation :doc:`here </gpu/drm-vm-bind-async>`. - -Some parts of userptr like mmu_notifiers should become GPUVA or DRM helpers when -the second driver supporting VM_BIND+userptr appears. Details to be defined when -the time comes. - -The DRM GPUVM helpers do not yet include the userptr parts, but discussions -about implementing them are ongoing. - -ASYNC VM_BIND -------------- -Although having a common DRM level IOCTL for VM_BIND is not a requirement to get -Xe merged, it is mandatory to have a consensus with other drivers and Mesa. -It needs to be clear how to handle async VM_BIND and interactions with userspace -memory fences. Ideally with helper support so people don't get it wrong in all -possible ways. - -As a key measurable result, the benefits of ASYNC VM_BIND and a discussion of -various flavors, error handling and sample API suggestions are documented in -:doc:`The ASYNC VM_BIND document </gpu/drm-vm-bind-async>`. - -Drm_scheduler -------------- -Xe primarily uses Firmware based scheduling (GuC FW). However, it will use -drm_scheduler as the scheduler ‘frontend’ for userspace submission in order to -resolve syncobj and dma-buf implicit sync dependencies. However, drm_scheduler is -not yet prepared to handle the 1-to-1 relationship between drm_gpu_scheduler and -drm_sched_entity. - -Deeper changes to drm_scheduler should *not* be required to get Xe accepted, but -some consensus needs to be reached between Xe and other community drivers that -could also benefit from this work, for coupling FW based/assisted submission such -as the ARM’s new Mali GPU driver, and others. - -As a key measurable result, the patch series introducing Xe itself shall not -depend on any other patch touching drm_scheduler itself that was not yet merged -through drm-misc. This, by itself, already includes the reach of an agreement for -uniform 1 to 1 relationship implementation / usage across drivers. - -Long running compute: minimal data structure/scaffolding --------------------------------------------------------- -The generic scheduler code needs to include the handling of endless compute -contexts, with the minimal scaffolding for preempt-ctx fences (probably on the -drm_sched_entity) and making sure drm_scheduler can cope with the lack of job -completion fence. - -The goal is to achieve a consensus ahead of Xe initial pull-request, ideally with -this minimal drm/scheduler work, if needed, merged to drm-misc in a way that any -drm driver, including Xe, could re-use and add their own individual needs on top -in a next stage. However, this should not block the initial merge. - -Dev_coredump ------------- - -Xe needs to align with other drivers on the way that the error states are -dumped, avoiding a Xe only error_state solution. The goal is to use devcoredump -infrastructure to report error states, since it produces a standardized way -by exposing a virtual and temporary /sys/class/devcoredump device. - -As the key measurable result, Xe driver needs to provide GPU snapshots captured -at hang time through devcoredump, but without depending on any core modification -of devcoredump infrastructure itself. - -Later, when we are in-tree, the goal is to collaborate with devcoredump -infrastructure with overall possible improvements, like multiple file support -for better organization of the dumps, snapshot support, dmesg extra print, -and whatever may make sense and help the overall infrastructure. - -DRM_VM_BIND ------------ -Nouveau, and Xe are all implementing ‘VM_BIND’ and new ‘Exec’ uAPIs in order to -fulfill the needs of the modern uAPI. Xe merge should *not* be blocked on the -development of a common new drm_infrastructure. However, the Xe team needs to -engage with the community to explore the options of a common API. - -As a key measurable result, the DRM_VM_BIND needs to be documented in this file -below, or this entire block deleted if the consensus is for independent drivers -vm_bind ioctls. - -Although having a common DRM level IOCTL for VM_BIND is not a requirement to get -Xe merged, it is mandatory to enforce the overall locking scheme for all major -structs and list (so vm and vma). So, a consensus is needed, and possibly some -common helpers. If helpers are needed, they should be also documented in this -document. - -GPU VA ------- -Two main goals of Xe are meeting together here: - -1) Have an uAPI that aligns with modern UMD needs. - -2) Early upstream engagement. - -RedHat engineers working on Nouveau proposed a new DRM feature to handle keeping -track of GPU virtual address mappings. This is still not merged upstream, but -this aligns very well with our goals and with our VM_BIND. The engagement with -upstream and the port of Xe towards GPUVA is already ongoing. - -As a key measurable result, Xe needs to be aligned with the GPU VA and working in -our tree. Missing Nouveau patches should *not* block Xe and any needed GPUVA -related patch should be independent and present on dri-devel or acked by -maintainers to go along with the first Xe pull request towards drm-next. diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst index 41a264bf84ce..fb9ad120b141 100644 --- a/Documentation/gpu/todo.rst +++ b/Documentation/gpu/todo.rst @@ -120,6 +120,29 @@ Contact: Daniel Vetter, respective driver maintainers Level: Advanced +Rename drm_atomic_state +----------------------- + +The KMS framework uses two slightly different definitions for the ``state`` +concept. For a given object (plane, CRTC, encoder, etc., so +``drm_$OBJECT_state``), the state is the entire state of that object. However, +at the device level, ``drm_atomic_state`` refers to a state update for a +limited number of objects. + +The state isn't the entire device state, but only the full state of some +objects in that device. This is confusing to newcomers, and +``drm_atomic_state`` should be renamed to something clearer like +``drm_atomic_commit``. + +In addition to renaming the structure itself, it would also imply renaming some +related functions (``drm_atomic_state_alloc``, ``drm_atomic_state_get``, +``drm_atomic_state_put``, ``drm_atomic_state_init``, +``__drm_atomic_state_free``, etc.). + +Contact: Maxime Ripard <mripard@kernel.org> + +Level: Advanced + Fallout from atomic KMS ----------------------- diff --git a/Documentation/hwmon/aspeed-g6-pwm-tach.rst b/Documentation/hwmon/aspeed-g6-pwm-tach.rst new file mode 100644 index 000000000000..17398fe397fe --- /dev/null +++ b/Documentation/hwmon/aspeed-g6-pwm-tach.rst @@ -0,0 +1,26 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Kernel driver aspeed-g6-pwm-tach +================================= + +Supported chips: + ASPEED AST2600 + +Authors: + <billy_tsai@aspeedtech.com> + +Description: +------------ +This driver implements support for ASPEED AST2600 Fan Tacho controller. +The controller supports up to 16 tachometer inputs. + +The driver provides the following sensor accesses in sysfs: + +=============== ======= ====================================================== +fanX_input ro provide current fan rotation value in RPM as reported + by the fan to the device. +fanX_div rw Fan divisor: Supported value are power of 4 (1, 4, 16 + 64, ... 4194304) + The larger divisor, the less rpm accuracy and the less + affected by fan signal glitch. +=============== ======= ====================================================== diff --git a/Documentation/hwmon/asus_rog_ryujin.rst b/Documentation/hwmon/asus_rog_ryujin.rst new file mode 100644 index 000000000000..9f77da070022 --- /dev/null +++ b/Documentation/hwmon/asus_rog_ryujin.rst @@ -0,0 +1,47 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Kernel driver asus_rog_ryujin +============================= + +Supported devices: + +* ASUS ROG RYUJIN II 360 + +Author: Aleksa Savic + +Description +----------- + +This driver enables hardware monitoring support for the listed ASUS ROG RYUJIN +all-in-one CPU liquid coolers. Available sensors are pump, internal and external +(controller) fan speed in RPM, their duties in PWM, as well as coolant temperature. + +Attaching external fans to the controller is optional and allows them to be +controlled from the device. If not connected, the fan-related sensors will +report zeroes. The controller is a separate hardware unit that comes bundled +with the AIO and connects to it to allow fan control. + +The addressable LCD screen is not supported in this driver and should +be controlled through userspace tools. + +Usage notes +----------- + +As these are USB HIDs, the driver can be loaded automatically by the kernel and +supports hot swapping. + +Sysfs entries +------------- + +=========== ============================================= +fan1_input Pump speed (in rpm) +fan2_input Internal fan speed (in rpm) +fan3_input External (controller) fan 1 speed (in rpm) +fan4_input External (controller) fan 2 speed (in rpm) +fan5_input External (controller) fan 3 speed (in rpm) +fan6_input External (controller) fan 4 speed (in rpm) +temp1_input Coolant temperature (in millidegrees Celsius) +pwm1 Pump duty +pwm2 Internal fan duty +pwm3 External (controller) fan duty +=========== ============================================= diff --git a/Documentation/hwmon/chipcap2.rst b/Documentation/hwmon/chipcap2.rst new file mode 100644 index 000000000000..dc165becc64c --- /dev/null +++ b/Documentation/hwmon/chipcap2.rst @@ -0,0 +1,73 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Kernel driver ChipCap2 +====================== + +Supported chips: + + * Amphenol CC2D23, CC2D23S, CC2D25, CC2D25S, CC2D33, CC2D33S, CC2D35, CC2D35S + + Prefix: 'chipcap2' + + Addresses scanned: - + + Datasheet: https://www.amphenol-sensors.com/en/telaire/humidity/527-humidity-sensors/3095-chipcap-2 + +Author: + + - Javier Carrasco <javier.carrasco.cruz@gmail.com> + +Description +----------- + +This driver implements support for the Amphenol ChipCap 2, a humidity and +temperature chip family. Temperature is measured in milli degrees celsius, +relative humidity is expressed as a per cent mille. The measurement ranges +are the following: + + - Relative humidity: 0 to 100000 pcm (14-bit resolution) + - Temperature: -40000 to +125000 m°C (14-bit resolution) + +The device communicates with the I2C protocol and uses the I2C address 0x28 +by default. + +Depending on the hardware configuration, up to two humidity alarms to control +minimum and maximum values are provided. Their thresholds and hystersis can be +configured via sysfs. + +Thresholds and hysteris must be provided as a per cent mille. These values +might be truncated to match the 14-bit device resolution (6.1 pcm/LSB) + +Known Issues +------------ + +The driver does not support I2C address and command window length modification. + +sysfs-Interface +--------------- + +The following list includes the sysfs attributes that the driver always provides, +their permissions and a short description: + +=============================== ======= ======================================== +Name Perm Description +=============================== ======= ======================================== +temp1_input: RO temperature input +humidity1_input: RO humidity input +=============================== ======= ======================================== + +The following list includes the sysfs attributes that the driver may provide +depending on the hardware configuration: + +=============================== ======= ======================================== +Name Perm Description +=============================== ======= ======================================== +humidity1_min: RW humidity low limit. Measurements under + this limit trigger a humidity low alarm +humidity1_max: RW humidity high limit. Measurements above + this limit trigger a humidity high alarm +humidity1_min_hyst: RW humidity low hystersis +humidity1_max_hyst: RW humidity high hystersis +humidity1_min_alarm: RO humidity low alarm indicator +humidity1_max_alarm: RO humidity high alarm indicator +=============================== ======= ======================================== diff --git a/Documentation/hwmon/emc2305.rst b/Documentation/hwmon/emc2305.rst index 2403dbaf2728..d0bfffe46358 100644 --- a/Documentation/hwmon/emc2305.rst +++ b/Documentation/hwmon/emc2305.rst @@ -6,7 +6,6 @@ Kernel driver emc2305 Supported chips: Microchip EMC2305, EMC2303, EMC2302, EMC2301 - Addresses scanned: I2C 0x27, 0x2c, 0x2d, 0x2e, 0x2f, 0x4c, 0x4d Prefixes: 'emc2305' Datasheet: Publicly available at the Microchip website : diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst index c7ed1f73ac06..1ca7a4fe1f8f 100644 --- a/Documentation/hwmon/index.rst +++ b/Documentation/hwmon/index.rst @@ -44,13 +44,16 @@ Hardware Monitoring Kernel Drivers aquacomputer_d5next asb100 asc7621 + aspeed-g6-pwm-tach aspeed-pwm-tacho asus_ec_sensors + asus_rog_ryujin asus_wmi_sensors bcm54140 bel-pfe bpa-rs600 bt1-pvt + chipcap2 coretemp corsair-cpro corsair-psu @@ -129,6 +132,7 @@ Hardware Monitoring Kernel Drivers ltc4245 ltc4260 ltc4261 + ltc4282 ltc4286 max127 max15301 @@ -163,6 +167,7 @@ Hardware Monitoring Kernel Drivers mp2975 mp5023 mp5990 + mpq8785 nct6683 nct6775 nct7802 @@ -171,6 +176,7 @@ Hardware Monitoring Kernel Drivers nsa320 ntc_thermistor nzxt-kraken2 + nzxt-kraken3 nzxt-smart2 occ oxp-sensors @@ -185,6 +191,7 @@ Hardware Monitoring Kernel Drivers pmbus powerz powr1220 + pt5161l pxe1610 pwm-fan q54sj108a2 @@ -208,6 +215,7 @@ Hardware Monitoring Kernel Drivers smsc47m1 sparx5-temp stpddc60 + surface_fan sy7636a-hwmon tc654 tc74 diff --git a/Documentation/hwmon/ltc4282.rst b/Documentation/hwmon/ltc4282.rst new file mode 100644 index 000000000000..a87ec3564998 --- /dev/null +++ b/Documentation/hwmon/ltc4282.rst @@ -0,0 +1,133 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +Kernel drivers ltc4282 +========================================== + +Supported chips: + + * Analog Devices LTC4282 + + Prefix: 'ltc4282' + + Addresses scanned: - I2C 0x40 - 0x5A (7-bit) + Addresses scanned: - I2C 0x80 - 0xB4 with a step of 2 (8-bit) + + Datasheet: + + https://www.analog.com/media/en/technical-documentation/data-sheets/ltc4282.pdf + +Author: Nuno Sá <nuno.sa@analog.com> + +Description +___________ + +The LTC4282 hot swap controller allows a board to be safely inserted and removed +from a live backplane. Using one or more external N-channel pass transistors, +board supply voltage and inrush current are ramped up at an adjustable rate. An +I2C interface and onboard ADC allows for monitoring of board current, voltage, +power, energy and fault status. The device features analog foldback current +limiting and supply monitoring for applications from 2.9V to 33V. Dual 12V gate +drive allows high power applications to either share safe operating area across +parallel MOSFETs or support a 2-stage start-up that first charges the load +capacitance followed by enabling a low on-resistance path to the load. The +LTC4282 is well suited to high power applications because the precise monitoring +capability and accurate current limiting reduce the extremes in which both loads +and power supplies must safely operate. Non-volatile configuration allows for +flexibility in the autonomous generation of alerts and response to faults. + +Sysfs entries +_____________ + +The following attributes are supported. Limits are read-write and all the other +attributes are read-only. Note that in0 and in1 are mutually exclusive. Enabling +one disables the other and disabling one enables the other. + +======================= ========================================== +in0_input Output voltage (mV). +in0_min Undervoltage threshold +in0_max Overvoltage threshold +in0_lowest Lowest measured voltage +in0_highest Highest measured voltage +in0_reset_history Write 1 to reset in0 history. + Also clears fet bad and short fault logs. +in0_min_alarm Undervoltage alarm +in0_max_alarm Overvoltage alarm +in0_enable Enable/Disable VSOURCE monitoring +in0_fault Failure in the MOSFETs. Either bad or shorted FET. +in0_label Channel label (VSOURCE) + +in1_input Input voltage (mV). +in1_min Undervoltage threshold +in1_max Overvoltage threshold +in1_lowest Lowest measured voltage +in1_highest Highest measured voltage +in1_reset_history Write 1 to reset in1 history. + Also clears over/undervoltage fault logs. +in1_min_alarm Undervoltage alarm +in1_max_alarm Overvoltage alarm +in1_lcrit_alarm Critical Undervoltage alarm +in1_crit_alarm Critical Overvoltage alarm +in1_enable Enable/Disable VDD monitoring +in1_label Channel label (VDD) + +in2_input GPIO voltage (mV) +in2_min Undervoltage threshold +in2_max Overvoltage threshold +in2_lowest Lowest measured voltage +in2_highest Highest measured voltage +in2_reset_history Write 1 to reset in2 history +in2_min_alarm Undervoltage alarm +in2_max_alarm Overvoltage alarm +in2_label Channel label (VGPIO) + +curr1_input Sense current (mA) +curr1_min Undercurrent threshold +curr1_max Overcurrent threshold +curr1_lowest Lowest measured current +curr1_highest Highest measured current +curr1_reset_history Write 1 to reset curr1 history. + Also clears overcurrent fault logs. +curr1_min_alarm Undercurrent alarm +curr1_max_alarm Overcurrent alarm +curr1_crit_alarm Critical Overcurrent alarm +curr1_label Channel label (ISENSE) + +power1_input Power (in uW) +power1_min Low power threshold +power1_max High power threshold +power1_input_lowest Historical minimum power use +power1_input_highest Historical maximum power use +power1_reset_history Write 1 to reset power1 history. + Also clears power bad fault logs. +power1_min_alarm Low power alarm +power1_max_alarm High power alarm +power1_label Channel label (Power) + +energy1_input Measured energy over time (in microJoule) +energy1_enable Enable/Disable Energy accumulation +======================= ========================================== + +DebugFs entries +_______________ + +The chip also has a fault log register where failures can be logged. Hence, +as these are logging events, we give access to them in debugfs. Note that +even if some failure is detected in these logs, it does necessarily mean +that the failure is still present. As mentioned in the proper Sysfs entries, +these logs can be cleared by writing in the proper reset_history attribute. + +.. warning:: The debugfs interface is subject to change without notice + and is only available when the kernel is compiled with + ``CONFIG_DEBUG_FS`` defined. + +``/sys/kernel/debug/ltc4282-hwmon[X]/`` +contains the following attributes: + +======================= ========================================== +power1_bad_fault_log Set to 1 by a power1 bad fault occurring. +in0_fet_short_fault_log Set to 1 when the ADC detects a FET-short fault. +in0_fet_bad_fault_log Set to 1 when a FET-BAD fault occurs. +in1_crit_fault_log Set to 1 by a VDD overvoltage fault occurring. +in1_lcrit_fault_log Set to 1 by a VDD undervoltage fault occurring. +curr1_crit_fault_log Set to 1 by an overcurrent fault occurring. +======================= ========================================== diff --git a/Documentation/hwmon/max6620.rst b/Documentation/hwmon/max6620.rst index 84c1c44d3de4..d70173bf0242 100644 --- a/Documentation/hwmon/max6620.rst +++ b/Documentation/hwmon/max6620.rst @@ -11,7 +11,7 @@ Supported chips: Addresses scanned: none - Datasheet: http://pdfserv.maxim-ic.com/en/ds/MAX6620.pdf + Datasheet: https://www.analog.com/media/en/technical-documentation/data-sheets/max6620.pdf Authors: - L\. Grunenberg <contact@lgrunenberg.de> diff --git a/Documentation/hwmon/mpq8785.rst b/Documentation/hwmon/mpq8785.rst new file mode 100644 index 000000000000..bf8176b87086 --- /dev/null +++ b/Documentation/hwmon/mpq8785.rst @@ -0,0 +1,94 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +Kernel driver mpq8785 +======================= + +Supported chips: + + * MPS MPQ8785 + + Prefix: 'mpq8785' + +Author: Charles Hsu <ythsu0511@gmail.com> + +Description +----------- + +The MPQ8785 is a fully integrated, PMBus-compatible, high-frequency, synchronous +buck converter. The MPQ8785 offers a very compact solution that achieves up to +40A output current per phase, with excellent load and line regulation over a +wide input supply range. The MPQ8785 operates at high efficiency over a wide +output current load range. + +The PMBus interface provides converter configurations and key parameters +monitoring. + +The MPQ8785 adopts MPS's proprietary multi-phase digital constant-on-time (MCOT) +control, which provides fast transient response and eases loop stabilization. +The MCOT scheme also allows multiple MPQ8785 devices to be connected in parallel +with excellent current sharing and phase interleaving for high-current +applications. + +Fully integrated protection features include over-current protection (OCP), +over-voltage protection (OVP), under-voltage protection (UVP), and +over-temperature protection (OTP). + +The MPQ8785 requires a minimal number of readily available, standard external +components, and is available in a TLGA (5mmx6mm) package. + +Device compliant with: + +- PMBus rev 1.3 interface. + +The driver exports the following attributes via the 'sysfs' files +for input voltage: + +**in1_input** + +**in1_label** + +**in1_max** + +**in1_max_alarm** + +**in1_min** + +**in1_min_alarm** + +**in1_crit** + +**in1_crit_alarm** + +The driver provides the following attributes for output voltage: + +**in2_input** + +**in2_label** + +**in2_alarm** + +The driver provides the following attributes for output current: + +**curr1_input** + +**curr1_label** + +**curr1_max** + +**curr1_max_alarm** + +**curr1_crit** + +**curr1_crit_alarm** + +The driver provides the following attributes for temperature: + +**temp1_input** + +**temp1_max** + +**temp1_max_alarm** + +**temp1_crit** + +**temp1_crit_alarm** diff --git a/Documentation/hwmon/nct6683.rst b/Documentation/hwmon/nct6683.rst index 3e7f6ee779c2..2a7a78eb1b46 100644 --- a/Documentation/hwmon/nct6683.rst +++ b/Documentation/hwmon/nct6683.rst @@ -64,4 +64,5 @@ Intel DB85FL NCT6683D EC firmware version 1.0 build 04/03/13 ASRock X570 NCT6683D EC firmware version 1.0 build 06/28/19 ASRock X670E NCT6686D EC firmware version 1.0 build 05/19/22 MSI B550 NCT6687D EC firmware version 1.0 build 05/07/20 +MSI X670-P NCT6687D EC firmware version 0.0 build 09/27/22 =============== =============================================== diff --git a/Documentation/hwmon/nzxt-kraken3.rst b/Documentation/hwmon/nzxt-kraken3.rst new file mode 100644 index 000000000000..90fd9dec15ff --- /dev/null +++ b/Documentation/hwmon/nzxt-kraken3.rst @@ -0,0 +1,74 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Kernel driver nzxt-kraken3 +========================== + +Supported devices: + +* NZXT Kraken X53 +* NZXT Kraken X63 +* NZXT Kraken X73 +* NZXT Kraken Z53 +* NZXT Kraken Z63 +* NZXT Kraken Z73 + +Author: Jonas Malaco, Aleksa Savic + +Description +----------- + +This driver enables hardware monitoring support for NZXT Kraken X53/X63/X73 and +Z53/Z63/Z73 all-in-one CPU liquid coolers. All models expose liquid temperature +and pump speed (in RPM), as well as PWM control (either as a fixed value +or through a temp-PWM curve). The Z-series models additionally expose the speed +and duty of an optionally connected fan, with the same PWM control capabilities. + +Pump and fan duty control mode can be set through pwm[1-2]_enable, where 1 is +for the manual control mode and 2 is for the liquid temp to PWM curve mode. +Writing a 0 disables control of the channel through the driver after setting its +duty to 100%. + +The temperature of the curves relates to the fixed [20-59] range, correlating to +the detected liquid temperature. Only PWM values (ranging from 0-255) can be set. +If in curve mode, setting point values should be done in moderation - the devices +require complete curves to be sent for each change; they can lock up or discard +the changes if they are too numerous at once. Suggestion is to set them while +in an another mode, and then apply them by switching to curve. + +The devices can report if they are faulty. The driver supports that situation +and will issue a warning. This can also happen when the USB cable is connected, +but SATA power is not. + +The addressable RGB LEDs and LCD screen (only on Z-series models) are not +supported in this driver, but can be controlled through existing userspace tools, +such as `liquidctl`_. + +.. _liquidctl: https://github.com/liquidctl/liquidctl + +Usage Notes +----------- + +As these are USB HIDs, the driver can be loaded automatically by the kernel and +supports hot swapping. + +Possible pwm_enable values are: + +====== ========================================================================== +0 Set fan to 100% +1 Direct PWM mode (applies value in corresponding PWM entry) +2 Curve control mode (applies the temp-PWM duty curve based on coolant temp) +====== ========================================================================== + +Sysfs entries +------------- + +============================== ================================================================ +fan1_input Pump speed (in rpm) +fan2_input Fan speed (in rpm) +temp1_input Coolant temperature (in millidegrees Celsius) +pwm1 Pump duty (value between 0-255) +pwm1_enable Pump duty control mode (0: disabled, 1: manual, 2: curve) +pwm2 Fan duty (value between 0-255) +pwm2_enable Fan duty control mode (0: disabled, 1: manual, 2: curve) +temp[1-2]_auto_point[1-40]_pwm Temp-PWM duty curves (for pump and fan), related to coolant temp +============================== ================================================================ diff --git a/Documentation/hwmon/oxp-sensors.rst b/Documentation/hwmon/oxp-sensors.rst index 3adeb7406243..55b1ef61625e 100644 --- a/Documentation/hwmon/oxp-sensors.rst +++ b/Documentation/hwmon/oxp-sensors.rst @@ -33,6 +33,7 @@ Currently the driver supports the following handhelds: - AOK ZOE A1 PRO - Aya Neo 2 - Aya Neo AIR + - Aya Neo AIR Plus (Mendocino) - Aya Neo AIR Pro - Aya Neo Geek - OneXPlayer AMD diff --git a/Documentation/hwmon/pt5161l.rst b/Documentation/hwmon/pt5161l.rst new file mode 100644 index 000000000000..1b97336991ea --- /dev/null +++ b/Documentation/hwmon/pt5161l.rst @@ -0,0 +1,42 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Kernel driver pt5161l +===================== + +Supported chips: + + * Astera Labs PT5161L + + Prefix: 'pt5161l' + + Addresses scanned: I2C 0x20 - 0x27 + + Datasheet: Not publicly available. + +Authors: Cosmo Chou <cosmo.chou@quantatw.com> + +Description +----------- + +This driver implements support for temperature monitoring of Astera Labs +PT5161L series PCIe retimer chips. + +This driver implementation originates from the CSDK available at +https://github.com/facebook/openbmc/tree/helium/common/recipes-lib/retimer-v2.14 +The communication protocol utilized is based on the I2C/SMBus standard. + +Sysfs entries +---------------- + +================ ============================================== +temp1_input Measured temperature (in millidegrees Celsius) +================ ============================================== + +Debugfs entries +---------------- + +================ =============================== +fw_load_status Firmware load status +fw_ver Firmware version of the retimer +heartbeat_status Heartbeat status +================ =============================== diff --git a/Documentation/hwmon/sht3x.rst b/Documentation/hwmon/sht3x.rst index 957c854f5d08..9585fa7c5a5d 100644 --- a/Documentation/hwmon/sht3x.rst +++ b/Documentation/hwmon/sht3x.rst @@ -65,6 +65,10 @@ When the temperature and humidity readings move back between the hysteresis values, the alert bit is set to 0 and the alert pin on the sensor is set to low. +The serial number exposed to debugfs allows for unique identification of the +sensors. For sts32, sts33 and sht33, the manufacturer provides calibration +certificates through an API. + sysfs-Interface --------------- @@ -99,3 +103,10 @@ repeatability: write or read repeatability, higher repeatability means - 1: medium repeatability - 2: high repeatability =================== ============================================================ + +debugfs-Interface +----------------- + +=================== ============================================================ +serial_number: unique serial number of the sensor in decimal +=================== ============================================================ diff --git a/Documentation/hwmon/surface_fan.rst b/Documentation/hwmon/surface_fan.rst new file mode 100644 index 000000000000..07942574c4f0 --- /dev/null +++ b/Documentation/hwmon/surface_fan.rst @@ -0,0 +1,25 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Kernel driver surface_fan +========================= + +Supported Devices: + + * Microsoft Surface Pro 9 + +Author: Ivor Wanders <ivor@iwanders.net> + +Description +----------- + +This provides monitoring of the fan found in some Microsoft Surface Pro devices, +like the Surface Pro 9. The fan is always controlled by the onboard controller. + +Sysfs interface +--------------- + +======================= ======= ========================================= +Name Perm Description +======================= ======= ========================================= +``fan1_input`` RO Current fan speed in RPM. +======================= ======= ========================================= diff --git a/Documentation/index.rst b/Documentation/index.rst index 36e61783437c..5298611e00ee 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -22,10 +22,10 @@ community and getting your work upstream. .. toctree:: :maxdepth: 1 - process/development-process - process/submitting-patches + Development process <process/development-process> + Submitting patches <process/submitting-patches> Code of conduct <process/code-of-conduct> - maintainer/index + Maintainer handbook <maintainer/index> All development-process docs <process/index> @@ -38,10 +38,10 @@ kernel. .. toctree:: :maxdepth: 1 - core-api/index - driver-api/index - subsystem-apis - Locking in the kernel <locking/index> + Core API <core-api/index> + Driver APIs <driver-api/index> + Subsystems <subsystem-apis> + Locking <locking/index> Development tools and processes =============================== @@ -51,15 +51,15 @@ Various other manuals with useful information for all kernel developers. .. toctree:: :maxdepth: 1 - process/license-rules - doc-guide/index - dev-tools/index - dev-tools/testing-overview - kernel-hacking/index - trace/index - fault-injection/index - livepatch/index - rust/index + Licensing rules <process/license-rules> + Writing documentation <doc-guide/index> + Development tools <dev-tools/index> + Testing guide <dev-tools/testing-overview> + Hacking guide <kernel-hacking/index> + Tracing <trace/index> + Fault injection <fault-injection/index> + Livepatching <livepatch/index> + Rust <rust/index> User-oriented documentation @@ -72,11 +72,11 @@ developers seeking information on the kernel's user-space APIs. .. toctree:: :maxdepth: 1 - admin-guide/index - The kernel build system <kbuild/index> - admin-guide/reporting-issues.rst - User-space tools <tools/index> - userspace-api/index + Administration <admin-guide/index> + Build system <kbuild/index> + Reporting issues <admin-guide/reporting-issues.rst> + Userspace tools <tools/index> + Userspace API <userspace-api/index> See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_, which are kept separately from the kernel's own documentation. @@ -89,8 +89,8 @@ platform firmwares. .. toctree:: :maxdepth: 1 - firmware-guide/index - devicetree/index + Firmware <firmware-guide/index> + Firmware and Devicetree <devicetree/index> Architecture-specific documentation @@ -99,7 +99,7 @@ Architecture-specific documentation .. toctree:: :maxdepth: 2 - arch/index + CPU architectures <arch/index> Other documentation @@ -112,8 +112,7 @@ to ReStructured Text format, or are simply too old. .. toctree:: :maxdepth: 1 - staging/index - RAS/ras + Unsorted documentation <staging/index> Translations @@ -122,7 +121,7 @@ Translations .. toctree:: :maxdepth: 2 - translations/index + Translations <translations/index> Indices and tables ================== diff --git a/Documentation/kbuild/Kconfig.recursion-issue-01 b/Documentation/kbuild/Kconfig.recursion-issue-01 index e8877db0461f..ac49836d8ecf 100644 --- a/Documentation/kbuild/Kconfig.recursion-issue-01 +++ b/Documentation/kbuild/Kconfig.recursion-issue-01 @@ -16,13 +16,13 @@ # that are possible for CORE. So for example if CORE_BELL_A_ADVANCED is 'y', # CORE must be 'y' too. # -# * What influences CORE_BELL_A_ADVANCED ? +# * What influences CORE_BELL_A_ADVANCED? # # As the name implies CORE_BELL_A_ADVANCED is an advanced feature of # CORE_BELL_A so naturally it depends on CORE_BELL_A. So if CORE_BELL_A is 'y' # we know CORE_BELL_A_ADVANCED can be 'y' too. # -# * What influences CORE_BELL_A ? +# * What influences CORE_BELL_A? # # CORE_BELL_A depends on CORE, so CORE influences CORE_BELL_A. # @@ -34,7 +34,7 @@ # the "recursive dependency detected" error. # # Reading the Documentation/kbuild/Kconfig.recursion-issue-01 file it may be -# obvious that an easy to solution to this problem should just be the removal +# obvious that an easy solution to this problem should just be the removal # of the "select CORE" from CORE_BELL_A_ADVANCED as that is implicit already # since CORE_BELL_A depends on CORE. Recursive dependency issues are not always # so trivial to resolve, we provide another example below of practical diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst index 18cee1edaecb..b49fb6dc4d0c 100644 --- a/Documentation/maintainer/maintainer-entry-profile.rst +++ b/Documentation/maintainer/maintainer-entry-profile.rst @@ -102,7 +102,10 @@ to do something different in the near future. ../doc-guide/maintainer-profile ../nvdimm/maintainer-entry-profile ../arch/riscv/patch-acceptance + ../process/maintainer-soc + ../process/maintainer-soc-clean-dts ../driver-api/media/maintainer-entry-profile + ../process/maintainer-netdev ../driver-api/vfio-pci-device-specific-driver-acceptance ../nvme/feature-and-quirk-policy ../filesystems/xfs/xfs-maintainer-entry-profile diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst index be75971532f5..b517ee28a955 100644 --- a/Documentation/mm/slub.rst +++ b/Documentation/mm/slub.rst @@ -9,7 +9,7 @@ SLUB can enable debugging only for selected slabs in order to avoid an impact on overall system performance which may make a bug more difficult to find. -In order to switch debugging on one can add an option ``slub_debug`` +In order to switch debugging on one can add an option ``slab_debug`` to the kernel command line. That will enable full debugging for all slabs. @@ -26,16 +26,16 @@ be enabled on the command line. F.e. no tracking information will be available without debugging on and validation can only partially be performed if debugging was not switched on. -Some more sophisticated uses of slub_debug: +Some more sophisticated uses of slab_debug: ------------------------------------------- -Parameters may be given to ``slub_debug``. If none is specified then full +Parameters may be given to ``slab_debug``. If none is specified then full debugging is enabled. Format: -slub_debug=<Debug-Options> +slab_debug=<Debug-Options> Enable options for all slabs -slub_debug=<Debug-Options>,<slab name1>,<slab name2>,... +slab_debug=<Debug-Options>,<slab name1>,<slab name2>,... Enable options only for select slabs (no spaces after a comma) @@ -60,23 +60,23 @@ Possible debug options are:: F.e. in order to boot just with sanity checks and red zoning one would specify:: - slub_debug=FZ + slab_debug=FZ Trying to find an issue in the dentry cache? Try:: - slub_debug=,dentry + slab_debug=,dentry to only enable debugging on the dentry cache. You may use an asterisk at the end of the slab name, in order to cover all slabs with the same prefix. For example, here's how you can poison the dentry cache as well as all kmalloc slabs:: - slub_debug=P,kmalloc-*,dentry + slab_debug=P,kmalloc-*,dentry Red zoning and tracking may realign the slab. We can just apply sanity checks to the dentry cache with:: - slub_debug=F,dentry + slab_debug=F,dentry Debugging options may require the minimum possible slab order to increase as a result of storing the metadata (for example, caches with PAGE_SIZE object @@ -84,20 +84,20 @@ sizes). This has a higher liklihood of resulting in slab allocation errors in low memory situations or if there's high fragmentation of memory. To switch off debugging for such caches by default, use:: - slub_debug=O + slab_debug=O You can apply different options to different list of slab names, using blocks of options. This will enable red zoning for dentry and user tracking for kmalloc. All other slabs will not get any debugging enabled:: - slub_debug=Z,dentry;U,kmalloc-* + slab_debug=Z,dentry;U,kmalloc-* You can also enable options (e.g. sanity checks and poisoning) for all caches except some that are deemed too performance critical and don't need to be debugged by specifying global debug options followed by a list of slab names with "-" as options:: - slub_debug=FZ;-,zs_handle,zspage + slab_debug=FZ;-,zs_handle,zspage The state of each debug option for a slab can be found in the respective files under:: @@ -105,7 +105,7 @@ under:: /sys/kernel/slab/<slab name>/ If the file contains 1, the option is enabled, 0 means disabled. The debug -options from the ``slub_debug`` parameter translate to the following files:: +options from the ``slab_debug`` parameter translate to the following files:: F sanity_checks Z red_zone @@ -129,7 +129,7 @@ in order to reduce overhead and increase cache hotness of objects. Slab validation =============== -SLUB can validate all object if the kernel was booted with slub_debug. In +SLUB can validate all object if the kernel was booted with slab_debug. In order to do so you must have the ``slabinfo`` tool. Then you can do :: @@ -150,29 +150,29 @@ list_lock once in a while to deal with partial slabs. That overhead is governed by the order of the allocation for each slab. The allocations can be influenced by kernel parameters: -.. slub_min_objects=x (default 4) -.. slub_min_order=x (default 0) -.. slub_max_order=x (default 3 (PAGE_ALLOC_COSTLY_ORDER)) +.. slab_min_objects=x (default: automatically scaled by number of cpus) +.. slab_min_order=x (default 0) +.. slab_max_order=x (default 3 (PAGE_ALLOC_COSTLY_ORDER)) -``slub_min_objects`` +``slab_min_objects`` allows to specify how many objects must at least fit into one slab in order for the allocation order to be acceptable. In general slub will be able to perform this number of allocations on a slab without consulting centralized resources (list_lock) where contention may occur. -``slub_min_order`` +``slab_min_order`` specifies a minimum order of slabs. A similar effect like - ``slub_min_objects``. + ``slab_min_objects``. -``slub_max_order`` - specified the order at which ``slub_min_objects`` should no +``slab_max_order`` + specified the order at which ``slab_min_objects`` should no longer be checked. This is useful to avoid SLUB trying to - generate super large order pages to fit ``slub_min_objects`` + generate super large order pages to fit ``slab_min_objects`` of a slab cache with large object sizes into one high order page. Setting command line parameter ``debug_guardpage_minorder=N`` (N > 0), forces setting - ``slub_max_order`` to 0, what cause minimum possible order of + ``slab_max_order`` to 0, what cause minimum possible order of slabs allocation. SLUB Debug output @@ -219,7 +219,7 @@ Here is a sample of slub debug output:: FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc If SLUB encounters a corrupted object (full detection requires the kernel -to be booted with slub_debug) then the following output will be dumped +to be booted with slab_debug) then the following output will be dumped into the syslog: 1. Description of the problem encountered @@ -239,7 +239,7 @@ into the syslog: pid=<pid of the process> (Object allocation / free information is only available if SLAB_STORE_USER is - set for the slab. slub_debug sets that option) + set for the slab. slab_debug sets that option) 2. The object contents if an object was involved. @@ -262,7 +262,7 @@ into the syslog: the object boundary. (Redzone information is only available if SLAB_RED_ZONE is set. - slub_debug sets that option) + slab_debug sets that option) Padding <address> : <bytes> Unused data to fill up the space in order to get the next object @@ -296,7 +296,7 @@ Emergency operations Minimal debugging (sanity checks alone) can be enabled by booting with:: - slub_debug=F + slab_debug=F This will be generally be enough to enable the resiliency features of slub which will keep the system running even if a bad kernel component will @@ -311,13 +311,13 @@ and enabling debugging only for that cache I.e.:: - slub_debug=F,dentry + slab_debug=F,dentry If the corruption occurs by writing after the end of the object then it may be advisable to enable a Redzone to avoid corrupting the beginning of other objects:: - slub_debug=FZ,dentry + slab_debug=FZ,dentry Extended slabinfo mode and plotting =================================== diff --git a/Documentation/netlink/genetlink-c.yaml b/Documentation/netlink/genetlink-c.yaml index c58f7153fcf8..4dfd899a1661 100644 --- a/Documentation/netlink/genetlink-c.yaml +++ b/Documentation/netlink/genetlink-c.yaml @@ -11,7 +11,7 @@ $defs: minimum: 0 len-or-define: type: [ string, integer ] - pattern: ^[0-9A-Za-z_]+( - 1)?$ + pattern: ^[0-9A-Za-z_-]+( - 1)?$ minimum: 0 len-or-limit: # literal int or limit based on fixed-width type e.g. u8-min, u16-max, etc. @@ -126,8 +126,9 @@ properties: Prefix for the C enum name of the attributes. Default family[name]-set[name]-a- type: string enum-name: - description: Name for the enum type of the attribute. - type: string + description: | + Name for the enum type of the attribute, if empty no name will be used. + type: [ string, "null" ] doc: description: Documentation of the space. type: string @@ -208,6 +209,11 @@ properties: exact-len: description: Exact length for a string or a binary attribute. $ref: '#/$defs/len-or-define' + unterminated-ok: + description: | + For string attributes, do not check whether attribute + contains the terminating null character. + type: boolean sub-type: *attr-type display-hint: &display-hint description: | @@ -261,14 +267,16 @@ properties: the prefix with the upper case name of the command, with dashes replaced by underscores. type: string enum-name: - description: Name for the enum type with commands. - type: string + description: | + Name for the enum type with commands, if empty no name will be used. + type: [ string, "null" ] async-prefix: description: Same as name-prefix but used to render notifications and events to separate enum. type: string async-enum: - description: Name for the enum type with notifications/events. - type: string + description: | + Name for the enum type with commands, if empty no name will be used. + type: [ string, "null" ] list: description: List of commands type: array @@ -370,3 +378,22 @@ properties: type: string # End genetlink-c flags: *cmd_flags + + kernel-family: + description: Additional global attributes used for kernel C code generation. + type: object + additionalProperties: False + properties: + headers: + description: | + List of extra headers which should be included in the source + of the generated code. + type: array + items: + type: string + sock-priv: + description: | + Literal name of the type which is used within the kernel + to store the socket state. The type / structure is internal + to the kernel, and is not defined in the spec. + type: string diff --git a/Documentation/netlink/genetlink-legacy.yaml b/Documentation/netlink/genetlink-legacy.yaml index 938703088306..b48ad3b1cc32 100644 --- a/Documentation/netlink/genetlink-legacy.yaml +++ b/Documentation/netlink/genetlink-legacy.yaml @@ -11,7 +11,7 @@ $defs: minimum: 0 len-or-define: type: [ string, integer ] - pattern: ^[0-9A-Za-z_]+( - 1)?$ + pattern: ^[0-9A-Za-z_-]+( - 1)?$ minimum: 0 len-or-limit: # literal int or limit based on fixed-width type e.g. u8-min, u16-max, etc. @@ -168,8 +168,9 @@ properties: Prefix for the C enum name of the attributes. Default family[name]-set[name]-a- type: string enum-name: - description: Name for the enum type of the attribute. - type: string + description: | + Name for the enum type of the attribute, if empty no name will be used. + type: [ string, "null" ] doc: description: Documentation of the space. type: string @@ -251,6 +252,11 @@ properties: exact-len: description: Exact length for a string or a binary attribute. $ref: '#/$defs/len-or-define' + unterminated-ok: + description: | + For string attributes, do not check whether attribute + contains the terminating null character. + type: boolean sub-type: *attr-type display-hint: *display-hint # Start genetlink-c @@ -304,14 +310,16 @@ properties: the prefix with the upper case name of the command, with dashes replaced by underscores. type: string enum-name: - description: Name for the enum type with commands. - type: string + description: | + Name for the enum type with commands, if empty no name will be used. + type: [ string, "null" ] async-prefix: description: Same as name-prefix but used to render notifications and events to separate enum. type: string async-enum: - description: Name for the enum type with notifications/events. - type: string + description: | + Name for the enum type with commands, if empty no name will be used. + type: [ string, "null" ] # Start genetlink-legacy fixed-header: &fixed-header description: | @@ -431,3 +439,22 @@ properties: type: string # End genetlink-c flags: *cmd_flags + + kernel-family: + description: Additional global attributes used for kernel C code generation. + type: object + additionalProperties: False + properties: + headers: + description: | + List of extra headers which should be included in the source + of the generated code. + type: array + items: + type: string + sock-priv: + description: | + Literal name of the type which is used within the kernel + to store the socket state. The type / structure is internal + to the kernel, and is not defined in the spec. + type: string diff --git a/Documentation/netlink/genetlink.yaml b/Documentation/netlink/genetlink.yaml index 3283bf458ff1..ebd6ee743fcc 100644 --- a/Documentation/netlink/genetlink.yaml +++ b/Documentation/netlink/genetlink.yaml @@ -11,7 +11,7 @@ $defs: minimum: 0 len-or-define: type: [ string, integer ] - pattern: ^[0-9A-Za-z_]+( - 1)?$ + pattern: ^[0-9A-Za-z_-]+( - 1)?$ minimum: 0 len-or-limit: # literal int or limit based on fixed-width type e.g. u8-min, u16-max, etc. @@ -328,3 +328,22 @@ properties: The name for the group, used to form the define and the value of the define. type: string flags: *cmd_flags + + kernel-family: + description: Additional global attributes used for kernel C code generation. + type: object + additionalProperties: False + properties: + headers: + description: | + List of extra headers which should be included in the source + of the generated code. + type: array + items: + type: string + sock-priv: + description: | + Literal name of the type which is used within the kernel + to store the socket state. The type / structure is internal + to the kernel, and is not defined in the spec. + type: string diff --git a/Documentation/netlink/netlink-raw.yaml b/Documentation/netlink/netlink-raw.yaml index 04b92f1a5cd6..a76e54cbadbc 100644 --- a/Documentation/netlink/netlink-raw.yaml +++ b/Documentation/netlink/netlink-raw.yaml @@ -11,7 +11,7 @@ $defs: minimum: 0 len-or-define: type: [ string, integer ] - pattern: ^[0-9A-Za-z_]+( - 1)?$ + pattern: ^[0-9A-Za-z_-]+( - 1)?$ minimum: 0 # Schema for specs @@ -152,14 +152,23 @@ properties: the right formatting mechanism when displaying values of this type. enum: [ hex, mac, fddi, ipv4, ipv6, uuid ] + struct: + description: Name of the nested struct type. + type: string if: properties: type: - oneOf: - - const: binary - - const: pad + const: pad then: required: [ len ] + if: + properties: + type: + const: binary + then: + oneOf: + - required: [ len ] + - required: [ struct ] # End genetlink-legacy attribute-sets: @@ -180,8 +189,9 @@ properties: Prefix for the C enum name of the attributes. Default family[name]-set[name]-a- type: string enum-name: - description: Name for the enum type of the attribute. - type: string + description: | + Name for the enum type of the attribute, if empty no name will be used. + type: [ string, "null" ] doc: description: Documentation of the space. type: string @@ -261,6 +271,11 @@ properties: exact-len: description: Exact length for a string or a binary attribute. $ref: '#/$defs/len-or-define' + unterminated-ok: + description: | + For string attributes, do not check whether attribute + contains the terminating null character. + type: boolean sub-type: *attr-type display-hint: *display-hint # Start genetlink-c @@ -362,14 +377,16 @@ properties: the prefix with the upper case name of the command, with dashes replaced by underscores. type: string enum-name: - description: Name for the enum type with commands. - type: string + description: | + Name for the enum type with commands, if empty no name will be used. + type: [ string, "null" ] async-prefix: description: Same as name-prefix but used to render notifications and events to separate enum. type: string async-enum: - description: Name for the enum type with notifications/events. - type: string + description: | + Name for the enum type with commands, if empty no name will be used. + type: [ string, "null" ] # Start genetlink-legacy fixed-header: &fixed-header description: | diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml index cf6eaa0da821..09fbb4c03fc8 100644 --- a/Documentation/netlink/specs/devlink.yaml +++ b/Documentation/netlink/specs/devlink.yaml @@ -290,7 +290,7 @@ attribute-sets: enum: eswitch-mode - name: eswitch-inline-mode - type: u16 + type: u8 enum: eswitch-inline-mode - name: dpipe-tables diff --git a/Documentation/netlink/specs/dpll.yaml b/Documentation/netlink/specs/dpll.yaml index b14aed18065f..95b0eb1486bf 100644 --- a/Documentation/netlink/specs/dpll.yaml +++ b/Documentation/netlink/specs/dpll.yaml @@ -52,6 +52,40 @@ definitions: dpll's lock-state shall remain DPLL_LOCK_STATUS_UNLOCKED) render-max: true - + type: enum + name: lock-status-error + doc: | + if previous status change was done due to a failure, this provides + information of dpll device lock status error. + Valid values for DPLL_A_LOCK_STATUS_ERROR attribute + entries: + - + name: none + doc: | + dpll device lock status was changed without any error + value: 1 + - + name: undefined + doc: | + dpll device lock status was changed due to undefined error. + Driver fills this value up in case it is not able + to obtain suitable exact error type. + - + name: media-down + doc: | + dpll device lock status was changed because of associated + media got down. + This may happen for example if dpll device was previously + locked on an input pin of type PIN_TYPE_SYNCE_ETH_PORT. + - + name: fractional-frequency-offset-too-high + doc: | + the FFO (Fractional Frequency Offset) between the RX and TX + symbol rate on the media got too high. + This may happen for example if dpll device was previously + locked on an input pin of type PIN_TYPE_SYNCE_ETH_PORT. + render-max: true + - type: const name: temp-divider value: 1000 @@ -214,6 +248,10 @@ attribute-sets: name: type type: u32 enum: type + - + name: lock-status-error + type: u32 + enum: lock-status-error - name: pin enum-name: dpll_a_pin @@ -274,6 +312,7 @@ attribute-sets: - name: capabilities type: u32 + enum: pin-capabilities - name: parent-device type: nest @@ -379,13 +418,12 @@ operations: - mode - mode-supported - lock-status + - lock-status-error - temp - clock-id - type dump: - pre: dpll-lock-dumpit - post: dpll-unlock-dumpit reply: *dev-attrs - @@ -473,8 +511,6 @@ operations: - fractional-frequency-offset dump: - pre: dpll-lock-dumpit - post: dpll-unlock-dumpit request: attributes: - id diff --git a/Documentation/netlink/specs/mptcp_pm.yaml b/Documentation/netlink/specs/mptcp_pm.yaml index 49f90cfb4698..af525ed29792 100644 --- a/Documentation/netlink/specs/mptcp_pm.yaml +++ b/Documentation/netlink/specs/mptcp_pm.yaml @@ -292,13 +292,14 @@ operations: - name: get-addr doc: Get endpoint information - attribute-set: endpoint + attribute-set: attr dont-validate: [ strict ] flags: [ uns-admin-perm ] do: &get-addr-attrs request: attributes: - addr + - token reply: attributes: - addr diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml index 3addac970680..76352dbd2be4 100644 --- a/Documentation/netlink/specs/netdev.yaml +++ b/Documentation/netlink/specs/netdev.yaml @@ -74,6 +74,10 @@ definitions: name: queue-type type: enum entries: [ rx, tx ] + - + name: qstats-scope + type: flags + entries: [ queue ] attribute-sets: - @@ -265,6 +269,73 @@ attribute-sets: doc: ID of the NAPI instance which services this queue. type: u32 + - + name: qstats + doc: | + Get device statistics, scoped to a device or a queue. + These statistics extend (and partially duplicate) statistics available + in struct rtnl_link_stats64. + Value of the `scope` attribute determines how statistics are + aggregated. When aggregated for the entire device the statistics + represent the total number of events since last explicit reset of + the device (i.e. not a reconfiguration like changing queue count). + When reported per-queue, however, the statistics may not add + up to the total number of events, will only be reported for currently + active objects, and will likely report the number of events since last + reconfiguration. + attributes: + - + name: ifindex + doc: ifindex of the netdevice to which stats belong. + type: u32 + checks: + min: 1 + - + name: queue-type + doc: Queue type as rx, tx, for queue-id. + type: u32 + enum: queue-type + - + name: queue-id + doc: Queue ID, if stats are scoped to a single queue instance. + type: u32 + - + name: scope + doc: | + What object type should be used to iterate over the stats. + type: uint + enum: qstats-scope + - + name: rx-packets + doc: | + Number of wire packets successfully received and passed to the stack. + For drivers supporting XDP, XDP is considered the first layer + of the stack, so packets consumed by XDP are still counted here. + type: uint + value: 8 # reserve some attr ids in case we need more metadata later + - + name: rx-bytes + doc: Successfully received bytes, see `rx-packets`. + type: uint + - + name: tx-packets + doc: | + Number of wire packets successfully sent. Packet is considered to be + successfully sent once it is in device memory (usually this means + the device has issued a DMA completion for the packet). + type: uint + - + name: tx-bytes + doc: Successfully sent bytes, see `tx-packets`. + type: uint + - + name: rx-alloc-fail + doc: | + Number of times skb or buffer allocation failed on the Rx datapath. + Allocation failure may, or may not result in a packet drop, depending + on driver implementation and whether system recovers quickly. + type: uint + operations: list: - @@ -405,6 +476,26 @@ operations: attributes: - ifindex reply: *napi-get-op + - + name: qstats-get + doc: | + Get / dump fine grained statistics. Which statistics are reported + depends on the device and the driver, and whether the driver stores + software counters per-queue. + attribute-set: qstats + dump: + request: + attributes: + - scope + reply: + attributes: + - ifindex + - queue-type + - queue-id + - rx-packets + - rx-bytes + - tx-packets + - tx-bytes mcast-groups: list: diff --git a/Documentation/netlink/specs/nlctrl.yaml b/Documentation/netlink/specs/nlctrl.yaml new file mode 100644 index 000000000000..b1632b95f725 --- /dev/null +++ b/Documentation/netlink/specs/nlctrl.yaml @@ -0,0 +1,206 @@ +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) + +name: nlctrl +protocol: genetlink-legacy +uapi-header: linux/genetlink.h + +doc: | + genetlink meta-family that exposes information about all genetlink + families registered in the kernel (including itself). + +definitions: + - + name: op-flags + type: flags + enum-name: + entries: + - admin-perm + - cmd-cap-do + - cmd-cap-dump + - cmd-cap-haspol + - uns-admin-perm + - + name: attr-type + enum-name: netlink-attribute-type + type: enum + entries: + - invalid + - flag + - u8 + - u16 + - u32 + - u64 + - s8 + - s16 + - s32 + - s64 + - binary + - string + - nul-string + - nested + - nested-array + - bitfield32 + - sint + - uint + +attribute-sets: + - + name: ctrl-attrs + name-prefix: ctrl-attr- + attributes: + - + name: family-id + type: u16 + - + name: family-name + type: string + - + name: version + type: u32 + - + name: hdrsize + type: u32 + - + name: maxattr + type: u32 + - + name: ops + type: array-nest + nested-attributes: op-attrs + - + name: mcast-groups + type: array-nest + nested-attributes: mcast-group-attrs + - + name: policy + type: nest-type-value + type-value: [ policy-id, attr-id ] + nested-attributes: policy-attrs + - + name: op-policy + type: nest-type-value + type-value: [ op-id ] + nested-attributes: op-policy-attrs + - + name: op + type: u32 + - + name: mcast-group-attrs + name-prefix: ctrl-attr-mcast-grp- + enum-name: + attributes: + - + name: name + type: string + - + name: id + type: u32 + - + name: op-attrs + name-prefix: ctrl-attr-op- + enum-name: + attributes: + - + name: id + type: u32 + - + name: flags + type: u32 + enum: op-flags + enum-as-flags: true + - + name: policy-attrs + name-prefix: nl-policy-type-attr- + enum-name: + attributes: + - + name: type + type: u32 + enum: attr-type + - + name: min-value-s + type: s64 + - + name: max-value-s + type: s64 + - + name: min-value-u + type: u64 + - + name: max-value-u + type: u64 + - + name: min-length + type: u32 + - + name: max-length + type: u32 + - + name: policy-idx + type: u32 + - + name: policy-maxtype + type: u32 + - + name: bitfield32-mask + type: u32 + - + name: mask + type: u64 + - + name: pad + type: pad + - + name: op-policy-attrs + name-prefix: ctrl-attr-policy- + enum-name: + attributes: + - + name: do + type: u32 + - + name: dump + type: u32 + +operations: + enum-model: directional + name-prefix: ctrl-cmd- + list: + - + name: getfamily + doc: Get / dump genetlink families + attribute-set: ctrl-attrs + do: + request: + value: 3 + attributes: + - family-name + reply: &all-attrs + value: 1 + attributes: + - family-id + - family-name + - hdrsize + - maxattr + - mcast-groups + - ops + - version + dump: + reply: *all-attrs + - + name: getpolicy + doc: Get / dump genetlink policies + attribute-set: ctrl-attrs + dump: + request: + value: 10 + attributes: + - family-name + - family-id + - op + reply: + value: 10 + attributes: + - family-id + - op-policy + - policy diff --git a/Documentation/netlink/specs/rt_link.yaml b/Documentation/netlink/specs/rt_link.yaml index 1ad01d52a863..8e4d19adee8c 100644 --- a/Documentation/netlink/specs/rt_link.yaml +++ b/Documentation/netlink/specs/rt_link.yaml @@ -942,6 +942,10 @@ attribute-sets: - name: gro-ipv4-max-size type: u32 + - + name: dpll-pin + type: nest + nested-attributes: link-dpll-pin-attrs - name: af-spec-attrs attributes: @@ -1627,6 +1631,12 @@ attribute-sets: - name: used type: u8 + - + name: link-dpll-pin-attrs + attributes: + - + name: id + type: u32 sub-messages: - diff --git a/Documentation/netlink/specs/tc.yaml b/Documentation/netlink/specs/tc.yaml index 4346fa402fc9..324fa182cd14 100644 --- a/Documentation/netlink/specs/tc.yaml +++ b/Documentation/netlink/specs/tc.yaml @@ -48,21 +48,28 @@ definitions: - name: bytes type: u64 + doc: Number of enqueued bytes - name: packets type: u32 + doc: Number of enqueued packets - name: drops type: u32 + doc: Packets dropped because of lack of resources - name: overlimits type: u32 + doc: | + Number of throttle events when this flow goes out of allocated bandwidth - name: bps type: u32 + doc: Current flow byte rate - name: pps type: u32 + doc: Current flow packet rate - name: qlen type: u32 @@ -112,6 +119,7 @@ definitions: - name: limit type: u32 + doc: Queue length; bytes for bfifo, packets for pfifo - name: tc-htb-opt type: struct @@ -119,11 +127,11 @@ definitions: - name: rate type: binary - len: 12 + struct: tc-ratespec - name: ceil type: binary - len: 12 + struct: tc-ratespec - name: buffer type: u32 @@ -149,15 +157,19 @@ definitions: - name: rate2quantum type: u32 + doc: bps->quantum divisor - name: defcls type: u32 + doc: Default class number - name: debug type: u32 + doc: Debug flags - name: direct-pkts type: u32 + doc: Count of non shaped packets - name: tc-gred-qopt type: struct @@ -165,15 +177,19 @@ definitions: - name: limit type: u32 + doc: HARD maximal queue length in bytes - name: qth-min type: u32 + doc: Min average length threshold in bytes - name: qth-max type: u32 + doc: Max average length threshold in bytes - name: DP type: u32 + doc: Up to 2^32 DPs - name: backlog type: u32 @@ -195,15 +211,19 @@ definitions: - name: Wlog type: u8 + doc: log(W) - name: Plog type: u8 + doc: log(P_max / (qth-max - qth-min)) - name: Scell_log type: u8 + doc: cell size for idle damping - name: prio type: u8 + doc: Priority of this VQ - name: packets type: u32 @@ -266,9 +286,11 @@ definitions: - name: bands type: u16 + doc: Number of bands - name: max-bands type: u16 + doc: Maximum number of queues - name: tc-netem-qopt type: struct @@ -276,21 +298,138 @@ definitions: - name: latency type: u32 + doc: Added delay in microseconds - name: limit type: u32 + doc: Fifo limit in packets - name: loss type: u32 + doc: Random packet loss (0=none, ~0=100%) - name: gap type: u32 + doc: Re-ordering gap (0 for none) - name: duplicate type: u32 + doc: Random packet duplication (0=none, ~0=100%) - name: jitter type: u32 + doc: Random jitter latency in microseconds + - + name: tc-netem-gimodel + doc: State transition probabilities for 4 state model + type: struct + members: + - + name: p13 + type: u32 + - + name: p31 + type: u32 + - + name: p32 + type: u32 + - + name: p14 + type: u32 + - + name: p23 + type: u32 + - + name: tc-netem-gemodel + doc: Gilbert-Elliot models + type: struct + members: + - + name: p + type: u32 + - + name: r + type: u32 + - + name: h + type: u32 + - + name: k1 + type: u32 + - + name: tc-netem-corr + type: struct + members: + - + name: delay-corr + type: u32 + doc: Delay correlation + - + name: loss-corr + type: u32 + doc: Packet loss correlation + - + name: dup-corr + type: u32 + doc: Duplicate correlation + - + name: tc-netem-reorder + type: struct + members: + - + name: probability + type: u32 + - + name: correlation + type: u32 + - + name: tc-netem-corrupt + type: struct + members: + - + name: probability + type: u32 + - + name: correlation + type: u32 + - + name: tc-netem-rate + type: struct + members: + - + name: rate + type: u32 + - + name: packet-overhead + type: s32 + - + name: cell-size + type: u32 + - + name: cell-overhead + type: s32 + - + name: tc-netem-slot + type: struct + members: + - + name: min-delay + type: s64 + - + name: max-delay + type: s64 + - + name: max-packets + type: s32 + - + name: max-bytes + type: s32 + - + name: dist-delay + type: s64 + - + name: dist-jitter + type: s64 - name: tc-plug-qopt type: struct @@ -307,11 +446,13 @@ definitions: members: - name: bands - type: u16 + type: u32 + doc: Number of bands - name: priomap type: binary len: 16 + doc: Map of logical priority -> PRIO band - name: tc-red-qopt type: struct @@ -319,21 +460,27 @@ definitions: - name: limit type: u32 + doc: Hard queue length in packets - name: qth-min type: u32 + doc: Min average threshold in packets - name: qth-max type: u32 + doc: Max average threshold in packets - name: Wlog type: u8 + doc: log(W) - name: Plog type: u8 + doc: log(P_max / (qth-max - qth-min)) - name: Scell-log type: u8 + doc: Cell size for idle damping - name: flags type: u8 @@ -369,71 +516,128 @@ definitions: name: penalty-burst type: u32 - - name: tc-sfq-qopt-v1 # TODO nested structs + name: tc-sfq-qopt type: struct members: - name: quantum type: u32 + doc: Bytes per round allocated to flow - name: perturb-period type: s32 + doc: Period of hash perturbation - name: limit type: u32 + doc: Maximal packets in queue - name: divisor type: u32 + doc: Hash divisor - name: flows type: u32 + doc: Maximal number of flows + - + name: tc-sfqred-stats + type: struct + members: + - + name: prob-drop + type: u32 + doc: Early drops, below max threshold + - + name: forced-drop + type: u32 + doc: Early drops, after max threshold + - + name: prob-mark + type: u32 + doc: Marked packets, below max threshold + - + name: forced-mark + type: u32 + doc: Marked packets, after max threshold + - + name: prob-mark-head + type: u32 + doc: Marked packets, below max threshold + - + name: forced-mark-head + type: u32 + doc: Marked packets, after max threshold + - + name: tc-sfq-qopt-v1 + type: struct + members: + - + name: v0 + type: binary + struct: tc-sfq-qopt - name: depth type: u32 + doc: Maximum number of packets per flow - name: headdrop type: u32 - name: limit type: u32 + doc: HARD maximal flow queue length in bytes - name: qth-min type: u32 + doc: Min average length threshold in bytes - - name: qth-mac + name: qth-max type: u32 + doc: Max average length threshold in bytes - name: Wlog type: u8 + doc: log(W) - name: Plog type: u8 + doc: log(P_max / (qth-max - qth-min)) - name: Scell-log type: u8 + doc: Cell size for idle damping - name: flags type: u8 - name: max-P type: u32 + doc: probabilty, high resolution - - name: prob-drop - type: u32 + name: stats + type: binary + struct: tc-sfqred-stats + - + name: tc-ratespec + type: struct + members: - - name: forced-drop - type: u32 + name: cell-log + type: u8 - - name: prob-mark - type: u32 + name: linklayer + type: u8 - - name: forced-mark - type: u32 + name: overhead + type: u8 - - name: prob-mark-head - type: u32 + name: cell-align + type: u8 - - name: forced-mark-head + name: mpu + type: u8 + - + name: rate type: u32 - name: tc-tbf-qopt @@ -441,12 +645,12 @@ definitions: members: - name: rate - type: binary # TODO nested struct tc_ratespec - len: 12 + type: binary + struct: tc-ratespec - name: peakrate - type: binary # TODO nested struct tc_ratespec - len: 12 + type: binary + struct: tc-ratespec - name: limit type: u32 @@ -491,9 +695,663 @@ definitions: - name: interval type: s8 + doc: Sampling period - name: ewma-log type: u8 + doc: The log() of measurement window weight + - + name: tc-choke-xstats + type: struct + members: + - + name: early + type: u32 + doc: Early drops + - + name: pdrop + type: u32 + doc: Drops due to queue limits + - + name: other + type: u32 + doc: Drops due to drop() calls + - + name: marked + type: u32 + doc: Marked packets + - + name: matched + type: u32 + doc: Drops due to flow match + - + name: tc-codel-xstats + type: struct + members: + - + name: maxpacket + type: u32 + doc: Largest packet we've seen so far + - + name: count + type: u32 + doc: How many drops we've done since the last time we entered dropping state + - + name: lastcount + type: u32 + doc: Count at entry to dropping state + - + name: ldelay + type: u32 + doc: in-queue delay seen by most recently dequeued packet + - + name: drop-next + type: s32 + doc: Time to drop next packet + - + name: drop-overlimit + type: u32 + doc: Number of times max qdisc packet limit was hit + - + name: ecn-mark + type: u32 + doc: Number of packets we've ECN marked instead of dropped + - + name: dropping + type: u32 + doc: Are we in a dropping state? + - + name: ce-mark + type: u32 + doc: Number of CE marked packets because of ce-threshold + - + name: tc-fq-codel-xstats + type: struct + members: + - + name: type + type: u32 + - + name: maxpacket + type: u32 + doc: Largest packet we've seen so far + - + name: drop-overlimit + type: u32 + doc: Number of times max qdisc packet limit was hit + - + name: ecn-mark + type: u32 + doc: Number of packets we ECN marked instead of being dropped + - + name: new-flow-count + type: u32 + doc: Number of times packets created a new flow + - + name: new-flows-len + type: u32 + doc: Count of flows in new list + - + name: old-flows-len + type: u32 + doc: Count of flows in old list + - + name: ce-mark + type: u32 + doc: Packets above ce-threshold + - + name: memory-usage + type: u32 + doc: Memory usage in bytes + - + name: drop-overmemory + type: u32 + - + name: tc-fq-pie-xstats + type: struct + members: + - + name: packets-in + type: u32 + doc: Total number of packets enqueued + - + name: dropped + type: u32 + doc: Packets dropped due to fq_pie_action + - + name: overlimit + type: u32 + doc: Dropped due to lack of space in queue + - + name: overmemory + type: u32 + doc: Dropped due to lack of memory in queue + - + name: ecn-mark + type: u32 + doc: Packets marked with ecn + - + name: new-flow-count + type: u32 + doc: Count of new flows created by packets + - + name: new-flows-len + type: u32 + doc: Count of flows in new list + - + name: old-flows-len + type: u32 + doc: Count of flows in old list + - + name: memory-usage + type: u32 + doc: Total memory across all queues + - + name: tc-fq-qd-stats + type: struct + members: + - + name: gc-flows + type: u64 + - + name: highprio-packets + type: u64 + doc: obsolete + - + name: tcp-retrans + type: u64 + doc: obsolete + - + name: throttled + type: u64 + - + name: flows-plimit + type: u64 + - + name: pkts-too-long + type: u64 + - + name: allocation-errors + type: u64 + - + name: time-next-delayed-flow + type: s64 + - + name: flows + type: u32 + - + name: inactive-flows + type: u32 + - + name: throttled-flows + type: u32 + - + name: unthrottle-latency-ns + type: u32 + - + name: ce-mark + type: u64 + doc: Packets above ce-threshold + - + name: horizon-drops + type: u64 + - + name: horizon-caps + type: u64 + - + name: fastpath-packets + type: u64 + - + name: band-drops + type: binary + len: 24 + - + name: band-pkt-count + type: binary + len: 12 + - + name: pad + type: pad + len: 4 + - + name: tc-hhf-xstats + type: struct + members: + - + name: drop-overlimit + type: u32 + doc: Number of times max qdisc packet limit was hit + - + name: hh-overlimit + type: u32 + doc: Number of times max heavy-hitters was hit + - + name: hh-tot-count + type: u32 + doc: Number of captured heavy-hitters so far + - + name: hh-cur-count + type: u32 + doc: Number of current heavy-hitters + - + name: tc-pie-xstats + type: struct + members: + - + name: prob + type: u64 + doc: Current probability + - + name: delay + type: u32 + doc: Current delay in ms + - + name: avg-dq-rate + type: u32 + doc: Current average dq rate in bits/pie-time + - + name: dq-rate-estimating + type: u32 + doc: Is avg-dq-rate being calculated? + - + name: packets-in + type: u32 + doc: Total number of packets enqueued + - + name: dropped + type: u32 + doc: Packets dropped due to pie action + - + name: overlimit + type: u32 + doc: Dropped due to lack of space in queue + - + name: maxq + type: u32 + doc: Maximum queue size + - + name: ecn-mark + type: u32 + doc: Packets marked with ecn + - + name: tc-red-xstats + type: struct + members: + - + name: early + type: u32 + doc: Early drops + - + name: pdrop + type: u32 + doc: Drops due to queue limits + - + name: other + type: u32 + doc: Drops due to drop() calls + - + name: marked + type: u32 + doc: Marked packets + - + name: tc-sfb-xstats + type: struct + members: + - + name: earlydrop + type: u32 + - + name: penaltydrop + type: u32 + - + name: bucketdrop + type: u32 + - + name: queuedrop + type: u32 + - + name: childdrop + type: u32 + doc: drops in child qdisc + - + name: marked + type: u32 + - + name: maxqlen + type: u32 + - + name: maxprob + type: u32 + - + name: avgprob + type: u32 + - + name: tc-sfq-xstats + type: struct + members: + - + name: allot + type: s32 + - + name: gnet-stats-basic + type: struct + members: + - + name: bytes + type: u64 + - + name: packets + type: u32 + - + name: gnet-stats-rate-est + type: struct + members: + - + name: bps + type: u32 + - + name: pps + type: u32 + - + name: gnet-stats-rate-est64 + type: struct + members: + - + name: bps + type: u64 + - + name: pps + type: u64 + - + name: gnet-stats-queue + type: struct + members: + - + name: qlen + type: u32 + - + name: backlog + type: u32 + - + name: drops + type: u32 + - + name: requeues + type: u32 + - + name: overlimits + type: u32 + - + name: tc-u32-key + type: struct + members: + - + name: mask + type: u32 + byte-order: big-endian + - + name: val + type: u32 + byte-order: big-endian + - + name: "off" + type: s32 + - + name: offmask + type: s32 + - + name: tc-u32-sel + type: struct + members: + - + name: flags + type: u8 + - + name: offshift + type: u8 + - + name: nkeys + type: u8 + - + name: offmask + type: u16 + byte-order: big-endian + - + name: "off" + type: u16 + - + name: offoff + type: s16 + - + name: hoff + type: s16 + - + name: hmask + type: u32 + byte-order: big-endian + - + name: keys + type: binary + struct: tc-u32-key # TODO: array + - + name: tc-u32-pcnt + type: struct + members: + - + name: rcnt + type: u64 + - + name: rhit + type: u64 + - + name: kcnts + type: u64 # TODO: array + - + name: tcf-t + type: struct + members: + - + name: install + type: u64 + - + name: lastuse + type: u64 + - + name: expires + type: u64 + - + name: firstuse + type: u64 + - + name: tc-gen + type: struct + members: + - + name: index + type: u32 + - + name: capab + type: u32 + - + name: action + type: s32 + - + name: refcnt + type: s32 + - + name: bindcnt + type: s32 + - + name: tc-gact-p + type: struct + members: + - + name: ptype + type: u16 + - + name: pval + type: u16 + - + name: paction + type: s32 + - + name: tcf-ematch-tree-hdr + type: struct + members: + - + name: nmatches + type: u16 + - + name: progid + type: u16 + - + name: tc-basic-pcnt + type: struct + members: + - + name: rcnt + type: u64 + - + name: rhit + type: u64 + - + name: tc-matchall-pcnt + type: struct + members: + - + name: rhit + type: u64 + - + name: tc-mpls + type: struct + members: + - + name: index + type: u32 + - + name: capab + type: u32 + - + name: action + type: s32 + - + name: refcnt + type: s32 + - + name: bindcnt + type: s32 + - + name: m-action + type: s32 + - + name: tc-police + type: struct + members: + - + name: index + type: u32 + - + name: action + type: s32 + - + name: limit + type: u32 + - + name: burst + type: u32 + - + name: mtu + type: u32 + - + name: rate + type: binary + struct: tc-ratespec + - + name: peakrate + type: binary + struct: tc-ratespec + - + name: refcnt + type: s32 + - + name: bindcnt + type: s32 + - + name: capab + type: u32 + - + name: tc-pedit-sel + type: struct + members: + - + name: index + type: u32 + - + name: capab + type: u32 + - + name: action + type: s32 + - + name: refcnt + type: s32 + - + name: bindcnt + type: s32 + - + name: nkeys + type: u8 + - + name: flags + type: u8 + - + name: keys + type: binary + struct: tc-pedit-key # TODO: array + - + name: tc-pedit-key + type: struct + members: + - + name: mask + type: u32 + - + name: val + type: u32 + - + name: "off" + type: u32 + - + name: at + type: u32 + - + name: offmask + type: u32 + - + name: shift + type: u32 + - + name: tc-vlan + type: struct + members: + - + name: index + type: u32 + - + name: capab + type: u32 + - + name: action + type: s32 + - + name: refcnt + type: s32 + - + name: bindcnt + type: s32 + - + name: v-action + type: s32 attribute-sets: - name: tc-attrs @@ -512,7 +1370,9 @@ attribute-sets: struct: tc-stats - name: xstats - type: binary + type: sub-message + sub-message: tca-stats-app-msg + selector: kind - name: rate type: binary @@ -553,6 +1413,582 @@ attribute-sets: name: ext-warn-msg type: string - + name: tc-act-attrs + attributes: + - + name: kind + type: string + - + name: options + type: sub-message + sub-message: tc-act-options-msg + selector: kind + - + name: index + type: u32 + - + name: stats + type: nest + nested-attributes: tc-act-stats-attrs + - + name: pad + type: pad + - + name: cookie + type: binary + - + name: flags + type: bitfield32 + - + name: hw-stats + type: bitfield32 + - + name: used-hw-stats + type: bitfield32 + - + name: in-hw-count + type: u32 + - + name: tc-act-stats-attrs + attributes: + - + name: basic + type: binary + struct: gnet-stats-basic + - + name: rate-est + type: binary + struct: gnet-stats-rate-est + - + name: queue + type: binary + struct: gnet-stats-queue + - + name: app + type: binary + - + name: rate-est64 + type: binary + struct: gnet-stats-rate-est64 + - + name: pad + type: pad + - + name: basic-hw + type: binary + struct: gnet-stats-basic + - + name: pkt64 + type: u64 + - + name: tc-act-bpf-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + - + name: ops-len + type: u16 + - + name: ops + type: binary + - + name: fd + type: u32 + - + name: name + type: string + - + name: pad + type: pad + - + name: tag + type: binary + - + name: id + type: binary + - + name: tc-act-connmark-attrs + attributes: + - + name: parms + type: binary + - + name: tm + type: binary + struct: tcf-t + - + name: pad + type: pad + - + name: tc-act-csum-attrs + attributes: + - + name: parms + type: binary + - + name: tm + type: binary + struct: tcf-t + - + name: pad + type: pad + - + name: tc-act-ct-attrs + attributes: + - + name: parms + type: binary + - + name: tm + type: binary + struct: tcf-t + - + name: action + type: u16 + - + name: zone + type: u16 + - + name: mark + type: u32 + - + name: mark-mask + type: u32 + - + name: labels + type: binary + - + name: labels-mask + type: binary + - + name: nat-ipv4-min + type: u32 + byte-order: big-endian + - + name: nat-ipv4-max + type: u32 + byte-order: big-endian + - + name: nat-ipv6-min + type: binary + - + name: nat-ipv6-max + type: binary + - + name: nat-port-min + type: u16 + byte-order: big-endian + - + name: nat-port-max + type: u16 + byte-order: big-endian + - + name: pad + type: pad + - + name: helper-name + type: string + - + name: helper-family + type: u8 + - + name: helper-proto + type: u8 + - + name: tc-act-ctinfo-attrs + attributes: + - + name: pad + type: pad + - + name: tm + type: binary + struct: tcf-t + - + name: act + type: binary + - + name: zone + type: u16 + - + name: parms-dscp-mask + type: u32 + - + name: parms-dscp-statemask + type: u32 + - + name: parms-cpmark-mask + type: u32 + - + name: stats-dscp-set + type: u64 + - + name: stats-dscp-error + type: u64 + - + name: stats-cpmark-set + type: u64 + - + name: tc-act-gate-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + - + name: pad + type: pad + - + name: priority + type: s32 + - + name: entry-list + type: binary + - + name: base-time + type: u64 + - + name: cycle-time + type: u64 + - + name: cycle-time-ext + type: u64 + - + name: flags + type: u32 + - + name: clockid + type: s32 + - + name: tc-act-ife-attrs + attributes: + - + name: parms + type: binary + - + name: tm + type: binary + struct: tcf-t + - + name: dmac + type: binary + - + name: smac + type: binary + - + name: type + type: u16 + - + name: metalst + type: binary + - + name: pad + type: pad + - + name: tc-act-mirred-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + - + name: pad + type: pad + - + name: blockid + type: binary + - + name: tc-act-mpls-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + struct: tc-mpls + - + name: pad + type: pad + - + name: proto + type: u16 + byte-order: big-endian + - + name: label + type: u32 + - + name: tc + type: u8 + - + name: ttl + type: u8 + - + name: bos + type: u8 + - + name: tc-act-nat-attrs + attributes: + - + name: parms + type: binary + - + name: tm + type: binary + struct: tcf-t + - + name: pad + type: pad + - + name: tc-act-pedit-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + struct: tc-pedit-sel + - + name: pad + type: pad + - + name: parms-ex + type: binary + - + name: keys-ex + type: binary + - + name: key-ex + type: binary + - + name: tc-act-simple-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + - + name: data + type: binary + - + name: pad + type: pad + - + name: tc-act-skbedit-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + - + name: priority + type: u32 + - + name: queue-mapping + type: u16 + - + name: mark + type: u32 + - + name: pad + type: pad + - + name: ptype + type: u16 + - + name: mask + type: u32 + - + name: flags + type: u64 + - + name: queue-mapping-max + type: u16 + - + name: tc-act-skbmod-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + - + name: dmac + type: binary + - + name: smac + type: binary + - + name: etype + type: binary + - + name: pad + type: pad + - + name: tc-act-tunnel-key-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + - + name: enc-ipv4-src + type: u32 + byte-order: big-endian + - + name: enc-ipv4-dst + type: u32 + byte-order: big-endian + - + name: enc-ipv6-src + type: binary + - + name: enc-ipv6-dst + type: binary + - + name: enc-key-id + type: u64 + byte-order: big-endian + - + name: pad + type: pad + - + name: enc-dst-port + type: u16 + byte-order: big-endian + - + name: no-csum + type: u8 + - + name: enc-opts + type: binary + - + name: enc-tos + type: u8 + - + name: enc-ttl + type: u8 + - + name: no-frag + type: flag + - + name: tc-act-vlan-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + struct: tc-vlan + - + name: push-vlan-id + type: u16 + - + name: push-vlan-protocol + type: u16 + - + name: pad + type: pad + - + name: push-vlan-priority + type: u8 + - + name: push-eth-dst + type: binary + - + name: push-eth-src + type: binary + - + name: tc-basic-attrs + attributes: + - + name: classid + type: u32 + - + name: ematches + type: nest + nested-attributes: tc-ematch-attrs + - + name: act + type: array-nest + nested-attributes: tc-act-attrs + - + name: police + type: nest + nested-attributes: tc-police-attrs + - + name: pcnt + type: binary + struct: tc-basic-pcnt + - + name: pad + type: pad + - + name: tc-bpf-attrs + attributes: + - + name: act + type: nest + nested-attributes: tc-act-attrs + - + name: police + type: nest + nested-attributes: tc-police-attrs + - + name: classid + type: u32 + - + name: ops-len + type: u16 + - + name: ops + type: binary + - + name: fd + type: u32 + - + name: name + type: string + - + name: flags + type: u32 + - + name: flags-gen + type: u32 + - + name: tag + type: binary + - + name: id + type: u32 + - name: tc-cake-attrs attributes: - @@ -641,7 +2077,8 @@ attribute-sets: type: u32 - name: tin-stats - type: binary + type: array-nest + nested-attributes: tc-cake-tin-stats-attrs - name: deficit type: s32 @@ -661,6 +2098,84 @@ attribute-sets: name: blue-timer-us type: s32 - + name: tc-cake-tin-stats-attrs + attributes: + - + name: pad + type: pad + - + name: sent-packets + type: u32 + - + name: sent-bytes64 + type: u64 + - + name: dropped-packets + type: u32 + - + name: dropped-bytes64 + type: u64 + - + name: acks-dropped-packets + type: u32 + - + name: acks-dropped-bytes64 + type: u64 + - + name: ecn-marked-packets + type: u32 + - + name: ecn-marked-bytes64 + type: u64 + - + name: backlog-packets + type: u32 + - + name: backlog-bytes + type: u32 + - + name: threshold-rate64 + type: u64 + - + name: target-us + type: u32 + - + name: interval-us + type: u32 + - + name: way-indirect-hits + type: u32 + - + name: way-misses + type: u32 + - + name: way-collisions + type: u32 + - + name: peak-delay-us + type: u32 + - + name: avg-delay-us + type: u32 + - + name: base-delay-us + type: u32 + - + name: sparse-flows + type: u32 + - + name: bulk-flows + type: u32 + - + name: unresponsive-flows + type: u32 + - + name: max-skblen + type: u32 + - + name: flow-quantum + type: u32 + - name: tc-cbs-attrs attributes: - @@ -668,6 +2183,20 @@ attribute-sets: type: binary struct: tc-cbs-qopt - + name: tc-cgroup-attrs + attributes: + - + name: act + type: nest + nested-attributes: tc-act-attrs + - + name: police + type: nest + nested-attributes: tc-police-attrs + - + name: ematches + type: binary + - name: tc-choke-attrs attributes: - @@ -677,6 +2206,9 @@ attribute-sets: - name: stab type: binary + checks: + min-len: 256 + max-len: 256 - name: max-p type: u32 @@ -705,6 +2237,56 @@ attribute-sets: name: quantum type: u32 - + name: tc-ematch-attrs + attributes: + - + name: tree-hdr + type: binary + struct: tcf-ematch-tree-hdr + - + name: tree-list + type: binary + - + name: tc-flow-attrs + attributes: + - + name: keys + type: u32 + - + name: mode + type: u32 + - + name: baseclass + type: u32 + - + name: rshift + type: u32 + - + name: addend + type: u32 + - + name: mask + type: u32 + - + name: xor + type: u32 + - + name: divisor + type: u32 + - + name: act + type: binary + - + name: police + type: nest + nested-attributes: tc-police-attrs + - + name: ematches + type: binary + - + name: perturb + type: u32 + - name: tc-flower-attrs attributes: - @@ -953,15 +2535,19 @@ attribute-sets: - name: key-arp-sha type: binary + display-hint: mac - name: key-arp-sha-mask type: binary + display-hint: mac - name: key-arp-tha type: binary + display-hint: mac - name: key-arp-tha-mask type: binary + display-hint: mac - name: key-mpls-ttl type: u8 @@ -1020,10 +2606,12 @@ attribute-sets: type: u8 - name: key-enc-opts - type: binary + type: nest + nested-attributes: tc-flower-key-enc-opts-attrs - name: key-enc-opts-mask - type: binary + type: nest + nested-attributes: tc-flower-key-enc-opts-attrs - name: in-hw-count type: u32 @@ -1069,7 +2657,8 @@ attribute-sets: type: binary - name: key-mpls-opts - type: binary + type: nest + nested-attributes: tc-flower-key-mpls-opt-attrs - name: key-hash type: u32 @@ -1091,6 +2680,129 @@ attribute-sets: name: key-l2-tpv3-sid type: u32 byte-order: big-endian + - + name: l2-miss + type: u8 + - + name: key-cfm + type: nest + nested-attributes: tc-flower-key-cfm-attrs + - + name: key-spi + type: u32 + byte-order: big-endian + - + name: key-spi-mask + type: u32 + byte-order: big-endian + - + name: tc-flower-key-enc-opts-attrs + attributes: + - + name: geneve + type: nest + nested-attributes: tc-flower-key-enc-opt-geneve-attrs + - + name: vxlan + type: nest + nested-attributes: tc-flower-key-enc-opt-vxlan-attrs + - + name: erspan + type: nest + nested-attributes: tc-flower-key-enc-opt-erspan-attrs + - + name: gtp + type: nest + nested-attributes: tc-flower-key-enc-opt-gtp-attrs + - + name: tc-flower-key-enc-opt-geneve-attrs + attributes: + - + name: class + type: u16 + - + name: type + type: u8 + - + name: data + type: binary + - + name: tc-flower-key-enc-opt-vxlan-attrs + attributes: + - + name: gbp + type: u32 + - + name: tc-flower-key-enc-opt-erspan-attrs + attributes: + - + name: ver + type: u8 + - + name: index + type: u32 + - + name: dir + type: u8 + - + name: hwid + type: u8 + - + name: tc-flower-key-enc-opt-gtp-attrs + attributes: + - + name: pdu-type + type: u8 + - + name: qfi + type: u8 + - + name: tc-flower-key-mpls-opt-attrs + attributes: + - + name: lse-depth + type: u8 + - + name: lse-ttl + type: u8 + - + name: lse-bos + type: u8 + - + name: lse-tc + type: u8 + - + name: lse-label + type: u32 + - + name: tc-flower-key-cfm-attrs + attributes: + - + name: md-level + type: u8 + - + name: opcode + type: u8 + - + name: tc-fw-attrs + attributes: + - + name: classid + type: u32 + - + name: police + type: nest + nested-attributes: tc-police-attrs + - + name: indev + type: string + - + name: act + type: array-nest + nested-attributes: tc-act-attrs + - + name: mask + type: u32 - name: tc-gred-attrs attributes: @@ -1135,7 +2847,7 @@ attribute-sets: type: u32 - name: stat-bytes - type: u32 + type: u64 - name: stat-packets type: u32 @@ -1232,40 +2944,25 @@ attribute-sets: name: offload type: flag - - name: tc-act-attrs + name: tc-matchall-attrs attributes: - - name: kind - type: string + name: classid + type: u32 - - name: options - type: sub-message - sub-message: tc-act-options-msg - selector: kind + name: act + type: array-nest + nested-attributes: tc-act-attrs - - name: index + name: flags type: u32 - - name: stats + name: pcnt type: binary + struct: tc-matchall-pcnt - name: pad type: pad - - - name: cookie - type: binary - - - name: flags - type: bitfield32 - - - name: hw-stats - type: bitfield32 - - - name: used-hw-stats - type: bitfield32 - - - name: in-hw-count - type: u32 - name: tc-etf-attrs attributes: @@ -1304,48 +3001,71 @@ attribute-sets: - name: plimit type: u32 + doc: Limit of total number of packets in queue - name: flow-plimit type: u32 + doc: Limit of packets per flow - name: quantum type: u32 + doc: RR quantum - name: initial-quantum type: u32 + doc: RR quantum for new flow - name: rate-enable type: u32 + doc: Enable / disable rate limiting - name: flow-default-rate type: u32 + doc: Obsolete, do not use - name: flow-max-rate type: u32 + doc: Per flow max rate - name: buckets-log type: u32 + doc: log2(number of buckets) - name: flow-refill-delay type: u32 + doc: Flow credit refill delay in usec - name: orphan-mask type: u32 + doc: Mask applied to orphaned skb hashes - name: low-rate-threshold type: u32 + doc: Per packet delay under this rate - name: ce-threshold type: u32 + doc: DCTCP-like CE marking threshold - name: timer-slack type: u32 - name: horizon type: u32 + doc: Time horizon in usec - name: horizon-drop type: u8 + doc: Drop packets beyond horizon, or cap their EDT + - + name: priomap + type: binary + struct: tc-prio-qopt + - + name: weights + type: binary + sub-type: s32 + doc: Weights for each band - name: tc-fq-codel-attrs attributes: @@ -1427,6 +3147,7 @@ attribute-sets: - name: corr type: binary + struct: tc-netem-corr - name: delay-dist type: binary @@ -1434,15 +3155,19 @@ attribute-sets: - name: reorder type: binary + struct: tc-netem-reorder - name: corrupt type: binary + struct: tc-netem-corrupt - name: loss - type: binary + type: nest + nested-attributes: tc-netem-loss-attrs - name: rate type: binary + struct: tc-netem-rate - name: ecn type: u32 @@ -1461,10 +3186,27 @@ attribute-sets: - name: slot type: binary + struct: tc-netem-slot - name: slot-dist type: binary sub-type: s16 + - + name: prng-seed + type: u64 + - + name: tc-netem-loss-attrs + attributes: + - + name: gi + type: binary + doc: General Intuitive - 4 state model + struct: tc-netem-gimodel + - + name: ge + type: binary + doc: Gilbert Elliot models + struct: tc-netem-gemodel - name: tc-pie-attrs attributes: @@ -1493,6 +3235,44 @@ attribute-sets: name: dq-rate-estimator type: u32 - + name: tc-police-attrs + attributes: + - + name: tbf + type: binary + struct: tc-police + - + name: rate + type: binary + - + name: peakrate + type: binary + - + name: avrate + type: u32 + - + name: result + type: u32 + - + name: tm + type: binary + struct: tcf-t + - + name: pad + type: pad + - + name: rate64 + type: u64 + - + name: peakrate64 + type: u64 + - + name: pktrate64 + type: u64 + - + name: pktburst64 + type: u64 + - name: tc-qfq-attrs attributes: - @@ -1516,7 +3296,7 @@ attribute-sets: type: u32 - name: flags - type: binary + type: bitfield32 - name: early-drop-block type: u32 @@ -1524,6 +3304,29 @@ attribute-sets: name: mark-block type: u32 - + name: tc-route-attrs + attributes: + - + name: classid + type: u32 + - + name: to + type: u32 + - + name: from + type: u32 + - + name: iif + type: u32 + - + name: police + type: nest + nested-attributes: tc-police-attrs + - + name: act + type: array-nest + nested-attributes: tc-act-attrs + - name: tc-taprio-attrs attributes: - @@ -1573,6 +3376,7 @@ attribute-sets: name: entry type: nest nested-attributes: tc-taprio-sched-entry + multi-attr: true - name: tc-taprio-sched-entry attributes: @@ -1629,17 +3433,43 @@ attribute-sets: name: pad type: pad - - name: tca-gact-attrs + name: tc-act-sample-attrs attributes: - name: tm type: binary + struct: tcf-t - name: parms type: binary + struct: tc-gen + - + name: rate + type: u32 + - + name: trunc-size + type: u32 + - + name: psample-group + type: u32 + - + name: pad + type: pad + - + name: tc-act-gact-attrs + attributes: + - + name: tm + type: binary + struct: tcf-t + - + name: parms + type: binary + struct: tc-gen - name: prob type: binary + struct: tc-gact-p - name: pad type: pad @@ -1659,35 +3489,90 @@ attribute-sets: - name: basic type: binary + struct: gnet-stats-basic - name: rate-est type: binary + struct: gnet-stats-rate-est - name: queue type: binary + struct: gnet-stats-queue - name: app - type: binary # TODO sub-message needs 2+ level deep lookup + type: sub-message sub-message: tca-stats-app-msg selector: kind - name: rate-est64 type: binary + struct: gnet-stats-rate-est64 - name: pad type: pad - name: basic-hw type: binary + struct: gnet-stats-basic - name: pkt64 + type: u64 + - + name: tc-u32-attrs + attributes: + - + name: classid + type: u32 + - + name: hash + type: u32 + - + name: link + type: u32 + - + name: divisor + type: u32 + - + name: sel type: binary + struct: tc-u32-sel + - + name: police + type: nest + nested-attributes: tc-police-attrs + - + name: act + type: array-nest + nested-attributes: tc-act-attrs + - + name: indev + type: string + - + name: pcnt + type: binary + struct: tc-u32-pcnt + - + name: mark + type: binary + struct: tc-u32-mark + - + name: flags + type: u32 + - + name: pad + type: pad sub-messages: - name: tc-options-msg formats: - + value: basic + attribute-set: tc-basic-attrs + - + value: bpf + attribute-set: tc-bpf-attrs + - value: bfifo fixed-header: tc-fifo-qopt - @@ -1697,6 +3582,9 @@ sub-messages: value: cbs attribute-set: tc-cbs-attrs - + value: cgroup + attribute-set: tc-cgroup-attrs + - value: choke attribute-set: tc-choke-attrs - @@ -1714,6 +3602,12 @@ sub-messages: value: ets attribute-set: tc-ets-attrs - + value: flow + attribute-set: tc-flow-attrs + - + value: flower + attribute-set: tc-flower-attrs + - value: fq attribute-set: tc-fq-attrs - @@ -1723,8 +3617,8 @@ sub-messages: value: fq_pie attribute-set: tc-fq-pie-attrs - - value: flower - attribute-set: tc-flower-attrs + value: fw + attribute-set: tc-fw-attrs - value: gred attribute-set: tc-gred-attrs @@ -1740,6 +3634,9 @@ sub-messages: - value: ingress # no content - + value: matchall + attribute-set: tc-matchall-attrs + - value: mq # no content - value: mqprio @@ -1776,6 +3673,9 @@ sub-messages: value: red attribute-set: tc-red-attrs - + value: route + attribute-set: tc-route-attrs + - value: sfb fixed-header: tc-sfb-qopt - @@ -1787,88 +3687,105 @@ sub-messages: - value: tbf attribute-set: tc-tbf-attrs + - + value: u32 + attribute-set: tc-u32-attrs - name: tc-act-options-msg formats: - - value: gact - attribute-set: tca-gact-attrs - - - name: tca-stats-app-msg - formats: + value: bpf + attribute-set: tc-act-bpf-attrs - - value: bfifo + value: connmark + attribute-set: tc-act-connmark-attrs - - value: blackhole + value: csum + attribute-set: tc-act-csum-attrs - - value: cake - attribute-set: tc-cake-stats-attrs + value: ct + attribute-set: tc-act-ct-attrs - - value: cbs + value: ctinfo + attribute-set: tc-act-ctinfo-attrs - - value: choke + value: gact + attribute-set: tc-act-gact-attrs - - value: clsact + value: gate + attribute-set: tc-act-gate-attrs - - value: codel + value: ife + attribute-set: tc-act-ife-attrs - - value: drr + value: mirred + attribute-set: tc-act-mirred-attrs - - value: etf + value: mpls + attribute-set: tc-act-mpls-attrs - - value: ets - - - value: fq - - - value: fq_codel + value: nat + attribute-set: tc-act-nat-attrs - - value: fq_pie + value: pedit + attribute-set: tc-act-pedit-attrs - - value: flower + value: police + attribute-set: tc-act-police-attrs - - value: gred + value: sample + attribute-set: tc-act-sample-attrs - - value: hfsc + value: simple + attribute-set: tc-act-simple-attrs - - value: hhf + value: skbedit + attribute-set: tc-act-skbedit-attrs - - value: htb + value: skbmod + attribute-set: tc-act-skbmod-attrs - - value: ingress + value: tunnel_key + attribute-set: tc-act-tunnel-key-attrs - - value: mq + value: vlan + attribute-set: tc-act-vlan-attrs + - + name: tca-stats-app-msg + formats: - - value: mqprio + value: cake + attribute-set: tc-cake-stats-attrs - - value: multiq + value: choke + fixed-header: tc-choke-xstats - - value: netem + value: codel + fixed-header: tc-codel-xstats - - value: noqueue + value: fq + fixed-header: tc-fq-qd-stats - - value: pfifo + value: fq_codel + fixed-header: tc-fq-codel-xstats - - value: pfifo_fast + value: fq_pie + fixed-header: tc-fq-pie-xstats - - value: pfifo_head_drop + value: hhf + fixed-header: tc-hhf-xstats - value: pie - - - value: plug - - - value: prio - - - value: qfq + fixed-header: tc-pie-xstats - value: red + fixed-header: tc-red-xstats - value: sfb + fixed-header: tc-sfb-xstats - value: sfq - - - value: taprio - - - value: tbf + fixed-header: tc-sfq-xstats operations: enum-model: directional diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index dceeb0d763aa..72da7057e4cf 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -329,23 +329,24 @@ XDP_SHARED_UMEM option and provide the initial socket's fd in the sxdp_shared_umem_fd field as you registered the UMEM on that socket. These two sockets will now share one and the same UMEM. -There is no need to supply an XDP program like the one in the previous -case where sockets were bound to the same queue id and -device. Instead, use the NIC's packet steering capabilities to steer -the packets to the right queue. In the previous example, there is only -one queue shared among sockets, so the NIC cannot do this steering. It -can only steer between queues. - -In libbpf, you need to use the xsk_socket__create_shared() API as it -takes a reference to a FILL ring and a COMPLETION ring that will be -created for you and bound to the shared UMEM. You can use this -function for all the sockets you create, or you can use it for the -second and following ones and use xsk_socket__create() for the first -one. Both methods yield the same result. +In this case, it is possible to use the NIC's packet steering +capabilities to steer the packets to the right queue. This is not +possible in the previous example as there is only one queue shared +among sockets, so the NIC cannot do this steering as it can only steer +between queues. + +In libxdp (or libbpf prior to version 1.0), you need to use the +xsk_socket__create_shared() API as it takes a reference to a FILL ring +and a COMPLETION ring that will be created for you and bound to the +shared UMEM. You can use this function for all the sockets you create, +or you can use it for the second and following ones and use +xsk_socket__create() for the first one. Both methods yield the same +result. Note that a UMEM can be shared between sockets on the same queue id and device, as well as between queues on the same device and between -devices at the same time. +devices at the same time. It is also possible to redirect to any +socket as long as it is bound to the same umem with XDP_SHARED_UMEM. XDP_USE_NEED_WAKEUP bind flag ----------------------------- @@ -822,6 +823,10 @@ A: The short answer is no, that is not supported at the moment. The switch, or other distribution mechanism, in your NIC to direct traffic to the correct queue id and socket. + Note that if you are using the XDP_SHARED_UMEM option, it is + possible to switch traffic between any socket bound to the same + umem. + Q: My packets are sometimes corrupted. What is wrong? A: Care has to be taken not to feed the same buffer in the UMEM into diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst index f7a73421eb76..e774b48de9f5 100644 --- a/Documentation/networking/bonding.rst +++ b/Documentation/networking/bonding.rst @@ -444,6 +444,18 @@ arp_missed_max The default value is 2, and the allowable range is 1 - 255. +coupled_control + + Specifies whether the LACP state machine's MUX in the 802.3ad mode + should have separate Collecting and Distributing states. + + This is by implementing the independent control state machine per + IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control + state machine. + + The default value is 1. This setting does not separate the Collecting + and Distributing states, maintaining the bond in coupled control. + downdelay Specifies the time, in milliseconds, to wait before disabling diff --git a/Documentation/networking/bridge.rst b/Documentation/networking/bridge.rst index ba14e7b07869..ef8b73e157b2 100644 --- a/Documentation/networking/bridge.rst +++ b/Documentation/networking/bridge.rst @@ -324,7 +324,7 @@ Contact Info The code is currently maintained by Roopa Prabhu <roopa@nvidia.com> and Nikolay Aleksandrov <razor@blackwall.org>. Bridge bugs and enhancements are discussed on the linux-netdev mailing list netdev@vger.kernel.org and -bridge@lists.linux-foundation.org. +bridge@lists.linux.dev. The list is open to anyone interested: http://vger.kernel.org/vger-lists.html#netdev diff --git a/Documentation/networking/can.rst b/Documentation/networking/can.rst index d7e1ada905b2..62519d38c58b 100644 --- a/Documentation/networking/can.rst +++ b/Documentation/networking/can.rst @@ -444,6 +444,24 @@ definitions are specified for CAN specific MTUs in include/linux/can.h: #define CANFD_MTU (sizeof(struct canfd_frame)) == 72 => CAN FD frame +Returned Message Flags +---------------------- + +When using the system call recvmsg(2) on a RAW or a BCM socket, the +msg->msg_flags field may contain the following flags: + +MSG_DONTROUTE: + set when the received frame was created on the local host. + +MSG_CONFIRM: + set when the frame was sent via the socket it is received on. + This flag can be interpreted as a 'transmission confirmation' when the + CAN driver supports the echo of frames on driver level, see + :ref:`socketcan-local-loopback1` and :ref:`socketcan-local-loopback2`. + (Note: In order to receive such messages on a RAW socket, + CAN_RAW_RECV_OWN_MSGS must be set.) + + .. _socketcan-raw-sockets: RAW Protocol Sockets with can_filters (SOCK_RAW) @@ -693,22 +711,6 @@ where the CAN_INV_FILTER flag is set in order to notch single CAN IDs or CAN ID ranges from the incoming traffic. -RAW Socket Returned Message Flags -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -When using recvmsg() call, the msg->msg_flags may contain following flags: - -MSG_DONTROUTE: - set when the received frame was created on the local host. - -MSG_CONFIRM: - set when the frame was sent via the socket it is received on. - This flag can be interpreted as a 'transmission confirmation' when the - CAN driver supports the echo of frames on driver level, see - :ref:`socketcan-local-loopback1` and :ref:`socketcan-local-loopback2`. - In order to receive such messages, CAN_RAW_RECV_OWN_MSGS must be set. - - Broadcast Manager Protocol Sockets (SOCK_DGRAM) ----------------------------------------------- diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst index b842bcb14255..a4c7d0c65fd7 100644 --- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst +++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst @@ -211,10 +211,16 @@ Documentation/networking/net_dim.rst RX copybreak ============ + The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK and can be configured by the ETHTOOL_STUNABLE command of the SIOCETHTOOL ioctl. +This option controls the maximum packet length for which the RX +descriptor it was received on would be recycled. When a packet smaller +than RX copybreak bytes is received, it is copied into a new memory +buffer and the RX descriptor is returned to HW. + Statistics ========== diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst index 43de285b8a92..6932d8c043c2 100644 --- a/Documentation/networking/device_drivers/ethernet/index.rst +++ b/Documentation/networking/device_drivers/ethernet/index.rst @@ -42,6 +42,7 @@ Contents: intel/ice marvell/octeontx2 marvell/octeon_ep + marvell/octeon_ep_vf mellanox/mlx5/index microsoft/netvsc neterion/s2io diff --git a/Documentation/networking/device_drivers/ethernet/intel/ice.rst b/Documentation/networking/device_drivers/ethernet/intel/ice.rst index 5038e54586af..934752f675ba 100644 --- a/Documentation/networking/device_drivers/ethernet/intel/ice.rst +++ b/Documentation/networking/device_drivers/ethernet/intel/ice.rst @@ -368,15 +368,28 @@ more options for Receive Side Scaling (RSS) hash byte configuration. # ethtool -N <ethX> rx-flow-hash <type> <option> Where <type> is: - tcp4 signifying TCP over IPv4 - udp4 signifying UDP over IPv4 - tcp6 signifying TCP over IPv6 - udp6 signifying UDP over IPv6 + tcp4 signifying TCP over IPv4 + udp4 signifying UDP over IPv4 + gtpc4 signifying GTP-C over IPv4 + gtpc4t signifying GTP-C (include TEID) over IPv4 + gtpu4 signifying GTP-U over IPV4 + gtpu4e signifying GTP-U and Extension Header over IPV4 + gtpu4u signifying GTP-U PSC Uplink over IPV4 + gtpu4d signifying GTP-U PSC Downlink over IPV4 + tcp6 signifying TCP over IPv6 + udp6 signifying UDP over IPv6 + gtpc6 signifying GTP-C over IPv6 + gtpc6t signifying GTP-C (include TEID) over IPv6 + gtpu6 signifying GTP-U over IPV6 + gtpu6e signifying GTP-U and Extension Header over IPV6 + gtpu6u signifying GTP-U PSC Uplink over IPV6 + gtpu6d signifying GTP-U PSC Downlink over IPV6 And <option> is one or more of: s Hash on the IP source address of the Rx packet. d Hash on the IP destination address of the Rx packet. f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet. n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet. + e Hash on GTP Packet on TEID (4bytes) of the Rx packet. Accelerated Receive Flow Steering (aRFS) diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep_vf.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep_vf.rst new file mode 100644 index 000000000000..603133d0b92f --- /dev/null +++ b/Documentation/networking/device_drivers/ethernet/marvell/octeon_ep_vf.rst @@ -0,0 +1,24 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +======================================================================= +Linux kernel networking driver for Marvell's Octeon PCI Endpoint NIC VF +======================================================================= + +Network driver for Marvell's Octeon PCI EndPoint NIC VF. +Copyright (c) 2020 Marvell International Ltd. + +Overview +======== +This driver implements networking functionality of Marvell's Octeon PCI +EndPoint NIC VF. + +Supported Devices +================= +Currently, this driver support following devices: + * Network controller: Cavium, Inc. Device b203 + * Network controller: Cavium, Inc. Device b403 + * Network controller: Cavium, Inc. Device b103 + * Network controller: Cavium, Inc. Device b903 + * Network controller: Cavium, Inc. Device ba03 + * Network controller: Cavium, Inc. Device bc03 + * Network controller: Cavium, Inc. Device bd03 diff --git a/Documentation/networking/device_drivers/wwan/t7xx.rst b/Documentation/networking/device_drivers/wwan/t7xx.rst index dd5b731957ca..f346f5f85f15 100644 --- a/Documentation/networking/device_drivers/wwan/t7xx.rst +++ b/Documentation/networking/device_drivers/wwan/t7xx.rst @@ -39,6 +39,34 @@ command and receive response: - open the AT control channel using a UART tool or a special user tool +Sysfs +===== +The driver provides sysfs interfaces to userspace. + +t7xx_mode +--------- +The sysfs interface provides userspace with access to the device mode, this interface +supports read and write operations. + +Device mode: + +- ``unknown`` represents that device in unknown status +- ``ready`` represents that device in ready status +- ``reset`` represents that device in reset status +- ``fastboot_switching`` represents that device in fastboot switching status +- ``fastboot_download`` represents that device in fastboot download status +- ``fastboot_dump`` represents that device in fastboot dump status + +Read from userspace to get the current device mode. + +:: + $ cat /sys/bus/pci/devices/${bdf}/t7xx_mode + +Write from userspace to set the device mode. + +:: + $ echo fastboot_switching > /sys/bus/pci/devices/${bdf}/t7xx_mode + Management application development ================================== The driver and userspace interfaces are described below. The MBIM protocol is @@ -97,6 +125,20 @@ The driver exposes an AT port by implementing AT WWAN Port. The userspace end of the control port is a /dev/wwan0at0 character device. Application shall use this interface to issue AT commands. +fastboot port userspace ABI +--------------------------- + +/dev/wwan0fastboot0 character device +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The driver exposes a fastboot protocol interface by implementing +fastboot WWAN Port. The userspace end of the fastboot channel pipe is a +/dev/wwan0fastboot0 character device. Application shall use this interface for +fastboot protocol communication. + +Please note that driver needs to be reloaded to export /dev/wwan0fastboot0 +port, because device needs a cold reset after enter ``fastboot_switching`` +mode. + The MediaTek's T700 modem supports the 3GPP TS 27.007 [4] specification. References @@ -118,3 +160,7 @@ speak the Mobile Interface Broadband Model (MBIM) protocol"* [4] *Specification # 27.007 - 3GPP* - https://www.3gpp.org/DynaReport/27007.htm + +[5] *fastboot "a mechanism for communicating with bootloaders"* + +- https://android.googlesource.com/platform/system/core/+/refs/heads/main/fastboot/README.md diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst index e33ad2401ad7..562f46b41274 100644 --- a/Documentation/networking/devlink/devlink-port.rst +++ b/Documentation/networking/devlink/devlink-port.rst @@ -126,7 +126,7 @@ Users may also set the RoCE capability of the function using `devlink port function set roce` command. Users may also set the function as migratable using -'devlink port function set migratable' command. +`devlink port function set migratable` command. Users may also set the IPsec crypto capability of the function using `devlink port function set ipsec_crypto` command. diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst index 702f204a3dbd..456985407475 100644 --- a/Documentation/networking/devlink/mlx5.rst +++ b/Documentation/networking/devlink/mlx5.rst @@ -97,6 +97,10 @@ parameters. When metadata is disabled, the above use cases will fail to initialize if users try to enable them. + + Note: Setting this parameter does not take effect immediately. Setting + must happen in legacy mode and eswitch port metadata takes effect after + enabling switchdev mode. * - ``hairpin_num_queues`` - u32 - driverinit @@ -246,7 +250,7 @@ them in realtime. Description of the vnic counters: -- total_q_under_processor_handle +- total_error_queues number of queues in an error state due to an async error or errored command. - send_queue_priority_update_flow @@ -255,7 +259,8 @@ Description of the vnic counters: number of times CQ entered an error state due to an overflow. - async_eq_overrun number of times an EQ mapped to async events was overrun. - comp_eq_overrun number of times an EQ mapped to completion events was +- comp_eq_overrun + number of times an EQ mapped to completion events was overrun. - quota_exceeded_command number of commands issued and failed due to quota exceeded. diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 69f3d6dcd9fd..473d72c36d61 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -74,6 +74,7 @@ Contents: mpls-sysctl mptcp-sysctl multiqueue + multi-pf-netdev napi net_cachelines/index netconsole diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 7afff42612e9..bd50df6a5a42 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -2503,7 +2503,7 @@ use_tempaddr - INTEGER temp_valid_lft - INTEGER valid lifetime (in seconds) for temporary addresses. If less than the - minimum required lifetime (typically 5 seconds), temporary addresses + minimum required lifetime (typically 5-7 seconds), temporary addresses will not be created. Default: 172800 (2 days) @@ -2511,7 +2511,7 @@ temp_valid_lft - INTEGER temp_prefered_lft - INTEGER Preferred lifetime (in seconds) for temporary addresses. If temp_prefered_lft is less than the minimum required lifetime (typically - 5 seconds), temporary addresses will not be created. If + 5-7 seconds), the preferred lifetime is the minimum required. If temp_prefered_lft is greater than temp_valid_lft, the preferred lifetime is temp_valid_lft. @@ -2535,6 +2535,16 @@ max_desync_factor - INTEGER Default: 600 +regen_min_advance - INTEGER + How far in advance (in seconds), at minimum, to create a new temporary + address before the current one is deprecated. This value is added to + the amount of time that may be required for duplicate address detection + to determine when to create a new address. Linux permits setting this + value to less than the default of 2 seconds, but a value less than 2 + does not conform to RFC 8981. + + Default: 2 + regen_max_retry - INTEGER Number of attempts before give up attempting to generate valid temporary addresses. diff --git a/Documentation/networking/l2tp.rst b/Documentation/networking/l2tp.rst index 7f383e99dbad..8496b467dea4 100644 --- a/Documentation/networking/l2tp.rst +++ b/Documentation/networking/l2tp.rst @@ -386,12 +386,19 @@ Sample userspace code: - Create session PPPoX data socket:: + /* Input: the L2TP tunnel UDP socket `tunnel_fd`, which needs to be + * bound already (both sockname and peername), otherwise it will not be + * ready. + */ + struct sockaddr_pppol2tp sax; - int fd; + int session_fd; + int ret; + + session_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); + if (session_fd < 0) + return -errno; - /* Note, the tunnel socket must be bound already, else it - * will not be ready - */ sax.sa_family = AF_PPPOX; sax.sa_protocol = PX_PROTO_OL2TP; sax.pppol2tp.fd = tunnel_fd; @@ -406,12 +413,128 @@ Sample userspace code: /* session_fd is the fd of the session's PPPoL2TP socket. * tunnel_fd is the fd of the tunnel UDP / L2TPIP socket. */ - fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax)); - if (fd < 0 ) { + ret = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax)); + if (ret < 0 ) { + close(session_fd); + return -errno; + } + + return session_fd; + +L2TP control packets will still be available for read on `tunnel_fd`. + + - Create PPP channel:: + + /* Input: the session PPPoX data socket `session_fd` which was created + * as described above. + */ + + int ppp_chan_fd; + int chindx; + int ret; + + ret = ioctl(session_fd, PPPIOCGCHAN, &chindx); + if (ret < 0) + return -errno; + + ppp_chan_fd = open("/dev/ppp", O_RDWR); + if (ppp_chan_fd < 0) + return -errno; + + ret = ioctl(ppp_chan_fd, PPPIOCATTCHAN, &chindx); + if (ret < 0) { + close(ppp_chan_fd); return -errno; } + + return ppp_chan_fd; + +LCP PPP frames will be available for read on `ppp_chan_fd`. + + - Create PPP interface:: + + /* Input: the PPP channel `ppp_chan_fd` which was created as described + * above. + */ + + int ifunit = -1; + int ppp_if_fd; + int ret; + + ppp_if_fd = open("/dev/ppp", O_RDWR); + if (ppp_if_fd < 0) + return -errno; + + ret = ioctl(ppp_if_fd, PPPIOCNEWUNIT, &ifunit); + if (ret < 0) { + close(ppp_if_fd); + return -errno; + } + + ret = ioctl(ppp_chan_fd, PPPIOCCONNECT, &ifunit); + if (ret < 0) { + close(ppp_if_fd); + return -errno; + } + + return ppp_if_fd; + +IPCP/IPv6CP PPP frames will be available for read on `ppp_if_fd`. + +The ppp<ifunit> interface can then be configured as usual with netlink's +RTM_NEWLINK, RTM_NEWADDR, RTM_NEWROUTE, or ioctl's SIOCSIFMTU, SIOCSIFADDR, +SIOCSIFDSTADDR, SIOCSIFNETMASK, SIOCSIFFLAGS, or with the `ip` command. + + - Bridging L2TP sessions which have PPP pseudowire types (this is also called + L2TP tunnel switching or L2TP multihop) is supported by bridging the PPP + channels of the two L2TP sessions to be bridged:: + + /* Input: the session PPPoX data sockets `session_fd1` and `session_fd2` + * which were created as described further above. + */ + + int ppp_chan_fd; + int chindx1; + int chindx2; + int ret; + + ret = ioctl(session_fd1, PPPIOCGCHAN, &chindx1); + if (ret < 0) + return -errno; + + ret = ioctl(session_fd2, PPPIOCGCHAN, &chindx2); + if (ret < 0) + return -errno; + + ppp_chan_fd = open("/dev/ppp", O_RDWR); + if (ppp_chan_fd < 0) + return -errno; + + ret = ioctl(ppp_chan_fd, PPPIOCATTCHAN, &chindx1); + if (ret < 0) { + close(ppp_chan_fd); + return -errno; + } + + ret = ioctl(ppp_chan_fd, PPPIOCBRIDGECHAN, &chindx2); + close(ppp_chan_fd); + if (ret < 0) + return -errno; + return 0; +It can be noted that when bridging PPP channels, the PPP session is not locally +terminated, and no local PPP interface is created. PPP frames arriving on one +channel are directly passed to the other channel, and vice versa. + +The PPP channel does not need to be kept open. Only the session PPPoX data +sockets need to be kept open. + +More generally, it is also possible in the same way to e.g. bridge a PPPoL2TP +PPP channel with other types of PPP channels, such as PPPoE. + +See more details for the PPP side in ppp_generic.rst. + Old L2TPv2-only API ------------------- diff --git a/Documentation/networking/multi-pf-netdev.rst b/Documentation/networking/multi-pf-netdev.rst new file mode 100644 index 000000000000..be8e4bcadf11 --- /dev/null +++ b/Documentation/networking/multi-pf-netdev.rst @@ -0,0 +1,174 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> + +=============== +Multi-PF Netdev +=============== + +Contents +======== + +- `Background`_ +- `Overview`_ +- `mlx5 implementation`_ +- `Channels distribution`_ +- `Observability`_ +- `Steering`_ +- `Mutually exclusive features`_ + +Background +========== + +The Multi-PF NIC technology enables several CPUs within a multi-socket server to connect directly to +the network, each through its own dedicated PCIe interface. Through either a connection harness that +splits the PCIe lanes between two cards or by bifurcating a PCIe slot for a single card. This +results in eliminating the network traffic traversing over the internal bus between the sockets, +significantly reducing overhead and latency, in addition to reducing CPU utilization and increasing +network throughput. + +Overview +======== + +The feature adds support for combining multiple PFs of the same port in a Multi-PF environment under +one netdev instance. It is implemented in the netdev layer. Lower-layer instances like pci func, +sysfs entry, and devlink are kept separate. +Passing traffic through different devices belonging to different NUMA sockets saves cross-NUMA +traffic and allows apps running on the same netdev from different NUMAs to still feel a sense of +proximity to the device and achieve improved performance. + +mlx5 implementation +=================== + +Multi-PF or Socket-direct in mlx5 is achieved by grouping PFs together which belong to the same +NIC and has the socket-direct property enabled, once all PFs are probed, we create a single netdev +to represent all of them, symmetrically, we destroy the netdev whenever any of the PFs is removed. + +The netdev network channels are distributed between all devices, a proper configuration would utilize +the correct close NUMA node when working on a certain app/CPU. + +We pick one PF to be a primary (leader), and it fills a special role. The other devices +(secondaries) are disconnected from the network at the chip level (set to silent mode). In silent +mode, no south <-> north traffic flowing directly through a secondary PF. It needs the assistance of +the leader PF (east <-> west traffic) to function. All Rx/Tx traffic is steered through the primary +to/from the secondaries. + +Currently, we limit the support to PFs only, and up to two PFs (sockets). + +Channels distribution +===================== + +We distribute the channels between the different PFs to achieve local NUMA node performance +on multiple NUMA nodes. + +Each combined channel works against one specific PF, creating all its datapath queues against it. We +distribute channels to PFs in a round-robin policy. + +:: + + Example for 2 PFs and 5 channels: + +--------+--------+ + | ch idx | PF idx | + +--------+--------+ + | 0 | 0 | + | 1 | 1 | + | 2 | 0 | + | 3 | 1 | + | 4 | 0 | + +--------+--------+ + + +The reason we prefer round-robin is, it is less influenced by changes in the number of channels. The +mapping between a channel index and a PF is fixed, no matter how many channels the user configures. +As the channel stats are persistent across channel's closure, changing the mapping every single time +would turn the accumulative stats less representing of the channel's history. + +This is achieved by using the correct core device instance (mdev) in each channel, instead of them +all using the same instance under "priv->mdev". + +Observability +============= +The relation between PF, irq, napi, and queue can be observed via netlink spec: + +$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump queue-get --json='{"ifindex": 13}' +[{'id': 0, 'ifindex': 13, 'napi-id': 539, 'type': 'rx'}, + {'id': 1, 'ifindex': 13, 'napi-id': 540, 'type': 'rx'}, + {'id': 2, 'ifindex': 13, 'napi-id': 541, 'type': 'rx'}, + {'id': 3, 'ifindex': 13, 'napi-id': 542, 'type': 'rx'}, + {'id': 4, 'ifindex': 13, 'napi-id': 543, 'type': 'rx'}, + {'id': 0, 'ifindex': 13, 'napi-id': 539, 'type': 'tx'}, + {'id': 1, 'ifindex': 13, 'napi-id': 540, 'type': 'tx'}, + {'id': 2, 'ifindex': 13, 'napi-id': 541, 'type': 'tx'}, + {'id': 3, 'ifindex': 13, 'napi-id': 542, 'type': 'tx'}, + {'id': 4, 'ifindex': 13, 'napi-id': 543, 'type': 'tx'}] + +$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump napi-get --json='{"ifindex": 13}' +[{'id': 543, 'ifindex': 13, 'irq': 42}, + {'id': 542, 'ifindex': 13, 'irq': 41}, + {'id': 541, 'ifindex': 13, 'irq': 40}, + {'id': 540, 'ifindex': 13, 'irq': 39}, + {'id': 539, 'ifindex': 13, 'irq': 36}] + +Here you can clearly observe our channels distribution policy: + +$ ls /proc/irq/{36,39,40,41,42}/mlx5* -d -1 +/proc/irq/36/mlx5_comp1@pci:0000:08:00.0 +/proc/irq/39/mlx5_comp1@pci:0000:09:00.0 +/proc/irq/40/mlx5_comp2@pci:0000:08:00.0 +/proc/irq/41/mlx5_comp2@pci:0000:09:00.0 +/proc/irq/42/mlx5_comp3@pci:0000:08:00.0 + +Steering +======== +Secondary PFs are set to "silent" mode, meaning they are disconnected from the network. + +In Rx, the steering tables belong to the primary PF only, and it is its role to distribute incoming +traffic to other PFs, via cross-vhca steering capabilities. Still maintain a single default RSS table, +that is capable of pointing to the receive queues of a different PF. + +In Tx, the primary PF creates a new Tx flow table, which is aliased by the secondaries, so they can +go out to the network through it. + +In addition, we set default XPS configuration that, based on the CPU, selects an SQ belonging to the +PF on the same node as the CPU. + +XPS default config example: + +NUMA node(s): 2 +NUMA node0 CPU(s): 0-11 +NUMA node1 CPU(s): 12-23 + +PF0 on node0, PF1 on node1. + +- /sys/class/net/eth2/queues/tx-0/xps_cpus:000001 +- /sys/class/net/eth2/queues/tx-1/xps_cpus:001000 +- /sys/class/net/eth2/queues/tx-2/xps_cpus:000002 +- /sys/class/net/eth2/queues/tx-3/xps_cpus:002000 +- /sys/class/net/eth2/queues/tx-4/xps_cpus:000004 +- /sys/class/net/eth2/queues/tx-5/xps_cpus:004000 +- /sys/class/net/eth2/queues/tx-6/xps_cpus:000008 +- /sys/class/net/eth2/queues/tx-7/xps_cpus:008000 +- /sys/class/net/eth2/queues/tx-8/xps_cpus:000010 +- /sys/class/net/eth2/queues/tx-9/xps_cpus:010000 +- /sys/class/net/eth2/queues/tx-10/xps_cpus:000020 +- /sys/class/net/eth2/queues/tx-11/xps_cpus:020000 +- /sys/class/net/eth2/queues/tx-12/xps_cpus:000040 +- /sys/class/net/eth2/queues/tx-13/xps_cpus:040000 +- /sys/class/net/eth2/queues/tx-14/xps_cpus:000080 +- /sys/class/net/eth2/queues/tx-15/xps_cpus:080000 +- /sys/class/net/eth2/queues/tx-16/xps_cpus:000100 +- /sys/class/net/eth2/queues/tx-17/xps_cpus:100000 +- /sys/class/net/eth2/queues/tx-18/xps_cpus:000200 +- /sys/class/net/eth2/queues/tx-19/xps_cpus:200000 +- /sys/class/net/eth2/queues/tx-20/xps_cpus:000400 +- /sys/class/net/eth2/queues/tx-21/xps_cpus:400000 +- /sys/class/net/eth2/queues/tx-22/xps_cpus:000800 +- /sys/class/net/eth2/queues/tx-23/xps_cpus:800000 + +Mutually exclusive features +=========================== + +The nature of Multi-PF, where different channels work with different PFs, conflicts with +stateful features where the state is maintained in one of the PFs. +For example, in the TLS device-offload feature, special context objects are created per connection +and maintained in the PF. Transitioning between different RQs/SQs would break the feature. Hence, +we disable this combination for now. diff --git a/Documentation/networking/net_cachelines/inet_sock.rst b/Documentation/networking/net_cachelines/inet_sock.rst index a2babd0d7954..595d7ef5fc8b 100644 --- a/Documentation/networking/net_cachelines/inet_sock.rst +++ b/Documentation/networking/net_cachelines/inet_sock.rst @@ -1,9 +1,9 @@ .. SPDX-License-Identifier: GPL-2.0 .. Copyright (C) 2023 Google LLC -===================================================== -inet_connection_sock struct fast path usage breakdown -===================================================== +========================================== +inet_sock struct fast path usage breakdown +========================================== Type Name fastpath_tx_access fastpath_rx_access comment ..struct ..inet_sock diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst index e75a53593bb9..dceb49d56a91 100644 --- a/Documentation/networking/net_cachelines/net_device.rst +++ b/Documentation/networking/net_cachelines/net_device.rst @@ -136,8 +136,8 @@ struct_netpoll_info* npinfo - possible_net_t nd_net - read_mostly (dev_net)napi_busy_loop,tcp_v(4/6)_rcv,ip(v6)_rcv,ip(6)_input,ip(6)_input_finish void* ml_priv enum_netdev_ml_priv_type ml_priv_type -struct_pcpu_lstats__percpu* lstats -struct_pcpu_sw_netstats__percpu* tstats +struct_pcpu_lstats__percpu* lstats read_mostly dev_lstats_add() +struct_pcpu_sw_netstats__percpu* tstats read_mostly dev_sw_netstats_tx_add() struct_pcpu_dstats__percpu* dstats struct_garp_port* garp_port struct_mrp_port* mrp_port diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst index 97d7a5c8e01c..1c154cbd1848 100644 --- a/Documentation/networking/net_cachelines/tcp_sock.rst +++ b/Documentation/networking/net_cachelines/tcp_sock.rst @@ -38,13 +38,13 @@ u32 max_window read_mostly - u32 mss_cache read_mostly read_mostly tcp_rate_check_app_limited,tcp_current_mss,tcp_sync_mss,tcp_sndbuf_expand,tcp_tso_should_defer(tx);tcp_update_pacing_rate,tcp_clean_rtx_queue(rx) u32 window_clamp read_mostly read_write tcp_rcv_space_adjust,__tcp_select_window u32 rcv_ssthresh read_mostly - __tcp_select_window -u82 scaling_ratio +u8 scaling_ratio read_mostly read_mostly tcp_win_from_space struct tcp_rack u16 advmss - read_mostly tcp_rcv_space_adjust u8 compressed_ack u8:2 dup_ack_counter u8:1 tlp_retrans -u8:1 tcp_usec_ts +u8:1 tcp_usec_ts read_mostly read_mostly u32 chrono_start read_write - tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data) u32[3] chrono_stat read_write - tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data) u8:2 chrono_type read_write - tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data) diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst index 390730a74332..d55c2a22ec7a 100644 --- a/Documentation/networking/netconsole.rst +++ b/Documentation/networking/netconsole.rst @@ -15,6 +15,8 @@ Extended console support by Tejun Heo <tj@kernel.org>, May 1 2015 Release prepend support by Breno Leitao <leitao@debian.org>, Jul 7 2023 +Userdata append support by Matthew Wood <thepacketgeek@gmail.com>, Jan 22 2024 + Please send bug reports to Matt Mackall <mpm@selenic.com> Satyam Sharma <satyam.sharma@gmail.com>, and Cong Wang <xiyou.wangcong@gmail.com> @@ -171,6 +173,70 @@ You can modify these targets in runtime by creating the following targets:: cat cmdline1/remote_ip 10.0.0.3 +Append User Data +---------------- + +Custom user data can be appended to the end of messages with netconsole +dynamic configuration enabled. User data entries can be modified without +changing the "enabled" attribute of a target. + +Directories (keys) under `userdata` are limited to 53 character length, and +data in `userdata/<key>/value` are limited to 200 bytes:: + + cd /sys/kernel/config/netconsole && mkdir cmdline0 + cd cmdline0 + mkdir userdata/foo + echo bar > userdata/foo/value + mkdir userdata/qux + echo baz > userdata/qux/value + +Messages will now include this additional user data:: + + echo "This is a message" > /dev/kmsg + +Sends:: + + 12,607,22085407756,-;This is a message + foo=bar + qux=baz + +Preview the userdata that will be appended with:: + + cd /sys/kernel/config/netconsole/cmdline0/userdata + for f in `ls userdata`; do echo $f=$(cat userdata/$f/value); done + +If a `userdata` entry is created but no data is written to the `value` file, +the entry will be omitted from netconsole messages:: + + cd /sys/kernel/config/netconsole && mkdir cmdline0 + cd cmdline0 + mkdir userdata/foo + echo bar > userdata/foo/value + mkdir userdata/qux + +The `qux` key is omitted since it has no value:: + + echo "This is a message" > /dev/kmsg + 12,607,22085407756,-;This is a message + foo=bar + +Delete `userdata` entries with `rmdir`:: + + rmdir /sys/kernel/config/netconsole/cmdline0/userdata/qux + +.. warning:: + When writing strings to user data values, input is broken up per line in + configfs store calls and this can cause confusing behavior:: + + mkdir userdata/testing + printf "val1\nval2" > userdata/testing/value + # userdata store value is called twice, first with "val1\n" then "val2" + # so "val2" is stored, being the last value stored + cat userdata/testing/value + val2 + + It is recommended to not write user data values with newlines. + Extended console: ================= diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst index 9e4cccb90b87..c2476917a6c3 100644 --- a/Documentation/networking/netdevices.rst +++ b/Documentation/networking/netdevices.rst @@ -252,8 +252,8 @@ ndo_eth_ioctl: Context: process ndo_get_stats: - Synchronization: rtnl_lock() semaphore, dev_base_lock rwlock, or RCU. - Context: atomic (can't sleep under rwlock or RCU) + Synchronization: rtnl_lock() semaphore, or RCU. + Context: atomic (can't sleep under RCU) ndo_start_xmit: Synchronization: __netif_tx_lock spinlock. diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst index 8054d33f449f..5bf285d73e8a 100644 --- a/Documentation/networking/sfp-phylink.rst +++ b/Documentation/networking/sfp-phylink.rst @@ -231,16 +231,136 @@ this documentation. For further information on these methods, please see the inline documentation in :c:type:`struct phylink_mac_ops <phylink_mac_ops>`. -9. Remove calls to of_parse_phandle() for the PHY, - of_phy_register_fixed_link() for fixed links etc. from the probe - function, and replace with: +9. Fill-in the :c:type:`struct phylink_config <phylink_config>` fields with + a reference to the :c:type:`struct device <device>` associated to your + :c:type:`struct net_device <net_device>`: .. code-block:: c - struct phylink *phylink; priv->phylink_config.dev = &dev.dev; priv->phylink_config.type = PHYLINK_NETDEV; + Fill-in the various speeds, pause and duplex modes your MAC can handle: + + .. code-block:: c + + priv->phylink_config.mac_capabilities = MAC_SYM_PAUSE | MAC_10 | MAC_100 | MAC_1000FD; + +10. Some Ethernet controllers work in pair with a PCS (Physical Coding Sublayer) + block, that can handle among other things the encoding/decoding, link + establishment detection and autonegotiation. While some MACs have internal + PCS whose operation is transparent, some other require dedicated PCS + configuration for the link to become functional. In that case, phylink + provides a PCS abstraction through :c:type:`struct phylink_pcs <phylink_pcs>`. + + Identify if your driver has one or more internal PCS blocks, and/or if + your controller can use an external PCS block that might be internally + connected to your controller. + + If your controller doesn't have any internal PCS, you can go to step 11. + + If your Ethernet controller contains one or several PCS blocks, create + one :c:type:`struct phylink_pcs <phylink_pcs>` instance per PCS block within + your driver's private data structure: + + .. code-block:: c + + struct phylink_pcs pcs; + + Populate the relevant :c:type:`struct phylink_pcs_ops <phylink_pcs_ops>` to + configure your PCS. Create a :c:func:`pcs_get_state` function that reports + the inband link state, a :c:func:`pcs_config` function to configure your + PCS according to phylink-provided parameters, and a :c:func:`pcs_validate` + function that report to phylink all accepted configuration parameters for + your PCS: + + .. code-block:: c + + struct phylink_pcs_ops foo_pcs_ops = { + .pcs_validate = foo_pcs_validate, + .pcs_get_state = foo_pcs_get_state, + .pcs_config = foo_pcs_config, + }; + + Arrange for PCS link state interrupts to be forwarded into + phylink, via: + + .. code-block:: c + + phylink_pcs_change(pcs, link_is_up); + + where ``link_is_up`` is true if the link is currently up or false + otherwise. If a PCS is unable to provide these interrupts, then + it should set ``pcs->pcs_poll = true;`` when creating the PCS. + +11. If your controller relies on, or accepts the presence of an external PCS + controlled through its own driver, add a pointer to a phylink_pcs instance + in your driver private data structure: + + .. code-block:: c + + struct phylink_pcs *pcs; + + The way of getting an instance of the actual PCS depends on the platform, + some PCS sit on an MDIO bus and are grabbed by passing a pointer to the + corresponding :c:type:`struct mii_bus <mii_bus>` and the PCS's address on + that bus. In this example, we assume the controller attaches to a Lynx PCS + instance: + + .. code-block:: c + + priv->pcs = lynx_pcs_create_mdiodev(bus, 0); + + Some PCS can be recovered based on firmware information: + + .. code-block:: c + + priv->pcs = lynx_pcs_create_fwnode(of_fwnode_handle(node)); + +12. Populate the :c:func:`mac_select_pcs` callback and add it to your + :c:type:`struct phylink_mac_ops <phylink_mac_ops>` set of ops. This function + must return a pointer to the relevant :c:type:`struct phylink_pcs <phylink_pcs>` + that will be used for the requested link configuration: + + .. code-block:: c + + static struct phylink_pcs *foo_select_pcs(struct phylink_config *config, + phy_interface_t interface) + { + struct foo_priv *priv = container_of(config, struct foo_priv, + phylink_config); + + if ( /* 'interface' needs a PCS to function */ ) + return priv->pcs; + + return NULL; + } + + See :c:func:`mvpp2_select_pcs` for an example of a driver that has multiple + internal PCS. + +13. Fill-in all the :c:type:`phy_interface_t <phy_interface_t>` (i.e. all MAC to + PHY link modes) that your MAC can output. The following example shows a + configuration for a MAC that can handle all RGMII modes, SGMII and 1000BaseX. + You must adjust these according to what your MAC and all PCS associated + with this MAC are capable of, and not just the interface you wish to use: + + .. code-block:: c + + phy_interface_set_rgmii(priv->phylink_config.supported_interfaces); + __set_bit(PHY_INTERFACE_MODE_SGMII, + priv->phylink_config.supported_interfaces); + __set_bit(PHY_INTERFACE_MODE_1000BASEX, + priv->phylink_config.supported_interfaces); + +14. Remove calls to of_parse_phandle() for the PHY, + of_phy_register_fixed_link() for fixed links etc. from the probe + function, and replace with: + + .. code-block:: c + + struct phylink *phylink; + phylink = phylink_create(&priv->phylink_config, node, phy_mode, &phylink_ops); if (IS_ERR(phylink)) { err = PTR_ERR(phylink); @@ -249,14 +369,14 @@ this documentation. priv->phylink = phylink; - and arrange to destroy the phylink in the probe failure path as - appropriate and the removal path too by calling: + and arrange to destroy the phylink in the probe failure path as + appropriate and the removal path too by calling: - .. code-block:: c + .. code-block:: c phylink_destroy(priv->phylink); -10. Arrange for MAC link state interrupts to be forwarded into +15. Arrange for MAC link state interrupts to be forwarded into phylink, via: .. code-block:: c @@ -264,17 +384,16 @@ this documentation. phylink_mac_change(priv->phylink, link_is_up); where ``link_is_up`` is true if the link is currently up or false - otherwise. If a MAC is unable to provide these interrupts, then - it should set ``priv->phylink_config.pcs_poll = true;`` in step 9. + otherwise. -11. Verify that the driver does not call:: +16. Verify that the driver does not call:: netif_carrier_on() netif_carrier_off() - as these will interfere with phylink's tracking of the link state, - and cause phylink to omit calls via the :c:func:`mac_link_up` and - :c:func:`mac_link_down` methods. + as these will interfere with phylink's tracking of the link state, + and cause phylink to omit calls via the :c:func:`mac_link_up` and + :c:func:`mac_link_down` methods. Network drivers should call phylink_stop() and phylink_start() via their suspend/resume paths, which ensures that the appropriate diff --git a/Documentation/networking/statistics.rst b/Documentation/networking/statistics.rst index 551b3cc29a41..75e017dfa825 100644 --- a/Documentation/networking/statistics.rst +++ b/Documentation/networking/statistics.rst @@ -41,6 +41,15 @@ If `-s` is specified once the detailed errors won't be shown. `ip` supports JSON formatting via the `-j` option. +Queue statistics +~~~~~~~~~~~~~~~~ + +Queue statistics are accessible via the netdev netlink family. + +Currently no widely distributed CLI exists to access those statistics. +Kernel development tools (ynl) can be used to experiment with them, +see `Documentation/userspace-api/netlink/intro-specs.rst`. + Protocol-specific statistics ---------------------------- @@ -147,6 +156,12 @@ Statistics are reported both in the responses to link information requests (`RTM_GETLINK`) and statistic requests (`RTM_GETSTATS`, when `IFLA_STATS_LINK_64` bit is set in the `.filter_mask` of the request). +netdev (netlink) +~~~~~~~~~~~~~~~~ + +`netdev` generic netlink family allows accessing page pool and per queue +statistics. + ethtool ------- diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst index 535077cbeb07..bfea9d8579ed 100644 --- a/Documentation/networking/xfrm_device.rst +++ b/Documentation/networking/xfrm_device.rst @@ -71,9 +71,9 @@ Callbacks to implement bool (*xdo_dev_offload_ok) (struct sk_buff *skb, struct xfrm_state *x); void (*xdo_dev_state_advance_esn) (struct xfrm_state *x); + void (*xdo_dev_state_update_stats) (struct xfrm_state *x); /* Solely packet offload callbacks */ - void (*xdo_dev_state_update_curlft) (struct xfrm_state *x); int (*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack); void (*xdo_dev_policy_delete) (struct xfrm_policy *x); void (*xdo_dev_policy_free) (struct xfrm_policy *x); @@ -191,6 +191,6 @@ xdo_dev_policy_free() on any remaining offloaded states. Outcome of HW handling packets, the XFRM core can't count hard, soft limits. The HW/driver are responsible to perform it and provide accurate data when -xdo_dev_state_update_curlft() is called. In case of one of these limits +xdo_dev_state_update_stats() is called. In case of one of these limits occuried, the driver needs to call to xfrm_state_check_expire() to make sure that XFRM performs rekeying sequence. diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst index 13225965c9a4..ada4938c37e5 100644 --- a/Documentation/power/energy-model.rst +++ b/Documentation/power/energy-model.rst @@ -71,6 +71,31 @@ whose performance is scaled together. Performance domains generally have a required to have the same micro-architecture. CPUs in different performance domains can have different micro-architectures. +To better reflect power variation due to static power (leakage) the EM +supports runtime modifications of the power values. The mechanism relies on +RCU to free the modifiable EM perf_state table memory. Its user, the task +scheduler, also uses RCU to access this memory. The EM framework provides +API for allocating/freeing the new memory for the modifiable EM table. +The old memory is freed automatically using RCU callback mechanism when there +are no owners anymore for the given EM runtime table instance. This is tracked +using kref mechanism. The device driver which provided the new EM at runtime, +should call EM API to free it safely when it's no longer needed. The EM +framework will handle the clean-up when it's possible. + +The kernel code which want to modify the EM values is protected from concurrent +access using a mutex. Therefore, the device driver code must run in sleeping +context when it tries to modify the EM. + +With the runtime modifiable EM we switch from a 'single and during the entire +runtime static EM' (system property) design to a 'single EM which can be +changed during runtime according e.g. to the workload' (system and workload +property) design. + +It is possible also to modify the CPU performance values for each EM's +performance state. Thus, the full power and performance profile (which +is an exponential curve) can be changed according e.g. to the workload +or system property. + 2. Core APIs ------------ @@ -175,10 +200,82 @@ CPUfreq governor is in use in case of CPU device. Currently this calculation is not provided for other type of devices. More details about the above APIs can be found in ``<linux/energy_model.h>`` -or in Section 2.4 +or in Section 2.5 + + +2.4 Runtime modifications +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Drivers willing to update the EM at runtime should use the following dedicated +function to allocate a new instance of the modified EM. The API is listed +below:: + + struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd); + +This allows to allocate a structure which contains the new EM table with +also RCU and kref needed by the EM framework. The 'struct em_perf_table' +contains array 'struct em_perf_state state[]' which is a list of performance +states in ascending order. That list must be populated by the device driver +which wants to update the EM. The list of frequencies can be taken from +existing EM (created during boot). The content in the 'struct em_perf_state' +must be populated by the driver as well. + +This is the API which does the EM update, using RCU pointers swap:: + + int em_dev_update_perf_domain(struct device *dev, + struct em_perf_table __rcu *new_table); + +Drivers must provide a pointer to the allocated and initialized new EM +'struct em_perf_table'. That new EM will be safely used inside the EM framework +and will be visible to other sub-systems in the kernel (thermal, powercap). +The main design goal for this API is to be fast and avoid extra calculations +or memory allocations at runtime. When pre-computed EMs are available in the +device driver, than it should be possible to simply re-use them with low +performance overhead. + +In order to free the EM, provided earlier by the driver (e.g. when the module +is unloaded), there is a need to call the API:: + + void em_table_free(struct em_perf_table __rcu *table); + +It will allow the EM framework to safely remove the memory, when there is +no other sub-system using it, e.g. EAS. + +To use the power values in other sub-systems (like thermal, powercap) there is +a need to call API which protects the reader and provide consistency of the EM +table data:: + + struct em_perf_state *em_perf_state_from_pd(struct em_perf_domain *pd); + +It returns the 'struct em_perf_state' pointer which is an array of performance +states in ascending order. +This function must be called in the RCU read lock section (after the +rcu_read_lock()). When the EM table is not needed anymore there is a need to +call rcu_real_unlock(). In this way the EM safely uses the RCU read section +and protects the users. It also allows the EM framework to manage the memory +and free it. More details how to use it can be found in Section 3.2 in the +example driver. + +There is dedicated API for device drivers to calculate em_perf_state::cost +values:: + + int em_dev_compute_costs(struct device *dev, struct em_perf_state *table, + int nr_states); + +These 'cost' values from EM are used in EAS. The new EM table should be passed +together with the number of entries and device pointer. When the computation +of the cost values is done properly the return value from the function is 0. +The function takes care for right setting of inefficiency for each performance +state as well. It updates em_perf_state::flags accordingly. +Then such prepared new EM can be passed to the em_dev_update_perf_domain() +function, which will allow to use it. + +More details about the above APIs can be found in ``<linux/energy_model.h>`` +or in Section 3.2 with an example code showing simple implementation of the +updating mechanism in a device driver. -2.4 Description details of this API +2.5 Description details of this API ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. kernel-doc:: include/linux/energy_model.h :internal: @@ -187,8 +284,11 @@ or in Section 2.4 :export: -3. Example driver ------------------ +3. Examples +----------- + +3.1 Example driver with EM registration +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The CPUFreq framework supports dedicated callback for registering the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em(). @@ -242,3 +342,78 @@ EM framework:: 39 static struct cpufreq_driver foo_cpufreq_driver = { 40 .register_em = foo_cpufreq_register_em, 41 }; + + +3.2 Example driver with EM modification +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This section provides a simple example of a thermal driver modifying the EM. +The driver implements a foo_thermal_em_update() function. The driver is woken +up periodically to check the temperature and modify the EM data:: + + -> drivers/soc/example/example_em_mod.c + + 01 static void foo_get_new_em(struct foo_context *ctx) + 02 { + 03 struct em_perf_table __rcu *em_table; + 04 struct em_perf_state *table, *new_table; + 05 struct device *dev = ctx->dev; + 06 struct em_perf_domain *pd; + 07 unsigned long freq; + 08 int i, ret; + 09 + 10 pd = em_pd_get(dev); + 11 if (!pd) + 12 return; + 13 + 14 em_table = em_table_alloc(pd); + 15 if (!em_table) + 16 return; + 17 + 18 new_table = em_table->state; + 19 + 20 rcu_read_lock(); + 21 table = em_perf_state_from_pd(pd); + 22 for (i = 0; i < pd->nr_perf_states; i++) { + 23 freq = table[i].frequency; + 24 foo_get_power_perf_values(dev, freq, &new_table[i]); + 25 } + 26 rcu_read_unlock(); + 27 + 28 /* Calculate 'cost' values for EAS */ + 29 ret = em_dev_compute_costs(dev, table, pd->nr_perf_states); + 30 if (ret) { + 31 dev_warn(dev, "EM: compute costs failed %d\n", ret); + 32 em_free_table(em_table); + 33 return; + 34 } + 35 + 36 ret = em_dev_update_perf_domain(dev, em_table); + 37 if (ret) { + 38 dev_warn(dev, "EM: update failed %d\n", ret); + 39 em_free_table(em_table); + 40 return; + 41 } + 42 + 43 /* + 44 * Since it's one-time-update drop the usage counter. + 45 * The EM framework will later free the table when needed. + 46 */ + 47 em_table_free(em_table); + 48 } + 49 + 50 /* + 51 * Function called periodically to check the temperature and + 52 * update the EM if needed + 53 */ + 54 static void foo_thermal_em_update(struct foo_context *ctx) + 55 { + 56 struct device *dev = ctx->dev; + 57 int cpu; + 58 + 59 ctx->temperature = foo_get_temp(dev, ctx); + 60 if (ctx->temperature < FOO_EM_UPDATE_TEMP_THRESHOLD) + 61 return; + 62 + 63 foo_get_new_em(ctx); + 64 } diff --git a/Documentation/power/opp.rst b/Documentation/power/opp.rst index a7c03c470980..1b7f1d854f14 100644 --- a/Documentation/power/opp.rst +++ b/Documentation/power/opp.rst @@ -305,7 +305,7 @@ dev_pm_opp_get_opp_count { /* Do things */ num_available = dev_pm_opp_get_opp_count(dev); - speeds = kzalloc(sizeof(u32) * num_available, GFP_KERNEL); + speeds = kcalloc(num_available, sizeof(u32), GFP_KERNEL); /* populate the table in increasing order */ freq = 0; while (!IS_ERR(opp = dev_pm_opp_find_freq_ceil(dev, &freq))) { diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst index a125544b4cb6..12070320307e 100644 --- a/Documentation/power/pci.rst +++ b/Documentation/power/pci.rst @@ -625,7 +625,7 @@ The PCI subsystem-level callbacks they correspond to:: pci_pm_poweroff() pci_pm_poweroff_noirq() -work in analogy with pci_pm_suspend() and pci_pm_poweroff_noirq(), respectively, +work in analogy with pci_pm_suspend() and pci_pm_suspend_noirq(), respectively, although they don't attempt to save the device's standard configuration registers. diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst index 65b86e487afe..5c4e730f38d0 100644 --- a/Documentation/power/runtime_pm.rst +++ b/Documentation/power/runtime_pm.rst @@ -154,7 +154,7 @@ suspending the device are satisfied) and to queue up a suspend request for the device in that case. If there is no idle callback, or if the callback returns 0, then the PM core will attempt to carry out a runtime suspend of the device, also respecting devices configured for autosuspend. In essence this means a -call to pm_runtime_autosuspend() (do note that drivers needs to update the +call to __pm_runtime_autosuspend() (do note that drivers needs to update the device last busy mark, pm_runtime_mark_last_busy(), to control the delay under this circumstance). To prevent this (for example, if the callback routine has started a delayed suspend), the routine must return a non-zero value. Negative @@ -396,10 +396,9 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h: nonzero, increment the counter and return 1; otherwise return 0 without changing the counter - `int pm_runtime_get_if_active(struct device *dev, bool ign_usage_count);` + `int pm_runtime_get_if_active(struct device *dev);` - return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the - runtime PM status is RPM_ACTIVE, and either ign_usage_count is true - or the device's usage_count is non-zero, increment the counter and + runtime PM status is RPM_ACTIVE, increment the counter and return 1; otherwise return 0 without changing the counter `void pm_runtime_put_noidle(struct device *dev);` @@ -410,6 +409,10 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h: pm_request_idle(dev) and return its result `int pm_runtime_put_autosuspend(struct device *dev);` + - does the same as __pm_runtime_put_autosuspend() for now, but in the + future, will also call pm_runtime_mark_last_busy() as well, DO NOT USE! + + `int __pm_runtime_put_autosuspend(struct device *dev);` - decrement the device's usage counter; if the result is 0 then run pm_request_autosuspend(dev) and return its result @@ -540,6 +543,7 @@ It is safe to execute the following helper functions from interrupt context: - pm_runtime_put_noidle() - pm_runtime_put() - pm_runtime_put_autosuspend() +- __pm_runtime_put_autosuspend() - pm_runtime_enable() - pm_suspend_ignore_children() - pm_runtime_set_active() @@ -730,6 +734,7 @@ out the following operations: for it, respectively. 7. Generic subsystem callbacks +============================== Subsystems may wish to conserve code space by using the set of generic power management callbacks provided by the PM core, defined in @@ -865,9 +870,9 @@ automatically be delayed until the desired period of inactivity has elapsed. Inactivity is determined based on the power.last_busy field. Drivers should call pm_runtime_mark_last_busy() to update this field after carrying out I/O, -typically just before calling pm_runtime_put_autosuspend(). The desired length -of the inactivity period is a matter of policy. Subsystems can set this length -initially by calling pm_runtime_set_autosuspend_delay(), but after device +typically just before calling __pm_runtime_put_autosuspend(). The desired +length of the inactivity period is a matter of policy. Subsystems can set this +length initially by calling pm_runtime_set_autosuspend_delay(), but after device registration the length should be controlled by user space, using the /sys/devices/.../power/autosuspend_delay_ms attribute. @@ -878,7 +883,7 @@ instead of the non-autosuspend counterparts:: Instead of: pm_runtime_suspend use: pm_runtime_autosuspend; Instead of: pm_schedule_suspend use: pm_request_autosuspend; - Instead of: pm_runtime_put use: pm_runtime_put_autosuspend; + Instead of: pm_runtime_put use: __pm_runtime_put_autosuspend; Instead of: pm_runtime_put_sync use: pm_runtime_put_sync_autosuspend. Drivers may also continue to use the non-autosuspend helper functions; they @@ -917,7 +922,7 @@ Here is a schematic pseudo-code example:: lock(&foo->private_lock); if (--foo->num_pending_requests == 0) { pm_runtime_mark_last_busy(&foo->dev); - pm_runtime_put_autosuspend(&foo->dev); + __pm_runtime_put_autosuspend(&foo->dev); } else { foo_process_next_request(foo); } diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst index 50b3d1cb1115..ca611c9c2d1e 100644 --- a/Documentation/process/changes.rst +++ b/Documentation/process/changes.rst @@ -31,7 +31,7 @@ you probably needn't concern yourself with pcmciautils. ====================== =============== ======================================== GNU C 5.1 gcc --version Clang/LLVM (optional) 11.0.0 clang --version -Rust (optional) 1.74.1 rustc --version +Rust (optional) 1.76.0 rustc --version bindgen (optional) 0.65.1 bindgen --version GNU make 3.82 make --version bash 4.2 bash --version @@ -144,8 +144,8 @@ Bison Since Linux 4.16, the build system generates parsers during build. This requires bison 2.0 or later. -pahole: -------- +pahole +------ Since Linux 5.2, if CONFIG_DEBUG_INFO_BTF is selected, the build system generates BTF (BPF Type Format) from DWARF in vmlinux, a bit later from kernel diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst index c48382c6b477..9c7cf7347394 100644 --- a/Documentation/process/coding-style.rst +++ b/Documentation/process/coding-style.rst @@ -203,7 +203,7 @@ Do not unnecessarily use braces where a single statement will do. and -.. code-block:: none +.. code-block:: c if (condition) do_this(); @@ -586,9 +586,9 @@ fix for this is to split it up into two error labels ``err_free_bar:`` and .. code-block:: c - err_free_bar: + err_free_bar: kfree(foo->bar); - err_free_foo: + err_free_foo: kfree(foo); return ret; @@ -660,7 +660,7 @@ make a good program). So, you can either get rid of GNU emacs, or change it to use saner values. To do the latter, you can stick the following in your .emacs file: -.. code-block:: none +.. code-block:: elisp (defun c-lineup-arglist-tabs-only (ignored) "Line up argument lists by tabs, not spaces" @@ -679,7 +679,7 @@ values. To do the latter, you can stick the following in your .emacs file: (c-offsets-alist . ( (arglist-close . c-lineup-arglist-tabs-only) (arglist-cont-nonempty . - (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) + (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) (arglist-intro . +) (brace-list-intro . +) (c . c-lineup-C-comments) @@ -899,7 +899,8 @@ which you should use to make sure messages are matched to the right device and driver, and are tagged with the right level: dev_err(), dev_warn(), dev_info(), and so forth. For messages that aren't associated with a particular device, <linux/printk.h> defines pr_notice(), pr_info(), -pr_warn(), pr_err(), etc. +pr_warn(), pr_err(), etc. When drivers are working properly they are quiet, +so prefer to use dev_dbg/pr_debug unless something is wrong. Coming up with good debugging messages can be quite a challenge; and once you have them, they can be a huge help for remote troubleshooting. However diff --git a/Documentation/process/cve.rst b/Documentation/process/cve.rst new file mode 100644 index 000000000000..5e2753eff729 --- /dev/null +++ b/Documentation/process/cve.rst @@ -0,0 +1,121 @@ +==== +CVEs +==== + +Common Vulnerabilities and Exposure (CVE®) numbers were developed as an +unambiguous way to identify, define, and catalog publicly disclosed +security vulnerabilities. Over time, their usefulness has declined with +regards to the kernel project, and CVE numbers were very often assigned +in inappropriate ways and for inappropriate reasons. Because of this, +the kernel development community has tended to avoid them. However, the +combination of continuing pressure to assign CVEs and other forms of +security identifiers, and ongoing abuses by individuals and companies +outside of the kernel community has made it clear that the kernel +community should have control over those assignments. + +The Linux kernel developer team does have the ability to assign CVEs for +potential Linux kernel security issues. This assignment is independent +of the :doc:`normal Linux kernel security bug reporting +process<../process/security-bugs>`. + +A list of all assigned CVEs for the Linux kernel can be found in the +archives of the linux-cve mailing list, as seen on +https://lore.kernel.org/linux-cve-announce/. To get notice of the +assigned CVEs, please `subscribe +<https://subspace.kernel.org/subscribing.html>`_ to that mailing list. + +Process +======= + +As part of the normal stable release process, kernel changes that are +potentially security issues are identified by the developers responsible +for CVE number assignments and have CVE numbers automatically assigned +to them. These assignments are published on the linux-cve-announce +mailing list as announcements on a frequent basis. + +Note, due to the layer at which the Linux kernel is in a system, almost +any bug might be exploitable to compromise the security of the kernel, +but the possibility of exploitation is often not evident when the bug is +fixed. Because of this, the CVE assignment team is overly cautious and +assign CVE numbers to any bugfix that they identify. This +explains the seemingly large number of CVEs that are issued by the Linux +kernel team. + +If the CVE assignment team misses a specific fix that any user feels +should have a CVE assigned to it, please email them at <cve@kernel.org> +and the team there will work with you on it. Note that no potential +security issues should be sent to this alias, it is ONLY for assignment +of CVEs for fixes that are already in released kernel trees. If you +feel you have found an unfixed security issue, please follow the +:doc:`normal Linux kernel security bug reporting +process<../process/security-bugs>`. + +No CVEs will be automatically assigned for unfixed security issues in +the Linux kernel; assignment will only automatically happen after a fix +is available and applied to a stable kernel tree, and it will be tracked +that way by the git commit id of the original fix. If anyone wishes to +have a CVE assigned before an issue is resolved with a commit, please +contact the kernel CVE assignment team at <cve@kernel.org> to get an +identifier assigned from their batch of reserved identifiers. + +No CVEs will be assigned for any issue found in a version of the kernel +that is not currently being actively supported by the Stable/LTS kernel +team. A list of the currently supported kernel branches can be found at +https://kernel.org/releases.html + +Disputes of assigned CVEs +========================= + +The authority to dispute or modify an assigned CVE for a specific kernel +change lies solely with the maintainers of the relevant subsystem +affected. This principle ensures a high degree of accuracy and +accountability in vulnerability reporting. Only those individuals with +deep expertise and intimate knowledge of the subsystem can effectively +assess the validity and scope of a reported vulnerability and determine +its appropriate CVE designation. Any attempt to modify or dispute a CVE +outside of this designated authority could lead to confusion, inaccurate +reporting, and ultimately, compromised systems. + +Invalid CVEs +============ + +If a security issue is found in a Linux kernel that is only supported by +a Linux distribution due to the changes that have been made by that +distribution, or due to the distribution supporting a kernel version +that is no longer one of the kernel.org supported releases, then a CVE +can not be assigned by the Linux kernel CVE team, and must be asked for +from that Linux distribution itself. + +Any CVE that is assigned against the Linux kernel for an actively +supported kernel version, by any group other than the kernel assignment +CVE team should not be treated as a valid CVE. Please notify the +kernel CVE assignment team at <cve@kernel.org> so that they can work to +invalidate such entries through the CNA remediation process. + +Applicability of specific CVEs +============================== + +As the Linux kernel can be used in many different ways, with many +different ways of accessing it by external users, or no access at all, +the applicability of any specific CVE is up to the user of Linux to +determine, it is not up to the CVE assignment team. Please do not +contact us to attempt to determine the applicability of any specific +CVE. + +Also, as the source tree is so large, and any one system only uses a +small subset of the source tree, any users of Linux should be aware that +large numbers of assigned CVEs are not relevant for their systems. + +In short, we do not know your use case, and we do not know what portions +of the kernel that you use, so there is no way for us to determine if a +specific CVE is relevant for your system. + +As always, it is best to take all released kernel changes, as they are +tested together in a unified whole by many community members, and not as +individual cherry-picked changes. Also note that for many bugs, the +solution to the overall problem is not found in a single change, but by +the sum of many fixes on top of each other. Ideally CVEs will be +assigned to all fixes for all issues, but sometimes we will fail to +notice fixes, therefore assume that some changes without a CVE assigned +might be relevant to take. + diff --git a/Documentation/process/embargoed-hardware-issues.rst b/Documentation/process/embargoed-hardware-issues.rst index 31000f075707..bb2100228cc7 100644 --- a/Documentation/process/embargoed-hardware-issues.rst +++ b/Documentation/process/embargoed-hardware-issues.rst @@ -255,7 +255,7 @@ an involved disclosed party. The current ambassadors list: IBM Power Anton Blanchard <anton@linux.ibm.com> IBM Z Christian Borntraeger <borntraeger@de.ibm.com> Intel Tony Luck <tony.luck@intel.com> - Qualcomm Trilok Soni <tsoni@codeaurora.org> + Qualcomm Trilok Soni <quic_tsoni@quicinc.com> RISC-V Palmer Dabbelt <palmer@dabbelt.com> Samsung Javier González <javier.gonz@samsung.com> diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst index 6c73889c98fc..eebda4910a88 100644 --- a/Documentation/process/howto.rst +++ b/Documentation/process/howto.rst @@ -351,8 +351,8 @@ Managing bug reports -------------------- One of the best ways to put into practice your hacking skills is by fixing -bugs reported by other people. Not only you will help to make the kernel -more stable, but you'll also learn to fix real world problems and you will +bugs reported by other people. Not only will you help to make the kernel +more stable, but you'll also learn to fix real-world problems and you will improve your skills, and other developers will be aware of your presence. Fixing bugs is one of the best ways to get merits among other developers, because not many people like wasting time fixing other people's bugs. diff --git a/Documentation/process/index.rst b/Documentation/process/index.rst index 6cb732dfcc72..de9cbb7bd7eb 100644 --- a/Documentation/process/index.rst +++ b/Documentation/process/index.rst @@ -81,6 +81,7 @@ of special classes of bugs: regressions and security problems. handling-regressions security-bugs + cve embargoed-hardware-issues Maintainer information diff --git a/Documentation/process/maintainer-netdev.rst b/Documentation/process/maintainer-netdev.rst index 84ee60fceef2..fd96e4a3cef9 100644 --- a/Documentation/process/maintainer-netdev.rst +++ b/Documentation/process/maintainer-netdev.rst @@ -431,7 +431,7 @@ patchwork checks Checks in patchwork are mostly simple wrappers around existing kernel scripts, the sources are available at: -https://github.com/kuba-moo/nipa/tree/master/tests +https://github.com/linux-netdev/nipa/tree/master/tests **Do not** post your patches just to run them through the checks. You must ensure that your patches are ready by testing them locally diff --git a/Documentation/process/maintainer-tip.rst b/Documentation/process/maintainer-tip.rst index 08dd0f804410..497bb39727c8 100644 --- a/Documentation/process/maintainer-tip.rst +++ b/Documentation/process/maintainer-tip.rst @@ -304,13 +304,15 @@ following tag ordering scheme: - Reported-by: ``Reporter <reporter@mail>`` + - Closes: ``URL or Message-ID of the bug report this is fixing`` + - Originally-by: ``Original author <original-author@mail>`` - Suggested-by: ``Suggester <suggester@mail>`` - Co-developed-by: ``Co-author <co-author@mail>`` - Signed-off: ``Co-author <co-author@mail>`` + Signed-off-by: ``Co-author <co-author@mail>`` Note, that Co-developed-by and Signed-off-by of the co-author(s) must come in pairs. @@ -478,7 +480,7 @@ Multi-line comments:: * Larger multi-line comments should be split into paragraphs. */ -No tail comments: +No tail comments (see below): Please refrain from using tail comments. Tail comments disturb the reading flow in almost all contexts, but especially in code:: @@ -499,6 +501,34 @@ No tail comments: /* This magic initialization needs a comment. Maybe not? */ seed = MAGIC_CONSTANT; + Use C++ style, tail comments when documenting structs in headers to + achieve a more compact layout and better readability:: + + // eax + u32 x2apic_shift : 5, // Number of bits to shift APIC ID right + // for the topology ID at the next level + : 27; // Reserved + // ebx + u32 num_processors : 16, // Number of processors at current level + : 16; // Reserved + + versus:: + + /* eax */ + /* + * Number of bits to shift APIC ID right for the topology ID + * at the next level + */ + u32 x2apic_shift : 5, + /* Reserved */ + : 27; + + /* ebx */ + /* Number of processors at current level */ + u32 num_processors : 16, + /* Reserved */ + : 16; + Comment the important things: Comments should be added where the operation is not obvious. Documenting diff --git a/Documentation/process/researcher-guidelines.rst b/Documentation/process/researcher-guidelines.rst index d159cd4f5e5b..beb484c5965d 100644 --- a/Documentation/process/researcher-guidelines.rst +++ b/Documentation/process/researcher-guidelines.rst @@ -167,4 +167,4 @@ If no one can be found to internally review patches and you need help finding such a person, or if you have any other questions related to this document and the developer community's expectations, please reach out to the private Technical Advisory Board mailing list: -<tech-board@lists.linux-foundation.org>. +<tech-board@groups.linuxfoundation.org>. diff --git a/Documentation/process/security-bugs.rst b/Documentation/process/security-bugs.rst index 692a3ba56cca..56c560a00b37 100644 --- a/Documentation/process/security-bugs.rst +++ b/Documentation/process/security-bugs.rst @@ -99,9 +99,8 @@ CVE assignment The security team does not assign CVEs, nor do we require them for reports or fixes, as this can needlessly complicate the process and may delay the bug handling. If a reporter wishes to have a CVE identifier -assigned, they should find one by themselves, for example by contacting -MITRE directly. However under no circumstances will a patch inclusion -be delayed to wait for a CVE identifier to arrive. +assigned for a confirmed issue, they can contact the :doc:`kernel CVE +assignment team<../process/cve>` to obtain one. Non-disclosure agreements ------------------------- diff --git a/Documentation/process/submit-checklist.rst b/Documentation/process/submit-checklist.rst index b1bc2d37bd0a..e531dd504b6c 100644 --- a/Documentation/process/submit-checklist.rst +++ b/Documentation/process/submit-checklist.rst @@ -1,7 +1,8 @@ .. _submitchecklist: +======================================= Linux Kernel patch submission checklist -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +======================================= Here are some basic things that developers should do if they want to see their kernel patch submissions accepted more quickly. @@ -10,111 +11,123 @@ These are all above and beyond the documentation that is provided in :ref:`Documentation/process/submitting-patches.rst <submittingpatches>` and elsewhere regarding submitting Linux kernel patches. +Review your code +================ 1) If you use a facility then #include the file that defines/declares that facility. Don't depend on other header files pulling in ones that you use. -2) Builds cleanly: +2) Check your patch for general style as detailed in + :ref:`Documentation/process/coding-style.rst <codingstyle>`. - a) with applicable or modified ``CONFIG`` options ``=y``, ``=m``, and - ``=n``. No ``gcc`` warnings/errors, no linker warnings/errors. +3) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a + comment in the source code that explains the logic of what they are doing + and why. - b) Passes ``allnoconfig``, ``allmodconfig`` +Review Kconfig changes +====================== - c) Builds successfully when using ``O=builddir`` +1) Any new or modified ``CONFIG`` options do not muck up the config menu and + default to off unless they meet the exception criteria documented in + ``Documentation/kbuild/kconfig-language.rst`` Menu attributes: default value. - d) Any Documentation/ changes build successfully without new warnings/errors. - Use ``make htmldocs`` or ``make pdfdocs`` to check the build and - fix any issues. +2) All new ``Kconfig`` options have help text. -3) Builds on multiple CPU architectures by using local cross-compile tools - or some other build farm. +3) Has been carefully reviewed with respect to relevant ``Kconfig`` + combinations. This is very hard to get right with testing---brainpower + pays off here. -4) ppc64 is a good architecture for cross-compilation checking because it - tends to use ``unsigned long`` for 64-bit quantities. +Provide documentation +===================== -5) Check your patch for general style as detailed in - :ref:`Documentation/process/coding-style.rst <codingstyle>`. - Check for trivial violations with the patch style checker prior to - submission (``scripts/checkpatch.pl``). - You should be able to justify all violations that remain in - your patch. +1) Include :ref:`kernel-doc <kernel_doc>` to document global kernel APIs. + (Not required for static functions, but OK there also.) -6) Any new or modified ``CONFIG`` options do not muck up the config menu and - default to off unless they meet the exception criteria documented in - ``Documentation/kbuild/kconfig-language.rst`` Menu attributes: default value. +2) All new ``/proc`` entries are documented under ``Documentation/`` -7) All new ``Kconfig`` options have help text. +3) All new kernel boot parameters are documented in + ``Documentation/admin-guide/kernel-parameters.rst``. -8) Has been carefully reviewed with respect to relevant ``Kconfig`` - combinations. This is very hard to get right with testing -- brainpower - pays off here. +4) All new module parameters are documented with ``MODULE_PARM_DESC()`` -9) Check cleanly with sparse. +5) All new userspace interfaces are documented in ``Documentation/ABI/``. + See ``Documentation/ABI/README`` for more information. + Patches that change userspace interfaces should be CCed to + linux-api@vger.kernel.org. -10) Use ``make checkstack`` and fix any problems that it finds. +6) If any ioctl's are added by the patch, then also update + ``Documentation/userspace-api/ioctl/ioctl-number.rst``. - .. note:: +Check your code with tools +========================== - ``checkstack`` does not point out problems explicitly, - but any one function that uses more than 512 bytes on the stack is a - candidate for change. +1) Check for trivial violations with the patch style checker prior to + submission (``scripts/checkpatch.pl``). + You should be able to justify all violations that remain in + your patch. -11) Include :ref:`kernel-doc <kernel_doc>` to document global kernel APIs. - (Not required for static functions, but OK there also.) Use - ``make htmldocs`` or ``make pdfdocs`` to check the - :ref:`kernel-doc <kernel_doc>` and fix any issues. +2) Check cleanly with sparse. -12) Has been tested with ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``, - ``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``, - ``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``, - ``CONFIG_PROVE_RCU`` and ``CONFIG_DEBUG_OBJECTS_RCU_HEAD`` all - simultaneously enabled. +3) Use ``make checkstack`` and fix any problems that it finds. + Note that ``checkstack`` does not point out problems explicitly, + but any one function that uses more than 512 bytes on the stack is a + candidate for change. -13) Has been build- and runtime tested with and without ``CONFIG_SMP`` and - ``CONFIG_PREEMPT.`` +Build your code +=============== -14) All codepaths have been exercised with all lockdep features enabled. +1) Builds cleanly: -15) All new ``/proc`` entries are documented under ``Documentation/`` + a) with applicable or modified ``CONFIG`` options ``=y``, ``=m``, and + ``=n``. No ``gcc`` warnings/errors, no linker warnings/errors. -16) All new kernel boot parameters are documented in - ``Documentation/admin-guide/kernel-parameters.rst``. + b) Passes ``allnoconfig``, ``allmodconfig`` + + c) Builds successfully when using ``O=builddir`` + + d) Any Documentation/ changes build successfully without new warnings/errors. + Use ``make htmldocs`` or ``make pdfdocs`` to check the build and + fix any issues. -17) All new module parameters are documented with ``MODULE_PARM_DESC()`` +2) Builds on multiple CPU architectures by using local cross-compile tools + or some other build farm. Note that ppc64 is a good architecture for + cross-compilation checking because it tends to use ``unsigned long`` for + 64-bit quantities. -18) All new userspace interfaces are documented in ``Documentation/ABI/``. - See ``Documentation/ABI/README`` for more information. - Patches that change userspace interfaces should be CCed to - linux-api@vger.kernel.org. +3) Newly-added code has been compiled with ``gcc -W`` (use + ``make KCFLAGS=-W``). This will generate lots of noise, but is good + for finding bugs like "warning: comparison between signed and unsigned". -19) Has been checked with injection of at least slab and page-allocation - failures. See ``Documentation/fault-injection/``. +4) If your modified source code depends on or uses any of the kernel + APIs or features that are related to the following ``Kconfig`` symbols, + then test multiple builds with the related ``Kconfig`` symbols disabled + and/or ``=m`` (if that option is available) [not all of these at the + same time, just various/random combinations of them]: - If the new code is substantial, addition of subsystem-specific fault - injection might be appropriate. + ``CONFIG_SMP``, ``CONFIG_SYSFS``, ``CONFIG_PROC_FS``, ``CONFIG_INPUT``, + ``CONFIG_PCI``, ``CONFIG_BLOCK``, ``CONFIG_PM``, ``CONFIG_MAGIC_SYSRQ``, + ``CONFIG_NET``, ``CONFIG_INET=n`` (but latter with ``CONFIG_NET=y``). -20) Newly-added code has been compiled with ``gcc -W`` (use - ``make KCFLAGS=-W``). This will generate lots of noise, but is good - for finding bugs like "warning: comparison between signed and unsigned". +Test your code +============== -21) Tested after it has been merged into the -mm patchset to make sure - that it still works with all of the other queued patches and various - changes in the VM, VFS, and other subsystems. +1) Has been tested with ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``, + ``CONFIG_SLUB_DEBUG``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``, + ``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``, + ``CONFIG_PROVE_RCU`` and ``CONFIG_DEBUG_OBJECTS_RCU_HEAD`` all + simultaneously enabled. -22) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a - comment in the source code that explains the logic of what they are doing - and why. +2) Has been build- and runtime tested with and without ``CONFIG_SMP`` and + ``CONFIG_PREEMPT.`` -23) If any ioctl's are added by the patch, then also update - ``Documentation/userspace-api/ioctl/ioctl-number.rst``. +3) All codepaths have been exercised with all lockdep features enabled. -24) If your modified source code depends on or uses any of the kernel - APIs or features that are related to the following ``Kconfig`` symbols, - then test multiple builds with the related ``Kconfig`` symbols disabled - and/or ``=m`` (if that option is available) [not all of these at the - same time, just various/random combinations of them]: +4) Has been checked with injection of at least slab and page-allocation + failures. See ``Documentation/fault-injection/``. + If the new code is substantial, addition of subsystem-specific fault + injection might be appropriate. - ``CONFIG_SMP``, ``CONFIG_SYSFS``, ``CONFIG_PROC_FS``, ``CONFIG_INPUT``, ``CONFIG_PCI``, ``CONFIG_BLOCK``, ``CONFIG_PM``, ``CONFIG_MAGIC_SYSRQ``, - ``CONFIG_NET``, ``CONFIG_INET=n`` (but latter with ``CONFIG_NET=y``). +5) Tested with the most recent tag of linux-next to make sure that it still + works with all of the other queued patches and various changes in the VM, + VFS, and other subsystems. diff --git a/Documentation/rust/general-information.rst b/Documentation/rust/general-information.rst index 236c6dd3c647..081397827a7e 100644 --- a/Documentation/rust/general-information.rst +++ b/Documentation/rust/general-information.rst @@ -77,27 +77,3 @@ configuration: #[cfg(CONFIG_X="y")] // Enabled as a built-in (`y`) #[cfg(CONFIG_X="m")] // Enabled as a module (`m`) #[cfg(not(CONFIG_X))] // Disabled - - -Testing -------- - -There are the tests that come from the examples in the Rust documentation -and get transformed into KUnit tests. These can be run via KUnit. For example -via ``kunit_tool`` (``kunit.py``) on the command line:: - - ./tools/testing/kunit/kunit.py run --make_options LLVM=1 --arch x86_64 --kconfig_add CONFIG_RUST=y - -Alternatively, KUnit can run them as kernel built-in at boot. Refer to -Documentation/dev-tools/kunit/index.rst for the general KUnit documentation -and Documentation/dev-tools/kunit/architecture.rst for the details of kernel -built-in vs. command line testing. - -Additionally, there are the ``#[test]`` tests. These can be run using -the ``rusttest`` Make target:: - - make LLVM=1 rusttest - -This requires the kernel ``.config`` and downloads external repositories. -It runs the ``#[test]`` tests on the host (currently) and thus is fairly -limited in what these tests can test. diff --git a/Documentation/rust/index.rst b/Documentation/rust/index.rst index 965f2db529e0..46d35bd395cf 100644 --- a/Documentation/rust/index.rst +++ b/Documentation/rust/index.rst @@ -40,6 +40,7 @@ configurations. general-information coding-guidelines arch-support + testing .. only:: subproject and html diff --git a/Documentation/rust/testing.rst b/Documentation/rust/testing.rst new file mode 100644 index 000000000000..6658998d1b6c --- /dev/null +++ b/Documentation/rust/testing.rst @@ -0,0 +1,135 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Testing +======= + +This document contains useful information how to test the Rust code in the +kernel. + +There are two sorts of tests: + +- The KUnit tests. +- The ``#[test]`` tests. + +The KUnit tests +--------------- + +These are the tests that come from the examples in the Rust documentation. They +get transformed into KUnit tests. + +Usage +***** + +These tests can be run via KUnit. For example via ``kunit_tool`` (``kunit.py``) +on the command line:: + + ./tools/testing/kunit/kunit.py run --make_options LLVM=1 --arch x86_64 --kconfig_add CONFIG_RUST=y + +Alternatively, KUnit can run them as kernel built-in at boot. Refer to +Documentation/dev-tools/kunit/index.rst for the general KUnit documentation +and Documentation/dev-tools/kunit/architecture.rst for the details of kernel +built-in vs. command line testing. + +To use these KUnit doctests, the following must be enabled:: + + CONFIG_KUNIT + Kernel hacking -> Kernel Testing and Coverage -> KUnit - Enable support for unit tests + CONFIG_RUST_KERNEL_DOCTESTS + Kernel hacking -> Rust hacking -> Doctests for the `kernel` crate + +in the kernel config system. + +KUnit tests are documentation tests +*********************************** + +These documentation tests are typically examples of usage of any item (e.g. +function, struct, module...). + +They are very convenient because they are just written alongside the +documentation. For instance: + +.. code-block:: rust + + /// Sums two numbers. + /// + /// ``` + /// assert_eq!(mymod::f(10, 20), 30); + /// ``` + pub fn f(a: i32, b: i32) -> i32 { + a + b + } + +In userspace, the tests are collected and run via ``rustdoc``. Using the tool +as-is would be useful already, since it allows verifying that examples compile +(thus enforcing they are kept in sync with the code they document) and as well +as running those that do not depend on in-kernel APIs. + +For the kernel, however, these tests get transformed into KUnit test suites. +This means that doctests get compiled as Rust kernel objects, allowing them to +run against a built kernel. + +A benefit of this KUnit integration is that Rust doctests get to reuse existing +testing facilities. For instance, the kernel log would look like:: + + KTAP version 1 + 1..1 + KTAP version 1 + # Subtest: rust_doctests_kernel + 1..59 + # rust_doctest_kernel_build_assert_rs_0.location: rust/kernel/build_assert.rs:13 + ok 1 rust_doctest_kernel_build_assert_rs_0 + # rust_doctest_kernel_build_assert_rs_1.location: rust/kernel/build_assert.rs:56 + ok 2 rust_doctest_kernel_build_assert_rs_1 + # rust_doctest_kernel_init_rs_0.location: rust/kernel/init.rs:122 + ok 3 rust_doctest_kernel_init_rs_0 + ... + # rust_doctest_kernel_types_rs_2.location: rust/kernel/types.rs:150 + ok 59 rust_doctest_kernel_types_rs_2 + # rust_doctests_kernel: pass:59 fail:0 skip:0 total:59 + # Totals: pass:59 fail:0 skip:0 total:59 + ok 1 rust_doctests_kernel + +Tests using the `? <https://doc.rust-lang.org/reference/expressions/operator-expr.html#the-question-mark-operator>`_ +operator are also supported as usual, e.g.: + +.. code-block:: rust + + /// ``` + /// # use kernel::{spawn_work_item, workqueue}; + /// spawn_work_item!(workqueue::system(), || pr_info!("x"))?; + /// # Ok::<(), Error>(()) + /// ``` + +The tests are also compiled with Clippy under ``CLIPPY=1``, just like normal +code, thus also benefitting from extra linting. + +In order for developers to easily see which line of doctest code caused a +failure, a KTAP diagnostic line is printed to the log. This contains the +location (file and line) of the original test (i.e. instead of the location in +the generated Rust file):: + + # rust_doctest_kernel_types_rs_2.location: rust/kernel/types.rs:150 + +Rust tests appear to assert using the usual ``assert!`` and ``assert_eq!`` +macros from the Rust standard library (``core``). We provide a custom version +that forwards the call to KUnit instead. Importantly, these macros do not +require passing context, unlike those for KUnit testing (i.e. +``struct kunit *``). This makes them easier to use, and readers of the +documentation do not need to care about which testing framework is used. In +addition, it may allow us to test third-party code more easily in the future. + +A current limitation is that KUnit does not support assertions in other tasks. +Thus, we presently simply print an error to the kernel log if an assertion +actually failed. Additionally, doctests are not run for nonpublic functions. + +The ``#[test]`` tests +--------------------- + +Additionally, there are the ``#[test]`` tests. These can be run using the +``rusttest`` Make target:: + + make LLVM=1 rusttest + +This requires the kernel ``.config`` and downloads external repositories. It +runs the ``#[test]`` tests on the host (currently) and thus is fairly limited in +what these tests can test. diff --git a/Documentation/sphinx/kernel_feat.py b/Documentation/sphinx/kernel_feat.py index b9df61eb4501..03ace5f01b5c 100644 --- a/Documentation/sphinx/kernel_feat.py +++ b/Documentation/sphinx/kernel_feat.py @@ -109,7 +109,7 @@ class KernelFeat(Directive): else: out_lines += line + "\n" - nodeList = self.nestedParse(out_lines, fname) + nodeList = self.nestedParse(out_lines, self.arguments[0]) return nodeList def nestedParse(self, lines, fname): diff --git a/Documentation/sphinx/kerneldoc-preamble.sty b/Documentation/sphinx/kerneldoc-preamble.sty index 9707e033c8c4..3092df051c95 100644 --- a/Documentation/sphinx/kerneldoc-preamble.sty +++ b/Documentation/sphinx/kerneldoc-preamble.sty @@ -54,9 +54,7 @@ \renewcommand*\l@section{\@dottedtocline{1}{2.4em}{3.2em}} \renewcommand*\l@subsection{\@dottedtocline{2}{5.6em}{4.3em}} \makeatother -%% Sphinx < 1.8 doesn't have \sphinxtableofcontentshook -\providecommand{\sphinxtableofcontentshook}{} -%% Undefine it for compatibility with Sphinx 1.7.9 +%% Prevent default \sphinxtableofcontentshook from overwriting above tweaks. \renewcommand{\sphinxtableofcontentshook}{} % Empty the hook % Prevent column squeezing of tabulary. \tymin is set by Sphinx as: @@ -136,9 +134,6 @@ } \newCJKfontfamily[JPsans]\jpsans{Noto Sans CJK JP}[AutoFakeSlant] \newCJKfontfamily[JPmono]\jpmono{Noto Sans Mono CJK JP}[AutoFakeSlant] - % Dummy commands for Sphinx < 2.3 (no 'extrapackages' support) - \providecommand{\onehalfspacing}{} - \providecommand{\singlespacing}{} % Define custom macros to on/off CJK %% One and half spacing for CJK contents \newcommand{\kerneldocCJKon}{\makexeCJKactive\onehalfspacing} diff --git a/Documentation/sphinx/kerneldoc.py b/Documentation/sphinx/kerneldoc.py index 7acf09963daa..ec1ddfff1863 100644 --- a/Documentation/sphinx/kerneldoc.py +++ b/Documentation/sphinx/kerneldoc.py @@ -61,9 +61,9 @@ class KernelDocDirective(Directive): env = self.state.document.settings.env cmd = [env.config.kerneldoc_bin, '-rst', '-enable-lineno'] - # Pass the version string to kernel-doc, as it needs to use a different - # dialect, depending what the C domain supports for each specific - # Sphinx versions + # Pass the version string to kernel-doc, as it needs to use a different + # dialect, depending what the C domain supports for each specific + # Sphinx versions cmd += ['-sphinx-version', sphinx.__version__] filename = env.config.kerneldoc_srctree + '/' + self.arguments[0] diff --git a/Documentation/sphinx/requirements.txt b/Documentation/sphinx/requirements.txt index 5d47ed443949..5017f307c8a4 100644 --- a/Documentation/sphinx/requirements.txt +++ b/Documentation/sphinx/requirements.txt @@ -1,6 +1,3 @@ -# jinja2>=3.1 is not compatible with Sphinx<4.0 -jinja2<3.1 -# alabaster>=0.7.14 is not compatible with Sphinx<=3.3 -alabaster<0.7.14 -Sphinx==2.4.4 +alabaster +Sphinx pyyaml diff --git a/Documentation/sphinx/templates/kernel-toc.html b/Documentation/sphinx/templates/kernel-toc.html index b58efa99df52..41f1efbe64bb 100644 --- a/Documentation/sphinx/templates/kernel-toc.html +++ b/Documentation/sphinx/templates/kernel-toc.html @@ -12,5 +12,7 @@ <script type="text/javascript"> <!-- var sbar = document.getElementsByClassName("sphinxsidebar")[0]; let currents = document.getElementsByClassName("current") - sbar.scrollTop = currents[currents.length - 1].offsetTop; + if (currents.length) { + sbar.scrollTop = currents[currents.length - 1].offsetTop; + } --> </script> diff --git a/Documentation/sphinx/translations.py b/Documentation/sphinx/translations.py index 47161e6eba99..32c2b32b2b5e 100644 --- a/Documentation/sphinx/translations.py +++ b/Documentation/sphinx/translations.py @@ -29,10 +29,7 @@ all_languages = { } class LanguagesNode(nodes.Element): - def __init__(self, current_language, *args, **kwargs): - super().__init__(*args, **kwargs) - - self.current_language = current_language + pass class TranslationsTransform(Transform): default_priority = 900 @@ -49,7 +46,8 @@ class TranslationsTransform(Transform): # normalize docname to be the untranslated one docname = os.path.join(*components[2:]) - new_nodes = LanguagesNode(all_languages[this_lang_code]) + new_nodes = LanguagesNode() + new_nodes['current_language'] = all_languages[this_lang_code] for lang_code, lang_name in all_languages.items(): if lang_code == this_lang_code: @@ -84,7 +82,7 @@ def process_languages(app, doctree, docname): html_content = app.builder.templates.render('translations.html', context={ - 'current_language': node.current_language, + 'current_language': node['current_language'], 'languages': languages, }) diff --git a/Documentation/spi/spi-summary.rst b/Documentation/spi/spi-summary.rst index 33f05901ccf3..546de37d6caf 100644 --- a/Documentation/spi/spi-summary.rst +++ b/Documentation/spi/spi-summary.rst @@ -9,7 +9,7 @@ What is SPI? The "Serial Peripheral Interface" (SPI) is a synchronous four wire serial link used to connect microcontrollers to sensors, memory, and peripherals. It's a simple "de facto" standard, not complicated enough to acquire a -standardization body. SPI uses a master/slave configuration. +standardization body. SPI uses a host/target configuration. The three signal wires hold a clock (SCK, often on the order of 10 MHz), and parallel data lines with "Master Out, Slave In" (MOSI) or "Master In, @@ -19,14 +19,14 @@ commonly used. Each clock cycle shifts data out and data in; the clock doesn't cycle except when there is a data bit to shift. Not all data bits are used though; not every protocol uses those full duplex capabilities. -SPI masters use a fourth "chip select" line to activate a given SPI slave +SPI hosts use a fourth "chip select" line to activate a given SPI target device, so those three signal wires may be connected to several chips -in parallel. All SPI slaves support chipselects; they are usually active -low signals, labeled nCSx for slave 'x' (e.g. nCS0). Some devices have -other signals, often including an interrupt to the master. +in parallel. All SPI targets support chipselects; they are usually active +low signals, labeled nCSx for target 'x' (e.g. nCS0). Some devices have +other signals, often including an interrupt to the host. Unlike serial busses like USB or SMBus, even low level protocols for -SPI slave functions are usually not interoperable between vendors +SPI target functions are usually not interoperable between vendors (except for commodities like SPI memory chips). - SPI may be used for request/response style device protocols, as with @@ -43,10 +43,10 @@ SPI slave functions are usually not interoperable between vendors - Sometimes SPI is used to daisy-chain devices, like shift registers. -In the same way, SPI slaves will only rarely support any kind of automatic -discovery/enumeration protocol. The tree of slave devices accessible from -a given SPI master will normally be set up manually, with configuration -tables. +In the same way, SPI targets will only rarely support any kind of automatic +discovery/enumeration protocol. The tree of target devices accessible from +a given SPI host controller will normally be set up manually, with +configuration tables. SPI is only one of the names used by such four-wire protocols, and most controllers have no problem handling "MicroWire" (think of it as @@ -62,8 +62,8 @@ course they won't handle full duplex transfers. You may find such chips described as using "three wire" signaling: SCK, data, nCSx. (That data line is sometimes called MOMI or SISO.) -Microcontrollers often support both master and slave sides of the SPI -protocol. This document (and Linux) supports both the master and slave +Microcontrollers often support both host and target sides of the SPI +protocol. This document (and Linux) supports both the host and target sides of SPI interactions. @@ -75,7 +75,7 @@ protocol supported by every MMC or SD memory card. (The older "DataFlash" cards, predating MMC cards but using the same connectors and card shape, support only SPI.) Some PC hardware uses SPI flash for BIOS code. -SPI slave chips range from digital/analog converters used for analog +SPI target chips range from digital/analog converters used for analog sensors and codecs, to memory, to peripherals like USB controllers or Ethernet adapters; and more. @@ -118,8 +118,8 @@ starting low (CPOL=0) and data stabilized for sampling during the trailing clock edge (CPHA=1), that's SPI mode 1. Note that the clock mode is relevant as soon as the chipselect goes -active. So the master must set the clock to inactive before selecting -a slave, and the slave can tell the chosen polarity by sampling the +active. So the host must set the clock to inactive before selecting +a target, and the target can tell the chosen polarity by sampling the clock level when its select line goes active. That's why many devices support for example both modes 0 and 3: they don't care about polarity, and always clock data in/out on rising clock edges. @@ -142,13 +142,13 @@ There are two types of SPI driver, here called: Controller drivers ... controllers may be built into System-On-Chip - processors, and often support both Master and Slave roles. + processors, and often support both Controller and target roles. These drivers touch hardware registers and may use DMA. Or they can be PIO bitbangers, needing just GPIO pins. Protocol drivers ... these pass messages through the controller - driver to communicate with a Slave or Master device on the + driver to communicate with a target or Controller device on the other side of an SPI link. So for example one protocol driver might talk to the MTD layer to export @@ -179,22 +179,22 @@ shows up in sysfs in several locations:: /sys/bus/spi/drivers/D ... driver for one or more spi*.* devices /sys/class/spi_master/spiB ... symlink to a logical node which could hold - class related state for the SPI master controller managing bus "B". + class related state for the SPI host controller managing bus "B". All spiB.* devices share one physical SPI bus segment, with SCLK, MOSI, and MISO. /sys/devices/.../CTLR/slave ... virtual file for (un)registering the - slave device for an SPI slave controller. - Writing the driver name of an SPI slave handler to this file - registers the slave device; writing "(null)" unregisters the slave + target device for an SPI target controller. + Writing the driver name of an SPI target handler to this file + registers the target device; writing "(null)" unregisters the target device. - Reading from this file shows the name of the slave device ("(null)" + Reading from this file shows the name of the target device ("(null)" if not registered). /sys/class/spi_slave/spiB ... symlink to a logical node which could hold - class related state for the SPI slave controller on bus "B". When + class related state for the SPI target controller on bus "B". When registered, a single spiB.* device is present here, possible sharing - the physical SPI bus segment with other SPI slave devices. + the physical SPI bus segment with other SPI target devices. At this time, the only class-specific state is the bus number ("B" in "spiB"), so those /sys/class entries are only useful to quickly identify busses. @@ -270,10 +270,10 @@ same SOC controller is used. For example, on one board SPI might use an external clock, where another derives the SPI clock from current settings of some master clock. -Declare Slave Devices -^^^^^^^^^^^^^^^^^^^^^ +Declare target Devices +^^^^^^^^^^^^^^^^^^^^^^ -The second kind of information is a list of what SPI slave devices exist +The second kind of information is a list of what SPI target devices exist on the target board, often with some board-specific data needed for the driver to work correctly. @@ -316,7 +316,7 @@ sharing a bus with a device that interprets chipselect "backwards" is not possible until the infrastructure knows how to deselect it. Then your board initialization code would register that table with the SPI -infrastructure, so that it's available later when the SPI master controller +infrastructure, so that it's available later when the SPI host controller driver is registered:: spi_register_board_info(spi_board_info, ARRAY_SIZE(spi_board_info)); @@ -469,39 +469,39 @@ routines are available to allocate and zero-initialize an spi_message with several transfers. -How do I write an "SPI Master Controller Driver"? +How do I write an "SPI Controller Driver"? ------------------------------------------------- An SPI controller will probably be registered on the platform_bus; write a driver to bind to the device, whichever bus is involved. -The main task of this type of driver is to provide an "spi_master". -Use spi_alloc_master() to allocate the master, and spi_master_get_devdata() -to get the driver-private data allocated for that device. +The main task of this type of driver is to provide an "spi_controller". +Use spi_alloc_host() to allocate the host controller, and +spi_controller_get_devdata() to get the driver-private data allocated for that +device. :: - struct spi_master *master; + struct spi_controller *ctlr; struct CONTROLLER *c; - master = spi_alloc_master(dev, sizeof *c); - if (!master) + ctlr = spi_alloc_host(dev, sizeof *c); + if (!ctlr) return -ENODEV; - c = spi_master_get_devdata(master); + c = spi_controller_get_devdata(ctlr); -The driver will initialize the fields of that spi_master, including the -bus number (maybe the same as the platform device ID) and three methods -used to interact with the SPI core and SPI protocol drivers. It will -also initialize its own internal state. (See below about bus numbering -and those methods.) +The driver will initialize the fields of that spi_controller, including the bus +number (maybe the same as the platform device ID) and three methods used to +interact with the SPI core and SPI protocol drivers. It will also initialize +its own internal state. (See below about bus numbering and those methods.) -After you initialize the spi_master, then use spi_register_master() to +After you initialize the spi_controller, then use spi_register_controller() to publish it to the rest of the system. At that time, device nodes for the controller and any predeclared spi devices will be made available, and the driver model core will take care of binding them to drivers. -If you need to remove your SPI controller driver, spi_unregister_master() -will reverse the effect of spi_register_master(). +If you need to remove your SPI controller driver, spi_unregister_controller() +will reverse the effect of spi_register_controller(). Bus Numbering @@ -519,49 +519,49 @@ then be replaced by a dynamically assigned number. You'd then need to treat this as a non-static configuration (see above). -SPI Master Methods -^^^^^^^^^^^^^^^^^^ +SPI Host Controller Methods +^^^^^^^^^^^^^^^^^^^^^^^^^^^ -``master->setup(struct spi_device *spi)`` +``ctlr->setup(struct spi_device *spi)`` This sets up the device clock rate, SPI mode, and word sizes. Drivers may change the defaults provided by board_info, and then call spi_setup(spi) to invoke this routine. It may sleep. - Unless each SPI slave has its own configuration registers, don't + Unless each SPI target has its own configuration registers, don't change them right away ... otherwise drivers could corrupt I/O that's in progress for other SPI devices. .. note:: BUG ALERT: for some reason the first version of - many spi_master drivers seems to get this wrong. + many spi_controller drivers seems to get this wrong. When you code setup(), ASSUME that the controller is actively processing transfers for another device. -``master->cleanup(struct spi_device *spi)`` +``ctlr->cleanup(struct spi_device *spi)`` Your controller driver may use spi_device.controller_state to hold state it dynamically associates with that device. If you do that, be sure to provide the cleanup() method to free that state. -``master->prepare_transfer_hardware(struct spi_master *master)`` +``ctlr->prepare_transfer_hardware(struct spi_controller *ctlr)`` This will be called by the queue mechanism to signal to the driver that a message is coming in soon, so the subsystem requests the driver to prepare the transfer hardware by issuing this call. This may sleep. -``master->unprepare_transfer_hardware(struct spi_master *master)`` +``ctlr->unprepare_transfer_hardware(struct spi_controller *ctlr)`` This will be called by the queue mechanism to signal to the driver that there are no more messages pending in the queue and it may relax the hardware (e.g. by power management calls). This may sleep. -``master->transfer_one_message(struct spi_master *master, struct spi_message *mesg)`` +``ctlr->transfer_one_message(struct spi_controller *ctlr, struct spi_message *mesg)`` The subsystem calls the driver to transfer a single message while queuing transfers that arrive in the meantime. When the driver is finished with this message, it must call spi_finalize_current_message() so the subsystem can issue the next message. This may sleep. -``master->transfer_one(struct spi_master *master, struct spi_device *spi, struct spi_transfer *transfer)`` +``ctrl->transfer_one(struct spi_controller *ctlr, struct spi_device *spi, struct spi_transfer *transfer)`` The subsystem calls the driver to transfer a single transfer while queuing transfers that arrive in the meantime. When the driver is finished with this transfer, it must call @@ -576,15 +576,15 @@ SPI Master Methods * 0: transfer is finished * 1: transfer is still in progress -``master->set_cs_timing(struct spi_device *spi, u8 setup_clk_cycles, u8 hold_clk_cycles, u8 inactive_clk_cycles)`` - This method allows SPI client drivers to request SPI master controller +``ctrl->set_cs_timing(struct spi_device *spi, u8 setup_clk_cycles, u8 hold_clk_cycles, u8 inactive_clk_cycles)`` + This method allows SPI client drivers to request SPI host controller for configuring device specific CS setup, hold and inactive timing requirements. Deprecated Methods ^^^^^^^^^^^^^^^^^^ -``master->transfer(struct spi_device *spi, struct spi_message *message)`` +``ctrl->transfer(struct spi_device *spi, struct spi_message *message)`` This must not sleep. Its responsibility is to arrange that the transfer happens and its complete() callback is issued. The two will normally happen later, after other transfers complete, and diff --git a/Documentation/staging/rpmsg.rst b/Documentation/staging/rpmsg.rst index dba3e5f65612..3713adaa1608 100644 --- a/Documentation/staging/rpmsg.rst +++ b/Documentation/staging/rpmsg.rst @@ -157,7 +157,7 @@ Returns 0 on success and an appropriate error value on failure. int rpmsg_trysendto(struct rpmsg_endpoint *ept, void *data, int len, u32 dst) -sends a message across to the remote processor from a given endoint, +sends a message across to the remote processor from a given endpoint, to a destination address provided by the user. The user should specify the channel, the data it wants to send, diff --git a/Documentation/subsystem-apis.rst b/Documentation/subsystem-apis.rst index 2d353fb8ea26..74af50d2ef7f 100644 --- a/Documentation/subsystem-apis.rst +++ b/Documentation/subsystem-apis.rst @@ -61,6 +61,8 @@ Storage interfaces scsi/index target/index +Other subsystems +---------------- **Fixme**: much more organizational work is needed here. .. toctree:: diff --git a/Documentation/translations/it_IT/RCU/index.rst b/Documentation/translations/it_IT/RCU/index.rst new file mode 100644 index 000000000000..22adf1d58752 --- /dev/null +++ b/Documentation/translations/it_IT/RCU/index.rst @@ -0,0 +1,19 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _it_rcu_concepts: + +=============== +Concetti su RCU +=============== + +.. toctree:: + :maxdepth: 3 + + torture + +.. only:: subproject and html + + Indici + ====== + + * :ref:`genindex` diff --git a/Documentation/translations/it_IT/RCU/torture.rst b/Documentation/translations/it_IT/RCU/torture.rst new file mode 100644 index 000000000000..189f7c6caebc --- /dev/null +++ b/Documentation/translations/it_IT/RCU/torture.rst @@ -0,0 +1,369 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-ita.rst + +============================================= +Le operazioni RCU per le verifiche *torture* +============================================= + +CONFIG_RCU_TORTURE_TEST +======================= + +L'opzione CONFIG_RCU_TORTURE_TEST è disponibile per tutte le implementazione di +RCU. L'opzione creerà un modulo rcutorture che potrete caricare per avviare le +verifiche. La verifica userà printk() per riportare lo stato, dunque potrete +visualizzarlo con dmesg (magari usate grep per filtrare "torture"). Le verifiche +inizieranno al caricamento, e si fermeranno alla sua rimozione. + +I parametri di modulo hanno tutti il prefisso "rcutortute.", vedere +Documentation/admin-guide/kernel-parameters.txt. + +Rapporto +======== + +Il rapporto sulle verifiche si presenta nel seguente modo:: + + rcu-torture:--- Start of test: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 + rcu-torture: rtc: (null) ver: 155441 tfle: 0 rta: 155441 rtaf: 8884 rtf: 155440 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 3055767 + rcu-torture: Reader Pipe: 727860534 34213 0 0 0 0 0 0 0 0 0 + rcu-torture: Reader Batch: 727877838 17003 0 0 0 0 0 0 0 0 0 + rcu-torture: Free-Block Circulation: 155440 155440 155440 155440 155440 155440 155440 155440 155440 155440 0 + rcu-torture:--- End of test: SUCCESS: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 + +Sulla maggior parte dei sistemi questo rapporto si produce col comando "dmesg | +grep torture:". Su configurazioni più esoteriche potrebbe essere necessario +usare altri comandi per visualizzare i messaggi di printk(). La funzione +printk() usa KERN_ALERT, dunque i messaggi dovrebbero essere ben visibili. ;-) + +La prima e l'ultima riga mostrano i parametri di module di rcutorture, e solo +sull'ultima riga abbiamo il risultato finale delle verifiche effettuate che può +essere "SUCCESS" (successo) or "FAILURE" (insuccesso). + +Le voci sono le seguenti: + +* "rtc": L'indirizzo in esadecimale della struttura attualmente visibile dai + lettori. + +* "ver": Il numero di volte dall'avvio che il processo scrittore di RCU ha + cambiato la struttura visible ai lettori. + +* "tfle": se non è zero, indica la lista di strutture "torture freelist" da + mettere in "rtc" è vuota. Questa condizione è importante perché potrebbe + illuderti che RCU stia funzionando mentre invece non è il caso. :-/ + +* "rta": numero di strutture allocate dalla lista "torture freelist". + +* "rtaf": il numero di allocazioni fallite dalla lista "torture freelist" a + causa del fatto che fosse vuota. Non è inusuale che sia diverso da zero, ma è + un brutto segno se questo numero rappresenta una frazione troppo alta di + "rta". + +* "rtf": il numero di rilasci nella lista "torture freelist" + +* "rtmbe": Un valore diverso da zero indica che rcutorture crede che + rcu_assign_pointer() e rcu_dereference() non funzionino correttamente. Il + valore dovrebbe essere zero. + +* "rtbe": un valore diverso da zero indica che le funzioni della famiglia + rcu_barrier() non funzionano correttamente. + +* "rtbke": rcutorture è stato capace di creare dei kthread real-time per forzare + l'inversione di priorità di RCU. Il valore dovrebbe essere zero. + +* "rtbre": sebbene rcutorture sia riuscito a creare dei kthread capaci di + forzare l'inversione di priorità, non è riuscito però ad impostarne la + priorità real-time al livello 1. Il valore dovrebbe essere zero. + +* "rtbf": Il numero di volte che è fallita la promozione della priorità per + risolvere un'inversione. + +* "rtb": Il numero di volte che rcutorture ha provato a forzare l'inversione di + priorità. Il valore dovrebbe essere diverso da zero Se state verificando la + promozione della priorità col parametro "test_bootst". + +* "nt": il numero di volte che rcutorture ha eseguito codice lato lettura + all'interno di un gestore di *timer*. Questo valore dovrebbe essere diverso da + zero se avete specificato il parametro "irqreader". + +* "Reader Pipe": un istogramma dell'età delle strutture viste dai lettori. RCU + non funziona correttamente se una qualunque voce, dalla terza in poi, ha un + valore diverso da zero. Se dovesse succedere, rcutorture stampa la stringa + "!!!" per renderlo ben visibile. L'età di una struttura appena creata è zero, + diventerà uno quando sparisce dalla visibilità di un lettore, e incrementata + successivamente per ogni periodo di grazia; infine rilasciata dopo essere + passata per (RCU_TORTURE_PIPE_LEN-2) periodi di grazia. + + L'istantanea qui sopra è stata presa da una corretta implementazione di RCU. + Se volete vedere come appare quando non funziona, sbizzarritevi nel romperla. + ;-) + +* "Reader Batch": un istogramma di età di strutture viste dai lettori, ma + conteggiata in termini di lotti piuttosto che periodi. Anche qui dalla terza + voce in poi devono essere zero. La ragione d'esistere di questo rapporto è che + a volte è più facile scatenare un terzo valore diverso da zero qui piuttosto + che nella lista "Reader Pipe". + +* "Free-Block Circulation": il numero di strutture *torture* che hanno raggiunto + un certo punto nella catena. Il primo numero dovrebbe corrispondere + strettamente al numero di strutture allocate; il secondo conta quelle rimosse + dalla vista dei lettori. Ad eccezione dell'ultimo valore, gli altri + corrispondono al numero di passaggi attraverso il periodo di grazia. L'ultimo + valore dovrebbe essere zero, perché viene incrementato solo se il contatore + della struttura torture viene in un qualche modo incrementato oltre il + normale. + +Una diversa implementazione di RCU potrebbe fornire informazioni aggiuntive. Per +esempio, *Tree SRCU* fornisce anche la seguente riga:: + + srcud-torture: Tree SRCU per-CPU(idx=0): 0(35,-21) 1(-4,24) 2(1,1) 3(-26,20) 4(28,-47) 5(-9,4) 6(-10,14) 7(-14,11) T(1,6) + +Questa riga mostra lo stato dei contatori per processore, in questo caso per +*Tree SRCU*, usando un'allocazione dinamica di srcu_struct (dunque "srcud-" +piuttosto che "srcu-"). I numeri fra parentesi sono i valori del "vecchio" +contatore e di quello "corrente" per ogni processore. Il valore "idx" mappa +questi due valori nell'array, ed è utile per il *debug*. La "T" finale contiene +il valore totale dei contatori. + +Uso su specifici kernel +======================= + +A volte può essere utile eseguire RCU torture su un kernel già compilato, ad +esempio quando lo si sta per mettere in proeduzione. In questo caso, il kernel +dev'essere compilato con CONFIG_RCU_TORTURE_TEST=m, cosicché le verifiche possano +essere avviate usano modprobe e terminate con rmmod. + +Per esempio, potreste usare questo script:: + + #!/bin/sh + + modprobe rcutorture + sleep 3600 + rmmod rcutorture + dmesg | grep torture: + +Potete controllare il rapporto verificando manualmente la presenza del marcatore +di errore "!!!". Ovviamente, siete liberi di scriverne uno più elaborato che +identifichi automaticamente gli errori. Il comando "rmmod" forza la stampa di +"SUCCESS" (successo), "FAILURE" (fallimento), o "RCU_HOTPLUG". I primi due sono +autoesplicativi; invece, l'ultimo indica che non son stati trovati problemi in +RCU, tuttavia ci sono stati problemi con CPU-hotplug. + + +Uso sul kernel di riferimento +============================= + +Quando si usa rcutorture per verificare modifiche ad RCU stesso, spesso è +necessario compilare un certo numero di kernel usando configurazioni diverse e +con parametri d'avvio diversi. In questi casi, usare modprobe ed rmmod potrebbe +richiedere molto tempo ed il processo essere suscettibile ad errori. + +Dunque, viene messo a disposizione il programma +tools/testing/selftests/rcutorture/bin/kvm.sh per le architetture x86, arm64 e +powerpc. Di base, eseguirà la serie di verifiche elencate in +tools/testing/selftests/rcutorture/configs/rcu/CFLIST. Ognuna di queste verrà +eseguita per 30 minuti in una macchina virtuale con uno spazio utente minimale +fornito da un initrd generato automaticamente. Al completamento, gli artefatti +prodotti e i messaggi vengono analizzati alla ricerca di errori, ed i risultati +delle esecuzioni riassunti in un rapporto. + +Su grandi sistemi, le verifiche di rcutorture posso essere velocizzare passano a +kvm.sh l'argomento --cpus. Per esempio, su un sistema a 64 processori, "--cpus +43" userà fino a 43 processori per eseguire contemporaneamente le verifiche. Su +un kernel v5.4 per eseguire tutti gli scenari in due serie, riduce il tempo +d'esecuzione da otto ore a un'ora (senza contare il tempo per compilare sedici +kernel). L'argomento "--dryrun sched" non eseguirà verifiche, piuttosto vi +informerà su come queste verranno organizzate in serie. Questo può essere utile +per capire quanti processori riservare per le verifiche in --cpus. + +Non serve eseguire tutti gli scenari di verifica per ogni modifica. Per esempio, +per una modifica a Tree SRCU potete eseguire gli scenari SRCU-N e SRCU-P. Per +farlo usate l'argomento --configs di kvm.sh in questo modo: "--configs 'SRCU-N +SRCU-P'". Su grandi sistemi si possono eseguire più copie degli stessi scenari, +per esempio, un hardware che permette di eseguire 448 thread, può eseguire 5 +istanze complete contemporaneamente. Per farlo:: + + kvm.sh --cpus 448 --configs '5*CFLIST' + +Oppure, lo stesso sistema, può eseguire contemporaneamente 56 istanze dello +scenario su otto processori:: + + kvm.sh --cpus 448 --configs '56*TREE04' + +O ancora 28 istanze per ogni scenario su otto processori:: + + kvm.sh --cpus 448 --configs '28*TREE03 28*TREE04' + +Ovviamente, ogni esecuzione utilizzerà della memoria. Potete limitarne l'uso con +l'argomento --memory, che di base assume il valore 512M. Per poter usare valori +piccoli dovrete disabilitare le verifiche *callback-flooding* usando il +parametro --bootargs che vedremo in seguito. + +A volte è utile avere informazioni aggiuntive di debug, in questo caso potete +usare il parametro --kconfig, per esempio, ``--kconfig +'CONFIG_RCU_EQS_DEBUG=y'``. In aggiunta, ci sono i parametri --gdb, --kasan, and +kcsan. Da notare che --gdb vi limiterà all'uso di un solo scenario per +esecuzione di kvm.sh e richiede di avere anche un'altra finestra aperta dalla +quale eseguire ``gdb`` come viene spiegato dal programma. + +Potete passare anche i parametri d'avvio del kernel, per esempio, per +controllare i parametri del modulo rcutorture. Per esempio, per verificare +modifiche del codice RCU CPU stall-warning, usate ``bootargs +'rcutorture.stall_cpu=30``. Il programma riporterà un fallimento, ossia il +risultato della verifica. Come visto in precedenza, ridurre la memoria richiede +la disabilitazione delle verifiche *callback-flooding*:: + + kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \ + --bootargs 'rcutorture.fwd_progress=0' + +A volte tutto quello che serve è una serie completa di compilazioni del kernel. +Questo si ottiene col parametro --buildonly. + +Il parametro --duration sovrascrive quello di base di 30 minuti. Per esempio, +con ``--duration 2d`` l'esecuzione sarà di due giorni, ``--duraction 5min`` di +cinque minuti, e ``--duration 45s`` di 45 secondi. L'ultimo può essere utile per +scovare rari errori nella sequenza d'avvio. + +Infine, il parametro --trust-make permette ad ogni nuova compilazione del kernel +di riutilizzare tutto il possibile da quelle precedenti. Da notare che senza il +parametro --trust-make, i vostri file di *tag* potrebbero essere distrutti. + +Ci sono altri parametri più misteriosi che sono documentati nel codice sorgente +dello programma kvm.sh. + +Se un'esecuzione contiene degli errori, il loro numero durante la compilazione e +all'esecuzione verranno elencati alla fine fra i risultati di kvm.sh (che vi +consigliamo caldamente di reindirizzare verso un file). I file prodotti dalla +compilazione ed i risultati stampati vengono salvati, usando un riferimento +temporale, nelle cartella tools/testing/selftests/rcutorture/res. Una cartella +di queste cartelle può essere fornita a kvm-find-errors.sh per estrarne gli +errori. Per esempio:: + + tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh \ + tools/testing/selftests/rcutorture/res/2020.01.20-15.54.23 + +Tuttavia, molto spesso è più conveniente aprire i file direttamente. I file +riguardanti tutti gli scenari di un'esecuzione di trovano nella cartella +principale (2020.01.20-15.54.23 nell'esempio precedente), mentre quelli +specifici per scenario si trovano in sotto cartelle che prendono il nome dello +scenario stesso (per esempio, "TREE04"). Se un dato scenario viene eseguito più +di una volta (come abbiamo visto con "--configs '56*TREE04'"), allora dalla +seconda esecuzione in poi le sottocartelle includeranno un numero di +progressione, per esempio "TREE04.2", "TREE04.3", e via dicendo. + +Il file solitamente più usato nella cartella principale è testid.txt. Se la +verifica viene eseguita in un repositorio git, allora questo file conterrà il +*commit* sul quale si basano le verifiche, mentre tutte le modifiche non +registrare verranno mostrate in formato diff. + +I file solitamente più usati nelle cartelle di scenario sono: + +.config + Questo file contiene le opzioni di Kconfig + +Make.out + Questo file contiene il risultato di compilazione per uno specifico scenario + +console.log + Questo file contiene il risultato d'esecuzione per uno specifico scenario. + Questo file può essere esaminato una volta che il kernel è stato avviato, + ma potrebbe non esistere se l'avvia non è fallito. + +vmlinux + Questo file contiene il kernel, e potrebbe essere utile da esaminare con + programmi come pbjdump e gdb + +Ci sono altri file, ma vengono usati meno. Molti sono utili all'analisi di +rcutorture stesso o dei suoi programmi. + +Nel kernel v5.4, su un sistema a 12 processori, un'esecuzione senza errori +usando gli scenari di base produce il seguente risultato:: + + SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ] + SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ] + SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ] + SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ] + TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ] + TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ] + TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ] + TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 34198 + TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631 + TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ] + TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844 + TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497 + CPU count limited from 16 to 12 + TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961 + TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997 + TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732 + CPU count limited from 16 to 12 + TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011 + +Ripetizioni +=========== + +Immaginate di essere alla caccia di un raro problema che si verifica all'avvio. +Potreste usare kvm.sh, tuttavia questo ricompilerebbe il kernel ad ogni +esecuzione. Se avete bisogno di (diciamo) 1000 esecuzioni per essere sicuri di +aver risolto il problema, allora queste inutili ricompilazioni possono diventare +estremamente fastidiose. + +Per questo motivo esiste kvm-again.sh. + +Immaginate che un'esecuzione precedente di kvm.sh abbia lasciato i suoi +artefatti nella cartella:: + + tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28 + +Questa esecuzione può essere rieseguita senza ricompilazioni:: + + kvm-again.sh tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28 + +Alcuni dei parametri originali di kvm.sh possono essere sovrascritti, in +particolare --duration e --bootargs. Per esempio:: + + kvm-again.sh tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28 \ + --duration 45s + +rieseguirebbe il test precedente, ma solo per 45 secondi, e quindi aiutando a +trovare quel raro problema all'avvio sopracitato. + +Esecuzioni distribuite +====================== + +Sebbene kvm.sh sia utile, le sue verifiche sono limitate ad un singolo sistema. +Non è poi così difficile usare un qualsiasi ambiente di sviluppo per eseguire +(diciamo) 5 istanze di kvm.sh su altrettanti sistemi, ma questo avvierebbe +inutili ricompilazioni del kernel. In aggiunta, il processo di distribuzione +degli scenari di verifica per rcutorture sui sistemi disponibili richiede +scrupolo perché soggetto ad errori. + +Per questo esiste kvm-remote.sh. + +Se il seguente comando funziona:: + + ssh system0 date + +e funziona anche per system1, system2, system3, system4, e system5, e tutti +questi sistemi hanno 64 CPU, allora potere eseguire:: + + kvm-remote.sh "system0 system1 system2 system3 system4 system5" \ + --cpus 64 --duration 8h --configs "5*CFLIST" + +Questo compilerà lo scenario di base sul sistema locale, poi lo distribuirà agli +altri cinque sistemi elencati fra i parametri, ed eseguirà ogni scenario per +otto ore. Alla fine delle esecuzioni, i risultati verranno raccolti, registrati, +e stampati. La maggior parte dei parametri di kvm.sh possono essere usati con +kvm-remote.sh, tuttavia la lista dei sistemi deve venire sempre per prima. + +L'argomento di kvm.sh ``--dryrun scenarios`` può essere utile per scoprire +quanti scenari potrebbero essere eseguiti in gruppo di sistemi. + +Potete rieseguire anche una precedente esecuzione remota come abbiamo già fatto +per kvm.sh:: + + kvm-remote.sh "system0 system1 system2 system3 system4 system5" \ + tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \ + --duration 24h + +In questo caso, la maggior parte dei parametri di kvm-again.sh possono essere +usati dopo il percorso alla cartella contenente gli artefatti dell'esecuzione da +ripetere. diff --git a/Documentation/translations/it_IT/core-api/index.rst b/Documentation/translations/it_IT/core-api/index.rst index cc4c4328ad03..dad20402d11b 100644 --- a/Documentation/translations/it_IT/core-api/index.rst +++ b/Documentation/translations/it_IT/core-api/index.rst @@ -10,6 +10,18 @@ Utilità di base symbol-namespaces +Primitive di sincronizzazione +============================= + +Come Linux impedisce che tutto si verifichi contemporaneamente. Consultate +Documentation/translations/it_IT/locking/index.rst per maggiorni informazioni +sul tema. + +.. toctree:: + :maxdepth: 1 + + ../RCU/index + .. only:: subproject and html Indices diff --git a/Documentation/translations/it_IT/i2c/i2c-protocol.rst b/Documentation/translations/it_IT/i2c/i2c-protocol.rst new file mode 100644 index 000000000000..ba7c8cd8f560 --- /dev/null +++ b/Documentation/translations/it_IT/i2c/i2c-protocol.rst @@ -0,0 +1,99 @@ +================= +Il protocollo I2C +================= + +Questo documento è una panoramica delle transazioni di base I2C e delle API +del kernel per eseguirli. + +Spiegazione dei simboli +======================= + +=============== =========================================================== +S Condizione di avvio +P Condizione di stop +Rd/Wr (1 bit) Bit di lettura/scrittura. Rd vale 1, Wr vale 0. +A, NA (1 bit) Bit di riconoscimento (ACK) e di non riconoscimento (NACK). +Addr (7 bit) Indirizzo I2C a 7 bit. Nota che questo può essere espanso + per ottenere un indirizzo I2C a 10 bit. +Dati (8 bit) Un byte di dati. + +[..] Fra parentesi quadre i dati inviati da dispositivi I2C, + anziché dal master. +=============== =========================================================== + + +Transazione semplice di invio +============================= + +Implementato da i2c_master_send():: + + S Addr Wr [A] Dati [A] Dati [A] ... [A] Dati [A] P + + +Transazione semplice di ricezione +================================= + +Implementato da i2c_master_recv():: + + S Addr Rd [A] [Dati] A [Dati] A ... A [Dati] NA P + + +Transazioni combinate +===================== + +Implementato da i2c_transfer(). + +Sono come le transazioni di cui sopra, ma invece di uno condizione di stop P +viene inviata una condizione di inizio S e la transazione continua. +Un esempio di lettura di un byte, seguita da una scrittura di un byte:: + + S Addr Rd [A] [Dati] NA S Addr Wr [A] Dati [A] P + + +Transazioni modificate +====================== + +Le seguenti modifiche al protocollo I2C possono essere generate +impostando questi flag per i messaggi I2C. Ad eccezione di I2C_M_NOSTART, sono +di solito necessari solo per risolvere problemi di un dispositivo: + +I2C_M_IGNORE_NAK: + Normalmente il messaggio viene interrotto immediatamente se il dispositivo + risponde con [NA]. Impostando questo flag, si considera qualsiasi [NA] come + [A] e tutto il messaggio viene inviato. + Questi messaggi potrebbero comunque non riuscire a raggiungere il timeout + SCL basso->alto. + +I2C_M_NO_RD_ACK: + In un messaggio di lettura, il bit A/NA del master viene saltato. + +I2C_M_NOSTART: + In una transazione combinata, potrebbe non essere generato alcun + "S Addr Wr/Rd [A]". + Ad esempio, impostando I2C_M_NOSTART sul secondo messaggio parziale + genera qualcosa del tipo:: + + S Addr Rd [A] [Dati] NA Dati [A] P + + Se si imposta il flag I2C_M_NOSTART per il primo messaggio parziale, + non viene generato Addr, ma si genera la condizione di avvio S. + Questo probabilmente confonderà tutti gli altri dispositivi sul bus, quindi + meglio non usarlo. + + Questo viene spesso utilizzato per raccogliere le trasmissioni da più + buffer di dati presenti nella memoria di sistema in qualcosa che appare + come un singolo trasferimento verso il dispositivo I2C. Inoltre, alcuni + dispositivi particolari lo utilizzano anche tra i cambi di direzione. + +I2C_M_REV_DIR_ADDR: + Questo inverte il flag Rd/Wr. Cioè, se si vuole scrivere, ma si ha bisogno + di emettere una Rd invece di una Wr, o viceversa, si può impostare questo + flag. + Per esempio:: + + S Addr Rd [A] Dati [A] Dati [A] ... [A] Dati [A] P + +I2C_M_STOP: + Forza una condizione di stop (P) dopo il messaggio. Alcuni protocolli + simili a I2C come SCCB lo richiedono. Normalmente, non si vuole essere + interrotti tra i messaggi di un trasferimento. diff --git a/Documentation/translations/it_IT/i2c/index.rst b/Documentation/translations/it_IT/i2c/index.rst new file mode 100644 index 000000000000..14fbe3d78299 --- /dev/null +++ b/Documentation/translations/it_IT/i2c/index.rst @@ -0,0 +1,46 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================= +Il sottosistema I2C/SMBus +========================= + +Introduzione +============ + +.. toctree:: + :maxdepth: 1 + + summary + i2c-protocol + +Scrivere un device driver +========================= + +.. toctree:: + :maxdepth: 1 + +Debugging +========= + +.. toctree:: + :maxdepth: 1 + +Slave I2C +========= + +.. toctree:: + :maxdepth: 1 + + +Argomenti avanzati +================== + +.. toctree:: + :maxdepth: 1 + +.. only:: subproject and html + + Indici + ====== + + * :ref:`genindex` diff --git a/Documentation/translations/it_IT/i2c/summary.rst b/Documentation/translations/it_IT/i2c/summary.rst new file mode 100644 index 000000000000..1535e13a32e2 --- /dev/null +++ b/Documentation/translations/it_IT/i2c/summary.rst @@ -0,0 +1,64 @@ +========================== +Introduzione a I2C e SMBus +========================== + +I²C (letteralmente "I al quadrato C" e scritto I2C nella documentazione del +kernel) è un protocollo sviluppato da Philips. É un protocollo lento a 2 fili +(a velocità variabile, al massimo 400KHz), con un'estensione per le velocità +elevate (3.4 MHz). Questo protocollo offre un bus a basso costo per collegare +dispositivi di vario genere a cui si accede sporadicamente e utilizzando +poca banda. Alcuni sistemi usano varianti che non rispettano i requisiti +originali, per cui non sono indicati come I2C, ma hanno nomi diversi, per +esempio TWI (Interfaccia a due fili), IIC. + +L'ultima specifica ufficiale I2C è la `"Specifica I2C-bus e manuale utente" +(UM10204) <https://www.nxp.com/webapp/Download?colCode=UM10204>`_ +pubblicata da NXP Semiconductors. Tuttavia, è necessario effettuare il login +al sito per accedere al PDF. Una versione precedente della specifica +(revisione 6) è archiviata +`qui <https://web.archive.org/web/20210813122132/ +https://www.nxp.com/docs/en/user-guide/UM10204.pdf>`_. + +SMBus (Bus per la gestione del sistema) si basa sul protocollo I2C ed è +principalmente un sottoinsieme di protocolli e segnali I2C. Molti dispositivi +I2C funzioneranno su SMBus, ma alcuni protocolli SMBus aggiungono semantica +oltre quanto richiesto da I2C. Le moderne schede madri dei PC si affidano a +SMBus. I più comuni dispositivi collegati tramite SMBus sono moduli RAM +configurati utilizzando EEPROM I2C, e circuiti integrati di monitoraggio +hardware. + +Poiché SMBus è principalmente un sottoinsieme del bus I2C, +possiamo farne uso su molti sistemi I2C. Ci sono però sistemi che non +soddisfano i vincoli elettrici sia di SMBus che di I2C; e altri che non possono +implementare tutta la semantica o messaggi comuni del protocollo SMBus. + + +Terminologia +============ + +Utilizzando la terminologia della documentazione ufficiale, il bus I2C connette +uno o più circuiti integrati *master* e uno o più circuiti integrati *slave*. + +.. kernel-figure:: ../../../i2c/i2c_bus.svg + :alt: Un semplice bus I2C con un master e 3 slave + + Un semplice Bus I2C + +Un circuito integrato **master** è un nodo che inizia le comunicazioni con gli +slave. Nell'implementazione del kernel Linux è chiamato **adattatore** o bus. I +driver degli adattatori si trovano nella sottocartella ``drivers/i2c/busses/``. + +Un **algoritmo** contiene codice generico che può essere utilizzato per +implementare una intera classe di adattatori I2C. Ciascun driver dell' +adattatore specifico dipende da un driver dell'algoritmo nella sottocartella +``drivers/i2c/algos/`` o include la propria implementazione. + +Un circuito integrato **slave** è un nodo che risponde alle comunicazioni +quando indirizzato dal master. In Linux è chiamato **client** (dispositivo). I +driver dei dispositivi sono contenuti in una cartella specifica per la +funzionalità che forniscono, ad esempio ``drivers/media/gpio/`` per espansori +GPIO e ``drivers/media/i2c/`` per circuiti integrati relativi ai video. + +Per la configurazione di esempio in figura, avrai bisogno di un driver per il +tuo adattatore I2C e driver per i tuoi dispositivi I2C (solitamente un driver +per ciascuno dispositivo). diff --git a/Documentation/translations/it_IT/index.rst b/Documentation/translations/it_IT/index.rst index b95dfa1ded04..70ccd23b2cde 100644 --- a/Documentation/translations/it_IT/index.rst +++ b/Documentation/translations/it_IT/index.rst @@ -91,6 +91,8 @@ interfacciarsi con il resto del kernel. :maxdepth: 1 core-api/index + Sincronizzazione nel kernel <locking/index> + subsystem-apis Strumenti e processi per lo sviluppo ==================================== diff --git a/Documentation/translations/it_IT/locking/index.rst b/Documentation/translations/it_IT/locking/index.rst new file mode 100644 index 000000000000..19963d33e84d --- /dev/null +++ b/Documentation/translations/it_IT/locking/index.rst @@ -0,0 +1,20 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================ +Sincronizzazione +================ + +.. toctree:: + :maxdepth: 1 + + locktypes + lockdep-design + lockstat + locktorture + +.. only:: subproject and html + + Indici + ====== + + * :ref:`genindex` diff --git a/Documentation/translations/it_IT/locking/lockdep-design.rst b/Documentation/translations/it_IT/locking/lockdep-design.rst new file mode 100644 index 000000000000..9ed00d8cf280 --- /dev/null +++ b/Documentation/translations/it_IT/locking/lockdep-design.rst @@ -0,0 +1,678 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-ita.rst + +Validatore di sincronizzazione durante l'esecuzione +=================================================== + +Classi di blocchi +----------------- + +L'oggetto su cui il validatore lavora è una "classe" di blocchi. + +Una classe di blocchi è un gruppo di blocchi che seguono le stesse regole di +sincronizzazione, anche quando i blocchi potrebbero avere più istanze (anche +decine di migliaia). Per esempio un blocco nella struttura inode è una classe, +mentre ogni inode sarà un'istanza di questa classe di blocco. + +Il validatore traccia lo "stato d'uso" di una classe di blocchi e le sue +dipendenze con altre classi. L'uso di un blocco indica come quel blocco viene +usato rispetto al suo contesto d'interruzione, mentre le dipendenze di un blocco +possono essere interpretate come il loro ordine; per esempio L1 -> L2 suggerisce +che un processo cerca di acquisire L2 mentre già trattiene L1. Dal punto di +vista di lockdep, i due blocchi (L1 ed L2) non sono per forza correlati: quella +dipendenza indica solamente l'ordine in cui sono successe le cose. Il validatore +verifica permanentemente la correttezza dell'uso dei blocchi e delle loro +dipendenze, altrimenti ritornerà un errore. + +Il comportamento di una classe di blocchi viene costruito dall'insieme delle sue +istanze. Una classe di blocco viene registrata alla creazione della sua prima +istanza, mentre tutte le successive istanze verranno mappate; dunque, il loro +uso e le loro dipendenze contribuiranno a costruire quello della classe. Una +classe di blocco non sparisce quando sparisce una sua istanza, ma può essere +rimossa quando il suo spazio in memoria viene reclamato. Per esempio, questo +succede quando si rimuove un modulo, o quando una *workqueue* viene eliminata. + +Stato +----- + +Il validatore traccia l'uso cronologico delle classi di blocchi e ne divide +l'uso in categorie (4 USI * n STATI + 1). + +I quattro USI possono essere: + +- 'sempre trattenuto nel contesto <STATO>' +- 'sempre trattenuto come blocco di lettura nel contesto <STATO>' +- 'sempre trattenuto con <STATO> abilitato' +- 'sempre trattenuto come blocco di lettura con <STATO> abilitato' + +gli `n` STATI sono codificati in kernel/locking/lockdep_states.h, ad oggi +includono: + +- hardirq +- softirq + +infine l'ultima categoria è: + +- 'sempre trattenuto' [ == !unused ] + +Quando vengono violate le regole di sincronizzazione, questi bit di utilizzo +vengono presentati nei messaggi di errore di sincronizzazione, fra parentesi +graffe, per un totale di `2 * n` (`n`: bit STATO). Un esempio inventato:: + + modprobe/2287 is trying to acquire lock: + (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24 + + but task is already holding lock: + (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24 + +Per un dato blocco, da sinistra verso destra, la posizione del bit indica l'uso +del blocco e di un eventuale blocco di lettura, per ognuno degli `n` STATI elencati +precedentemente. Il carattere mostrato per ogni bit indica: + + === =========================================================================== + '.' acquisito con interruzioni disabilitate fuori da un contesto d'interruzione + '-' acquisito in contesto d'interruzione + '+' acquisito con interruzioni abilitate + '?' acquisito in contesto d'interruzione con interruzioni abilitate + === =========================================================================== + +Il seguente esempio mostra i bit:: + + (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24 + |||| + ||| \-> softirq disabilitati e fuori da un contesto di softirq + || \--> acquisito in un contesto di softirq + | \---> hardirq disabilitati e fuori da un contesto di hardirq + \----> acquisito in un contesto di hardirq + +Per un dato STATO, che il blocco sia mai stato acquisito in quel contesto di +STATO, o che lo STATO sia abilitato, ci lascia coi quattro possibili scenari +mostrati nella seguente tabella. Il carattere associato al bit indica con +esattezza in quale scenario ci si trova al momento del rapporto. + + +---------------+---------------+------------------+ + | | irq abilitati | irq disabilitati | + +---------------+---------------+------------------+ + | sempre in irq | '?' | '-' | + +---------------+---------------+------------------+ + | mai in irq | '+' | '.' | + +---------------+---------------+------------------+ + +Il carattere '-' suggerisce che le interruzioni sono disabilitate perché +altrimenti verrebbe mostrato il carattere '?'. Una deduzione simile può essere +fatta anche per '+' + +I blocchi inutilizzati (ad esempio i mutex) non possono essere fra le cause di +un errore. + +Regole dello stato per un blocco singolo +---------------------------------------- + +Avere un blocco sicuro in interruzioni (*irq-safe*) significa che è sempre stato +usato in un contesto d'interruzione, mentre un blocco insicuro in interruzioni +(*irq-unsafe*) significa che è sempre stato acquisito con le interruzioni +abilitate. + +Una classe softirq insicura è automaticamente insicura anche per hardirq. I +seguenti stati sono mutualmente esclusivi: solo una può essere vero quando viene +usata una classe di blocco:: + + <hardirq-safe> o <hardirq-unsafe> + <softirq-safe> o <softirq-unsafe> + +Questo perché se un blocco può essere usato in un contesto di interruzioni +(sicuro in interruzioni), allora non può mai essere acquisito con le +interruzioni abilitate (insicuro in interruzioni). Altrimenti potrebbe +verificarsi uno stallo. Per esempio, questo blocco viene acquisito, ma prima di +essere rilasciato il contesto d'esecuzione viene interrotto nuovamente, e quindi +si tenterà di acquisirlo nuovamente. Questo porterà ad uno stallo, in +particolare uno stallo ricorsivo. + +Il validatore rileva e riporta gli usi di blocchi che violano queste regole per +blocchi singoli. + +Regole per le dipendenze di blocchi multipli +-------------------------------------------- + +La stessa classe di blocco non deve essere acquisita due volte, questo perché +potrebbe portare ad uno blocco ricorsivo e dunque ad uno stallo. + +Inoltre, due blocchi non possono essere trattenuti in ordine inverso:: + + <L1> -> <L2> + <L2> -> <L1> + +perché porterebbe ad uno stallo - chiamato stallo da blocco inverso - in cui si +cerca di trattenere i due blocchi in un ciclo in cui entrambe i contesti +aspettano per sempre che l'altro termini. Il validatore è in grado di trovare +queste dipendenze cicliche di qualsiasi complessità, ovvero nel mezzo ci +potrebbero essere altre sequenze di blocchi. Il validatore troverà se questi +blocchi possono essere acquisiti circolarmente. + +In aggiunta, le seguenti sequenze di blocco nei contesti indicati non sono +permesse, indipendentemente da quale che sia la classe di blocco:: + + <hardirq-safe> -> <hardirq-unsafe> + <softirq-safe> -> <softirq-unsafe> + +La prima regola deriva dal fatto che un blocco sicuro in interruzioni può essere +trattenuto in un contesto d'interruzione che, per definizione, ha la possibilità +di interrompere un blocco insicuro in interruzioni; questo porterebbe ad uno +stallo da blocco inverso. La seconda, analogamente, ci dice che un blocco sicuro +in interruzioni software potrebbe essere trattenuto in un contesto di +interruzione software, dunque potrebbe interrompere un blocco insicuro in +interruzioni software. + +Le suddette regole vengono applicate per qualsiasi sequenza di blocchi: quando +si acquisiscono nuovi blocchi, il validatore verifica se vi è una violazione +delle regole fra il nuovo blocco e quelli già trattenuti. + +Quando una classe di blocco cambia stato, applicheremo le seguenti regole: + +- se viene trovato un nuovo blocco sicuro in interruzioni, verificheremo se + abbia mai trattenuto dei blocchi insicuri in interruzioni. + +- se viene trovato un nuovo blocco sicuro in interruzioni software, + verificheremo se abbia trattenuto dei blocchi insicuri in interruzioni + software. + +- se viene trovato un nuovo blocco insicuro in interruzioni, verificheremo se + abbia trattenuto dei blocchi sicuri in interruzioni. + +- se viene trovato un nuovo blocco insicuro in interruzioni software, + verificheremo se abbia trattenuto dei blocchi sicuri in interruzioni + software. + +(Di nuovo, questi controlli vengono fatti perché un contesto d'interruzione +potrebbe interrompere l'esecuzione di qualsiasi blocco insicuro portando ad uno +stallo; questo anche se lo stallo non si verifica in pratica) + +Eccezione: dipendenze annidate sui dati portano a blocchi annidati +------------------------------------------------------------------ + +Ci sono alcuni casi in cui il kernel Linux acquisisce più volte la stessa +istanza di una classe di blocco. Solitamente, questo succede quando esiste una +gerarchia fra oggetti dello stesso tipo. In questi casi viene ereditato +implicitamente l'ordine fra i due oggetti (definito dalle proprietà di questa +gerarchia), ed il kernel tratterrà i blocchi in questo ordine prefissato per +ognuno degli oggetti. + +Un esempio di questa gerarchia di oggetti che producono "blocchi annidati" sono +i *block-dev* che rappresentano l'intero disco e quelli che rappresentano una +sua partizione; la partizione è una parte del disco intero, e l'ordine dei +blocchi sarà corretto fintantoche uno acquisisce il blocco del disco intero e +poi quello della partizione. Il validatore non rileva automaticamente questo +ordine implicito, perché queste regole di sincronizzazione non sono statiche. + +Per istruire il validatore riguardo a questo uso corretto dei blocchi sono stati +introdotte nuove primitive per specificare i "livelli di annidamento". Per +esempio, per i blocchi a mutua esclusione dei *block-dev* si avrebbe una +chiamata simile a:: + + enum bdev_bd_mutex_lock_class + { + BD_MUTEX_NORMAL, + BD_MUTEX_WHOLE, + BD_MUTEX_PARTITION + }; + + mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION); + +In questo caso la sincronizzazione viene fatta su un *block-dev* sapendo che si +tratta di una partizione. + +Ai fini della validazione, il validatore lo considererà con una - sotto - classe +di blocco separata. + +Nota: Prestate estrema attenzione che la vostra gerarchia sia corretta quando si +vogliono usare le primitive _nested(); altrimenti potreste avere sia falsi +positivi che falsi negativi. + +Annotazioni +----------- + +Si possono utilizzare due costrutti per verificare ed annotare se certi blocchi +devono essere trattenuti: lockdep_assert_held*(&lock) e +lockdep_*pin_lock(&lock). + +Come suggerito dal nome, la famiglia di macro lockdep_assert_held* asseriscono +che un dato blocco in un dato momento deve essere trattenuto (altrimenti, verrà +generato un WARN()). Queste vengono usate abbondantemente nel kernel, per +esempio in kernel/sched/core.c:: + + void update_rq_clock(struct rq *rq) + { + s64 delta; + + lockdep_assert_held(&rq->lock); + [...] + } + +dove aver trattenuto rq->lock è necessario per aggiornare in sicurezza il clock +rq. + +L'altra famiglia di macro è lockdep_*pin_lock(), che a dire il vero viene usata +solo per rq->lock ATM. Se per caso un blocco non viene trattenuto, queste +genereranno un WARN(). Questo si rivela particolarmente utile quando si deve +verificare la correttezza di codice con *callback*, dove livelli superiori +potrebbero assumere che un blocco rimanga trattenuto, ma livelli inferiori +potrebbero invece pensare che il blocco possa essere rilasciato e poi +riacquisito (involontariamente si apre una sezione critica). lockdep_pin_lock() +restituisce 'struct pin_cookie' che viene usato da lockdep_unpin_lock() per +verificare che nessuno abbia manomesso il blocco. Per esempio in +kernel/sched/sched.h abbiamo:: + + static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf) + { + rf->cookie = lockdep_pin_lock(&rq->lock); + [...] + } + + static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf) + { + [...] + lockdep_unpin_lock(&rq->lock, rf->cookie); + } + +I commenti riguardo alla sincronizzazione possano fornire informazioni utili, +tuttavia sono le verifiche in esecuzione effettuate da queste macro ad essere +vitali per scovare problemi di sincronizzazione, ed inoltre forniscono lo stesso +livello di informazioni quando si ispeziona il codice. Nel dubbio, preferite +queste annotazioni! + +Dimostrazione di correttezza al 100% +------------------------------------ + +Il validatore verifica la proprietà di chiusura in senso matematico. Ovvero, per +ogni sequenza di sincronizzazione di un singolo processo che si verifichi almeno +una volta nel kernel, il validatore dimostrerà con una certezza del 100% che +nessuna combinazione e tempistica di queste sequenze possa causare uno stallo in +una qualsiasi classe di blocco. [1]_ + +In pratica, per dimostrare l'esistenza di uno stallo non servono complessi +scenari di sincronizzazione multi-processore e multi-processo. Il validatore può +dimostrare la correttezza basandosi sulla sola sequenza di sincronizzazione +apparsa almeno una volta (in qualunque momento, in qualunque processo o +contesto). Uno scenario complesso che avrebbe bisogno di 3 processori e una +sfortunata presenza di processi, interruzioni, e pessimo tempismo, può essere +riprodotto su un sistema a singolo processore. + +Questo riduce drasticamente la complessità del controllo di qualità della +sincronizzazione nel kernel: quello che deve essere fatto è di innescare nel +kernel quante più possibili "semplici" sequenze di sincronizzazione, almeno una +volta, allo scopo di dimostrarne la correttezza. Questo al posto di innescare +una verifica per ogni possibile combinazione di sincronizzazione fra processori, +e differenti scenari con hardirq e softirq e annidamenti vari (nella pratica, +impossibile da fare) + +.. [1] + + assumendo che il validatore sia corretto al 100%, e che nessun altra parte + del sistema possa corromperne lo stato. Assumiamo anche che tutti i percorsi + MNI/SMM [potrebbero interrompere anche percorsi dove le interruzioni sono + disabilitate] sono corretti e non interferiscono con il validatore. Inoltre, + assumiamo che un hash a 64-bit sia unico per ogni sequenza di + sincronizzazione nel sistema. Infine, la ricorsione dei blocchi non deve + essere maggiore di 20. + +Prestazione +----------- + +Le regole sopracitate hanno bisogno di una quantità **enorme** di verifiche +durante l'esecuzione. Il sistema sarebbe diventato praticamente inutilizzabile +per la sua lentezza se le avessimo fatte davvero per ogni blocco trattenuto e +per ogni abilitazione delle interruzioni. La complessità della verifica è +O(N^2), quindi avremmo dovuto fare decine di migliaia di verifiche per ogni +evento, il tutto per poche centinaia di classi. + +Il problema è stato risolto facendo una singola verifica per ogni 'scenario di +sincronizzazione' (una sequenza unica di blocchi trattenuti uno dopo l'altro). +Per farlo, viene mantenuta una pila dei blocchi trattenuti, e viene calcolato un +hash a 64-bit unico per ogni sequenza. Quando la sequenza viene verificata per +la prima volta, l'hash viene inserito in una tabella hash. La tabella potrà +essere verificata senza bisogno di blocchi. Se la sequenza dovesse ripetersi, la +tabella ci dirà che non è necessario verificarla nuovamente. + +Risoluzione dei problemi +------------------------ + +Il massimo numero di classi di blocco che il validatore può tracciare è: +MAX_LOCKDEP_KEYS. Oltrepassare questo limite indurrà lokdep a generare il +seguente avviso:: + + (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS)) + +Di base questo valore è 8191, e un classico sistema da ufficio ha meno di 1000 +classi, dunque questo avviso è solitamente la conseguenza di un problema di +perdita delle classi di blocco o d'inizializzazione dei blocchi. Di seguito una +descrizione dei due problemi: + +1. caricare e rimuovere continuamente i moduli mentre il validatore è in + esecuzione porterà ad una perdita di classi di blocco. Il problema è che ogni + caricamento crea un nuovo insieme di classi di blocco per tutti i blocchi di + quel modulo. Tuttavia, la rimozione del modulo non rimuove le vecchie classi + (vedi dopo perché non le riusiamo). Dunque, il continuo caricamento e + rimozione di un modulo non fa altro che aumentare il contatore di classi fino + a raggiungere, eventualmente, il limite. + +2. Usare array con un gran numero di blocchi che non vengono esplicitamente + inizializzati. Per esempio, una tabella hash con 8192 *bucket* dove ognuno ha + il proprio spinlock_t consumerà 8192 classi di blocco a meno che non vengano + esplicitamente inizializzati in esecuzione usando spin_lock_init() invece + dell'inizializzazione durante la compilazione con __SPIN_LOCK_UNLOCKED(). + Sbagliare questa inizializzazione garantisce un esaurimento di classi di + blocco. Viceversa, un ciclo che invoca spin_lock_init() su tutti i blocchi li + mapperebbe tutti alla stessa classe di blocco. + + La morale della favola è che dovete sempre inizializzare esplicitamente i + vostri blocchi. + +Qualcuno potrebbe argomentare che il validatore debba permettere il riuso di +classi di blocco. Tuttavia, se siete tentati dall'argomento, prima revisionate +il codice e pensate alla modifiche necessarie, e tenendo a mente che le classi +di blocco da rimuovere probabilmente sono legate al grafo delle dipendenze. Più +facile a dirsi che a farsi. + +Ovviamente, se non esaurite le classi di blocco, la prossima cosa da fare è +quella di trovare le classi non funzionanti. Per prima cosa, il seguente comando +ritorna il numero di classi attualmente in uso assieme al valore massimo:: + + grep "lock-classes" /proc/lockdep_stats + +Questo comando produce il seguente messaggio:: + + lock-classes: 748 [max: 8191] + +Se il numero di assegnazioni (748 qui sopra) aumenta continuamente nel tempo, +allora c'è probabilmente un problema da qualche parte. Il seguente comando può +essere utilizzato per identificare le classi di blocchi problematiche:: + + grep "BD" /proc/lockdep + +Eseguite il comando e salvatene l'output, quindi confrontatelo con l'output di +un'esecuzione successiva per identificare eventuali problemi. Questo stesso +output può anche aiutarti a trovare situazioni in cui l'inizializzazione del +blocco è stata omessa. + +Lettura ricorsiva dei blocchi +----------------------------- + +Il resto di questo documento vuole dimostrare che certi cicli equivalgono ad una +possibilità di stallo. + +Ci sono tre tipi di bloccatori: gli scrittori (bloccatori esclusivi, come +spin_lock() o write_lock()), lettori non ricorsivi (bloccatori condivisi, come +down_read()), e lettori ricorsivi (bloccatori condivisi ricorsivi, come +rcu_read_lock()). D'ora in poi, per questi tipi di bloccatori, useremo la +seguente notazione: + + W o E: per gli scrittori (bloccatori esclusivi) (W dall'inglese per + *Writer*, ed E per *Exclusive*). + + r: per i lettori non ricorsivi (r dall'inglese per *reader*). + + R: per i lettori ricorsivi (R dall'inglese per *Reader*). + + S: per qualsiasi lettore (non ricorsivi + ricorsivi), dato che entrambe + sono bloccatori condivisi (S dall'inglese per *Shared*). + + N: per gli scrittori ed i lettori non ricorsivi, dato che entrambe sono + non ricorsivi. + +Ovviamente, N equivale a "r o W" ed S a "r o R". + +Come suggerisce il nome, i lettori ricorsivi sono dei bloccatori a cui è +permesso di acquisire la stessa istanza di blocco anche all'interno della +sezione critica di un altro lettore. In altre parole, permette di annidare la +stessa istanza di blocco nelle sezioni critiche dei lettori. + +Dall'altro canto, lo stesso comportamento indurrebbe un lettore non ricorsivo ad +auto infliggersi uno stallo. + +La differenza fra questi due tipi di lettori esiste perché: quelli ricorsivi +vengono bloccati solo dal trattenimento di un blocco di scrittura, mentre quelli +non ricorsivi possono essere bloccati dall'attesa di un blocco di scrittura. +Consideriamo il seguente esempio:: + + TASK A: TASK B: + + read_lock(X); + write_lock(X); + read_lock_2(X); + +L'attività A acquisisce il blocco di lettura X (non importa se di tipo ricorsivo +o meno) usando read_lock(). Quando l'attività B tenterà di acquisire il blocco +X, si fermerà e rimarrà in attesa che venga rilasciato. Ora se read_lock_2() è +un tipo lettore ricorsivo, l'attività A continuerà perché gli scrittori in +attesa non possono bloccare lettori ricorsivi, e non avremo alcuno stallo. +Tuttavia, se read_lock_2() è un lettore non ricorsivo, allora verrà bloccato +dall'attività B e si causerà uno stallo. + +Condizioni bloccanti per lettori/scrittori su uno stesso blocco +--------------------------------------------------------------- +Essenzialmente ci sono quattro condizioni bloccanti: + +1. Uno scrittore blocca un altro scrittore. +2. Un lettore blocca uno scrittore. +3. Uno scrittore blocca sia i lettori ricorsivi che non ricorsivi. +4. Un lettore (ricorsivo o meno) non blocca altri lettori ricorsivi ma potrebbe + bloccare quelli non ricorsivi (perché potrebbero esistere degli scrittori in + attesa). + +Di seguito le tabella delle condizioni bloccanti, Y (*Yes*) significa che il +tipo in riga blocca quello in colonna, mentre N l'opposto. + + +---+---+---+---+ + | | W | r | R | + +---+---+---+---+ + | W | Y | Y | Y | + +---+---+---+---+ + | r | Y | Y | N | + +---+---+---+---+ + | R | Y | Y | N | + +---+---+---+---+ + + (W: scrittori, r: lettori non ricorsivi, R: lettori ricorsivi) + +Al contrario dei blocchi per lettori non ricorsivi, quelli ricorsivi vengono +trattenuti da chi trattiene il blocco di scrittura piuttosto che da chi ne +attende il rilascio. Per esempio:: + + TASK A: TASK B: + + read_lock(X); + + write_lock(X); + + read_lock(X); + +non produce uno stallo per i lettori ricorsivi, in quanto il processo B rimane +in attesta del blocco X, mentre il secondo read_lock() non ha bisogno di +aspettare perché si tratta di un lettore ricorsivo. Tuttavia, se read_lock() +fosse un lettore non ricorsivo, questo codice produrrebbe uno stallo. + +Da notare che in funzione dell'operazione di blocco usate per l'acquisizione (in +particolare il valore del parametro 'read' in lock_acquire()), un blocco può +essere di scrittura (blocco esclusivo), di lettura non ricorsivo (blocco +condiviso e non ricorsivo), o di lettura ricorsivo (blocco condiviso e +ricorsivo). In altre parole, per un'istanza di blocco esistono tre tipi di +acquisizione che dipendono dalla funzione di acquisizione usata: esclusiva, di +lettura non ricorsiva, e di lettura ricorsiva. + +In breve, chiamiamo "non ricorsivi" blocchi di scrittura e quelli di lettura non +ricorsiva, mentre "ricorsivi" i blocchi di lettura ricorsivi. + +I blocchi ricorsivi non si bloccano a vicenda, mentre quelli non ricorsivi sì +(anche in lettura). Un blocco di lettura non ricorsivi può bloccare uno +ricorsivo, e viceversa. + +Il seguente esempio mostra uno stallo con blocchi ricorsivi:: + + TASK A: TASK B: + + read_lock(X); + read_lock(Y); + write_lock(Y); + write_lock(X); + +Il processo A attende che il processo B esegua read_unlock() so Y, mentre il +processo B attende che A esegua read_unlock() su X. + +Tipi di dipendenze e percorsi forti +----------------------------------- +Le dipendenze fra blocchi tracciano l'ordine con cui una coppia di blocchi viene +acquisita, e perché vi sono 3 tipi di bloccatori, allora avremo 9 tipi di +dipendenze. Tuttavia, vi mostreremo che 4 sono sufficienti per individuare gli +stalli. + +Per ogni dipendenza fra blocchi avremo:: + + L1 -> L2 + +Questo significa che lockdep ha visto acquisire L1 prima di L2 nello stesso +contesto di esecuzione. Per quanto riguarda l'individuazione degli stalli, ci +interessa sapere se possiamo rimanere bloccati da L2 mentre L1 viene trattenuto. +In altre parole, vogliamo sapere se esiste un bloccatore L3 che viene bloccato +da L1 e un L2 che viene bloccato da L3. Dunque, siamo interessati a (1) quello +che L1 blocca e (2) quello che blocca L2. Di conseguenza, possiamo combinare +lettori ricorsivi e non per L1 (perché bloccano gli stessi tipi) e possiamo +combinare scrittori e lettori non ricorsivi per L2 (perché vengono bloccati +dagli stessi tipi). + +Con questa semplificazione, possiamo dedurre che ci sono 4 tipi di rami nel +grafo delle dipendenze di lockdep: + +1) -(ER)->: + dipendenza da scrittore esclusivo a lettore ricorsivo. "X -(ER)-> Y" + significa X -> Y, dove X è uno scrittore e Y un lettore ricorsivo. + +2) -(EN)->: + dipendenza da scrittore esclusivo a bloccatore non ricorsivo. + "X -(EN)->" significa X-> Y, dove X è uno scrittore e Y può essere + o uno scrittore o un lettore non ricorsivo. + +3) -(SR)->: + dipendenza da lettore condiviso a lettore ricorsivo. "X -(SR)->" + significa X -> Y, dove X è un lettore (ricorsivo o meno) e Y è un + lettore ricorsivo. + +4) -(SN)->: + dipendenza da lettore condiviso a bloccatore non ricorsivo. + "X -(SN)-> Y" significa X -> Y , dove X è un lettore (ricorsivo + o meno) e Y può essere o uno scrittore o un lettore non ricorsivo. + +Da notare che presi due blocchi, questi potrebbero avere più dipendenza fra di +loro. Per esempio:: + + TASK A: + + read_lock(X); + write_lock(Y); + ... + + TASK B: + + write_lock(X); + write_lock(Y); + +Nel grafo delle dipendenze avremo sia X -(SN)-> Y che X -(EN)-> Y. + +Usiamo -(xN)-> per rappresentare i rami sia per -(EN)-> che -(SN)->, allo stesso +modo -(Ex)->, -(xR)-> e -(Sx)-> + +Un "percorso" in un grafo è una serie di nodi e degli archi che li congiungono. +Definiamo un percorso "forte", come il percorso che non ha archi (dipendenze) di +tipo -(xR)-> e -(Sx)->. In altre parole, un percorso "forte" è un percorso da un +blocco ad un altro attraverso le varie dipendenze, e se sul percorso abbiamo X +-> Y -> Z (dove X, Y, e Z sono blocchi), e da X a Y si ha una dipendenza -(SR)-> +o -(ER)->, allora fra Y e Z non deve esserci una dipendenza -(SN)-> o -(SR)->. + +Nella prossima sezione vedremo perché definiamo questo percorso "forte". + +Identificazione di stalli da lettura ricorsiva +---------------------------------------------- +Ora vogliamo dimostrare altre due cose: + +Lemma 1: + +Se esiste un percorso chiuso forte (ciclo forte), allora esiste anche una +combinazione di sequenze di blocchi che causa uno stallo. In altre parole, +l'esistenza di un ciclo forte è sufficiente alla scoperta di uno stallo. + +Lemma 2: + +Se non esiste un percorso chiuso forte (ciclo forte), allora non esiste una +combinazione di sequenze di blocchi che causino uno stallo. In altre parole, i +cicli forti sono necessari alla rilevazione degli stallo. + +Con questi due lemmi possiamo facilmente affermare che un percorso chiuso forte +è sia sufficiente che necessario per avere gli stalli, dunque averli equivale +alla possibilità di imbattersi concretamente in uno stallo. Un percorso chiuso +forte significa che può causare stalli, per questo lo definiamo "forte", ma ci +sono anche cicli di dipendenze che non causeranno stalli. + +Dimostrazione di sufficienza (lemma 1): + +Immaginiamo d'avere un ciclo forte:: + + L1 -> L2 ... -> Ln -> L1 + +Questo significa che abbiamo le seguenti dipendenze:: + + L1 -> L2 + L2 -> L3 + ... + Ln-1 -> Ln + Ln -> L1 + +Ora possiamo costruire una combinazione di sequenze di blocchi che causano lo +stallo. + +Per prima cosa facciamo sì che un processo/processore prenda L1 in L1 -> L2, poi +un altro prende L2 in L2 -> L3, e così via. Alla fine, tutti i Lx in Lx -> Lx+1 +saranno trattenuti da processi/processori diversi. + +Poi visto che abbiamo L1 -> L2, chi trattiene L1 vorrà acquisire L2 in L1 -> L2, +ma prima dovrà attendere che venga rilasciato da chi lo trattiene. Questo perché +L2 è già trattenuto da un altro processo/processore, ed in più L1 -> L2 e L2 -> +L3 non sono -(xR)-> né -(Sx)-> (la definizione di forte). Questo significa che L2 +in L1 -> L2 non è un bloccatore non ricorsivo (bloccabile da chiunque), e L2 in +L2 -> L3 non è uno scrittore (che blocca chiunque). + +In aggiunta, possiamo trarre una simile conclusione per chi sta trattenendo L2: +deve aspettare che L3 venga rilasciato, e così via. Ora possiamo dimostrare che +chi trattiene Lx deve aspettare che Lx+1 venga rilasciato. Notiamo che Ln+1 è +L1, dunque si è creato un ciclo dal quale non possiamo uscire, quindi si ha uno +stallo. + +Dimostrazione della necessità (lemma 2): + +Questo lemma equivale a dire che: se siamo in uno scenario di stallo, allora +deve esiste un ciclo forte nel grafo delle dipendenze. + +Secondo Wikipedia[1], se c'è uno stallo, allora deve esserci un ciclo di attese, +ovvero ci sono N processi/processori dove P1 aspetta un blocco trattenuto da P2, +e P2 ne aspetta uno trattenuto da P3, ... e Pn attende che il blocco P1 venga +rilasciato. Chiamiamo Lx il blocco che attende Px, quindi P1 aspetta L1 e +trattiene Ln. Quindi avremo Ln -> L1 nel grafo delle dipendenze. Similarmente, +nel grafo delle dipendenze avremo L1 -> L2, L2 -> L3, ..., Ln-1 -> Ln, il che +significa che abbiamo un ciclo:: + + Ln -> L1 -> L2 -> ... -> Ln + +, ed ora dimostriamo d'avere un ciclo forte. + +Per un blocco Lx, il processo Px contribuisce alla dipendenza Lx-1 -> Lx e Px+1 +contribuisce a quella Lx -> Lx+1. Visto che Px aspetta che Px+1 rilasci Lx, sarà +impossibile che Lx in Px+1 sia un lettore e che Lx in Px sia un lettore +ricorsivo. Questo perché i lettori (ricorsivi o meno) non bloccano lettori +ricorsivi. Dunque, Lx-1 -> Lx e Lx -> Lx+1 non possono essere una coppia di +-(xR)-> -(Sx)->. Questo è vero per ogni ciclo, dunque, questo è un ciclo forte. + +Riferimenti +----------- + +[1]: https://it.wikipedia.org/wiki/Stallo_(informatica) + +[2]: Shibu, K. (2009). Intro To Embedded Systems (1st ed.). Tata McGraw-Hill diff --git a/Documentation/translations/it_IT/locking/lockstat.rst b/Documentation/translations/it_IT/locking/lockstat.rst new file mode 100644 index 000000000000..77972d971d7c --- /dev/null +++ b/Documentation/translations/it_IT/locking/lockstat.rst @@ -0,0 +1,230 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-ita.rst + +======================= +Statistiche sui blocchi +======================= + +Cosa +==== + +Come suggerisce il nome, fornisce statistiche sui blocchi. + + +Perché +====== + +Perché, tanto per fare un esempio, le contese sui blocchi possono influenzare +significativamente le prestazioni. + +Come +==== + +*Lockdep* ha punti di collegamento nelle funzioni di blocco e inoltre +mappa le istanze di blocco con le relative classi. Partiamo da questo punto +(vedere Documentation/translations/it_IT/locking/lockdep-design.rst). +Il grafico sottostante mostra la relazione che intercorre fra le +funzioni di blocco e i vari punti di collegamenti che ci sono al loro +interno:: + + __acquire + | + lock _____ + | \ + | __contended + | | + | <wait> + | _______/ + |/ + | + __acquired + | + . + <hold> + . + | + __release + | + unlock + + lock, unlock - le classiche funzioni di blocco + __* - i punti di collegamento + <> - stati + +Grazie a questi punti di collegamento possiamo fornire le seguenti statistiche: + +con-bounces + - numero di contese su un blocco che riguarda dati di un processore + +contentions + - numero di acquisizioni di blocchi che hanno dovuto attendere + +wait time + min + - tempo minimo (diverso da zero) che sia mai stato speso in attesa di + un blocco + + max + - tempo massimo che sia mai stato speso in attesa di un blocco + + total + - tempo totale speso in attesa di un blocco + + avg + - tempo medio speso in attesa di un blocco + +acq-bounces + - numero di acquisizioni di blocco che riguardavano i dati su un processore + +acquisitions + - numero di volte che un blocco è stato ottenuto + +hold time + min + - tempo minimo (diverso da zero) che sia mai stato speso trattenendo un blocco + + max + - tempo massimo che sia mai stato speso trattenendo un blocco + + total + - tempo totale di trattenimento di un blocco + + avg + - tempo medio di trattenimento di un blocco + +Questi numeri vengono raccolti per classe di blocco, e per ogni stato di +lettura/scrittura (quando applicabile). + +Inoltre, questa raccolta di statistiche tiene traccia di 4 punti di contesa +per classe di blocco. Un punto di contesa è una chiamata che ha dovuto +aspettare l'acquisizione di un blocco. + +Configurazione +-------------- + +Le statistiche sui blocchi si abilitano usando l'opzione di configurazione +CONFIG_LOCK_STAT. + +Uso +--- + +Abilitare la raccolta di statistiche:: + + # echo 1 >/proc/sys/kernel/lock_stat + +Disabilitare la raccolta di statistiche:: + + # echo 0 >/proc/sys/kernel/lock_stat + +Per vedere le statistiche correnti sui blocchi:: + + ( i numeri di riga non fanno parte dell'output del comando, ma sono stati + aggiunti ai fini di questa spiegazione ) + + # less /proc/lock_stat + + 01 lock_stat version 0.4 + 02----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + 03 class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg + 04----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + 05 + 06 &mm->mmap_sem-W: 46 84 0.26 939.10 16371.53 194.90 47291 2922365 0.16 2220301.69 17464026916.32 5975.99 + 07 &mm->mmap_sem-R: 37 100 1.31 299502.61 325629.52 3256.30 212344 34316685 0.10 7744.91 95016910.20 2.77 + 08 --------------- + 09 &mm->mmap_sem 1 [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280 + 10 &mm->mmap_sem 96 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 + 11 &mm->mmap_sem 34 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0 + 12 &mm->mmap_sem 17 [<ffffffff81127e71>] vm_munmap+0x41/0x80 + 13 --------------- + 14 &mm->mmap_sem 1 [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0 + 15 &mm->mmap_sem 60 [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250 + 16 &mm->mmap_sem 41 [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510 + 17 &mm->mmap_sem 68 [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0 + 18 + 19............................................................................................................................................................................................................................. + 20 + 21 unix_table_lock: 110 112 0.21 49.24 163.91 1.46 21094 66312 0.12 624.42 31589.81 0.48 + 22 --------------- + 23 unix_table_lock 45 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0 + 24 unix_table_lock 47 [<ffffffff8150b111>] unix_release_sock+0x31/0x250 + 25 unix_table_lock 15 [<ffffffff8150ca37>] unix_find_other+0x117/0x230 + 26 unix_table_lock 5 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0 + 27 --------------- + 28 unix_table_lock 39 [<ffffffff8150b111>] unix_release_sock+0x31/0x250 + 29 unix_table_lock 49 [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0 + 30 unix_table_lock 20 [<ffffffff8150ca37>] unix_find_other+0x117/0x230 + 31 unix_table_lock 4 [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0 + +Questo estratto mostra le statistiche delle prime due classi di +blocco. La riga 01 mostra la versione dell'output - la versione +cambierà ogni volta che cambia il formato. Le righe dalla 02 alla 04 +rappresentano l'intestazione con la descrizione delle colonne. Le +statistiche sono mostrate nelle righe dalla 05 alla 18 e dalla 20 +alla 31. Queste statistiche sono divise in due parti: le statistiche, +seguite dai punti di contesa (righe 08 e 13) separati da un divisore. + +Le righe dalla 09 alla 12 mostrano i primi quattro punti di contesa +registrati (il codice che tenta di acquisire un blocco) e le righe +dalla 14 alla 17 mostrano i primi quattro punti contesi registrati +(ovvero codice che ha acquisito un blocco). È possibile che nelle +statistiche manchi il valore *max con-bounces*. + +Il primo blocco (righe dalla 05 alla 18) è di tipo lettura/scrittura e quindi +mostra due righe prima del divisore. I punti di contesa non corrispondono alla +descrizione delle colonne nell'intestazione; essi hanno due colonne: *punti di +contesa* e *[<IP>] simboli*. Il secondo gruppo di punti di contesa sono i punti +con cui si contende il blocco. + +La parte interna del tempo è espressa in us (microsecondi). + +Quando si ha a che fare con blocchi annidati si potrebbero vedere le +sottoclassi di blocco:: + + 32........................................................................................................................................................................................................................... + 33 + 34 &rq->lock: 13128 13128 0.43 190.53 103881.26 7.91 97454 3453404 0.00 401.11 13224683.11 3.82 + 35 --------- + 36 &rq->lock 645 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 + 37 &rq->lock 297 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a + 38 &rq->lock 360 [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a + 39 &rq->lock 428 [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb + 40 --------- + 41 &rq->lock 77 [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75 + 42 &rq->lock 174 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a + 43 &rq->lock 4715 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 + 44 &rq->lock 893 [<ffffffff81340524>] schedule+0x157/0x7b8 + 45 + 46........................................................................................................................................................................................................................... + 47 + 48 &rq->lock/1: 1526 11488 0.33 388.73 136294.31 11.86 21461 38404 0.00 37.93 109388.53 2.84 + 49 ----------- + 50 &rq->lock/1 11526 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 + 51 ----------- + 52 &rq->lock/1 5645 [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54 + 53 &rq->lock/1 1224 [<ffffffff81340524>] schedule+0x157/0x7b8 + 54 &rq->lock/1 4336 [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54 + 55 &rq->lock/1 181 [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a + +La riga 48 mostra le statistiche per la seconda sottoclasse (/1) della +classe *&irq->lock* (le sottoclassi partono da 0); in questo caso, +come suggerito dalla riga 50, ``double_rq_lock`` tenta di acquisire un blocco +annidato di due spinlock. + +Per vedere i blocco più contesi:: + + # grep : /proc/lock_stat | head + clockevents_lock: 2926159 2947636 0.15 46882.81 1784540466.34 605.41 3381345 3879161 0.00 2260.97 53178395.68 13.71 + tick_broadcast_lock: 346460 346717 0.18 2257.43 39364622.71 113.54 3642919 4242696 0.00 2263.79 49173646.60 11.59 + &mapping->i_mmap_mutex: 203896 203899 3.36 645530.05 31767507988.39 155800.21 3361776 8893984 0.17 2254.15 14110121.02 1.59 + &rq->lock: 135014 136909 0.18 606.09 842160.68 6.15 1540728 10436146 0.00 728.72 17606683.41 1.69 + &(&zone->lru_lock)->rlock: 93000 94934 0.16 59.18 188253.78 1.98 1199912 3809894 0.15 391.40 3559518.81 0.93 + tasklist_lock-W: 40667 41130 0.23 1189.42 428980.51 10.43 270278 510106 0.16 653.51 3939674.91 7.72 + tasklist_lock-R: 21298 21305 0.20 1310.05 215511.12 10.12 186204 241258 0.14 1162.33 1179779.23 4.89 + rcu_node_1: 47656 49022 0.16 635.41 193616.41 3.95 844888 1865423 0.00 764.26 1656226.96 0.89 + &(&dentry->d_lockref.lock)->rlock: 39791 40179 0.15 1302.08 88851.96 2.21 2790851 12527025 0.10 1910.75 3379714.27 0.27 + rcu_node_0: 29203 30064 0.16 786.55 1555573.00 51.74 88963 244254 0.00 398.87 428872.51 1.76 + +Per cancellare le statistiche:: + + # echo 0 > /proc/lock_stat diff --git a/Documentation/translations/it_IT/locking/locktorture.rst b/Documentation/translations/it_IT/locking/locktorture.rst new file mode 100644 index 000000000000..87a0dbeaca77 --- /dev/null +++ b/Documentation/translations/it_IT/locking/locktorture.rst @@ -0,0 +1,181 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-ita.rst + +============================================ +Funzionamento del test *Kernel Lock Torture* +============================================ + +CONFIG_LOCK_TORTURE_TEST +======================== + +L'opzione di configurazione CONFIG_LOCK_TORTURE_TEST fornisce un +modulo kernel che esegue delle verifiche che *torturano* le primitive di +sincronizzazione del kernel. Se dovesse servire, il modulo kernel, +'locktorture', può essere generato successivamente su un kernel che +volete verificare. Periodicamente le verifiche stampano messaggi tramite +``printk()`` e che quindi possono essere letti tramite ``dmesg`` (magari +filtrate l'output con ``grep "torture"``). La verifica inizia quando +il modulo viene caricato e termina quando viene rimosso. Questo +programma si basa sulle modalità di verifica di RCU tramite rcutorture. + +Questa verifica consiste nella creazione di un certo numero di thread +del kernel che acquisiscono un blocco e lo trattengono per una certa +quantità di tempo così da simulare diversi comportamenti nelle sezioni +critiche. La quantità di contese su un blocco può essere simulata +allargando la sezione critica e/o creando più thread. + + +Parametri del modulo +==================== + +Questo modulo ha i seguenti parametri: + + +Specifici di locktorture +------------------------ + +nwriters_stress + Numero di thread del kernel che stresseranno l'acquisizione + esclusiva dei blocchi (scrittori). Il valore di base è il + doppio del numero di processori attivi presenti. + +nreaders_stress + Numero di thread del kernel che stresseranno l'acquisizione + condivisa dei blocchi (lettori). Il valore di base è lo stesso + di nwriters_stress. Se l'utente non ha specificato + nwriters_stress, allora entrambe i valori corrisponderanno + al numero di processori attivi presenti. + +torture_type + Tipo di blocco da verificare. Di base, solo gli spinlock + verranno verificati. Questo modulo può verificare anche + i seguenti tipi di blocchi: + + - "lock_busted": + Simula un'incorretta implementazione del + blocco. + + - "spin_lock": + coppie di spin_lock() e spin_unlock(). + + - "spin_lock_irq": + coppie di spin_lock_irq() e spin_unlock_irq(). + + - "rw_lock": + coppie di rwlock read/write lock() e unlock(). + + - "rw_lock_irq": + copie di rwlock read/write lock_irq() e + unlock_irq(). + + - "mutex_lock": + coppie di mutex_lock() e mutex_unlock(). + + - "rtmutex_lock": + coppie di rtmutex_lock() e rtmutex_unlock(). + Il kernel deve avere CONFIG_RT_MUTEXES=y. + + - "rwsem_lock": + coppie di semafori read/write down() e up(). + + +Generici dell'ambiente di sviluppo 'torture' (RCU + locking) +------------------------------------------------------------ + +shutdown_secs + Numero di secondi prima che la verifica termini e il sistema + venga spento. Il valore di base è zero, il che disabilita + la possibilità di terminare e spegnere. Questa funzionalità + può essere utile per verifiche automatizzate. + +onoff_interval + Numero di secondi fra ogni tentativo di esecuzione di + un'operazione casuale di CPU-hotplug. Di base è zero, il + che disabilita la funzionalità di CPU-hotplug. Nei kernel + con CONFIG_HOTPLUG_CPU=n, locktorture si rifiuterà, senza + dirlo, di effettuare una qualsiasi operazione di + CPU-hotplug indipendentemente dal valore specificato in + onoff_interval. + +onoff_holdoff + Numero di secondi da aspettare prima di iniziare le + operazioni di CPU-hotplug. Normalmente questo verrebbe + usato solamente quando locktorture è compilato come parte + integrante del kernel ed eseguito automaticamente all'avvio, + in questo caso è utile perché permette di non confondere + l'avvio con i processori che vanno e vengono. Questo + parametro è utile sono se CONFIG_HOTPLUG_CPU è abilitato. + +stat_interval + Numero di secondi fra una stampa (printk()) delle + statistiche e l'altra. Di base, locktorture riporta le + statistiche ogni 60 secondi. Impostando l'intervallo a 0 + ha l'effetto di stampare le statistiche -solo- quando il + modulo viene rimosso. + +stutter + Durata della verifica prima di effettuare una pausa di + eguale durata. Di base "stutter=5", quindi si eseguono + verifiche e pause di (circa) cinque secondi. + L'impostazione di "stutter=0" fa si che la verifica + venga eseguita continuamente senza fermarsi. + +shuffle_interval + Il numero di secondi per cui un thread debba mantenere + l'affinità con un sottoinsieme di processori, di base è + 3 secondi. Viene usato assieme a test_no_idle_hz. + +verbose + Abilita le stampe di debug, via printk(). Di base è + abilitato. Queste informazioni aggiuntive sono per la + maggior parte relative ad errori di alto livello e resoconti + da parte dell'struttura 'torture'. + + +Statistiche +=========== + +Le statistiche vengono stampate secondo il seguente formato:: + + spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0 + (A) (B) (C) (D) (E) + + (A): tipo di lock sotto verifica -- parametro torture_type. + + (B): Numero di acquisizione del blocco in scrittura. Se si ha a che fare + con una primitiva di lettura/scrittura apparirà di seguito anche una + seconda voce "Reads" + + (C): Numero di volte che il blocco è stato acquisito + + (D): Numero minimo e massimo di volte che un thread ha fallito + nell'acquisire il blocco + + (E): valori true/false nel caso di errori durante l'acquisizione del blocco. + Questo dovrebbe dare un riscontro positivo -solo- se c'è un baco + nell'implementazione delle primitive di sincronizzazione. Altrimenti un + blocco non dovrebbe mai fallire (per esempio, spin_lock()). + Ovviamente lo stesso si applica per (C). Un semplice esempio è il tipo + "lock_busted". + +Uso +=== + +Il seguente script può essere utilizzato per verificare i blocchi:: + + #!/bin/sh + + modprobe locktorture + sleep 3600 + rmmod locktorture + dmesg | grep torture: + +L'output può essere manualmente ispezionato cercando il marcatore d'errore +"!!!". Ovviamente potreste voler creare degli script più elaborati che +verificano automaticamente la presenza di errori. Il comando "rmmod" forza la +stampa (usando printk()) di "SUCCESS", "FAILURE", oppure "RCU_HOTPLUG". I primi +due si piegano da soli, mentre l'ultimo indica che non stati trovati problemi di +sincronizzazione, tuttavia ne sono stati trovati in CPU-hotplug. + +Consultate anche: Documentation/translations/it_IT/RCU/torture.rst diff --git a/Documentation/translations/it_IT/locking/locktypes.rst b/Documentation/translations/it_IT/locking/locktypes.rst new file mode 100644 index 000000000000..1c7056283b9d --- /dev/null +++ b/Documentation/translations/it_IT/locking/locktypes.rst @@ -0,0 +1,547 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-ita.rst + +.. _it_kernel_hacking_locktypes: + +======================================== +Tipologie di blocco e le loro istruzioni +======================================== + +Introduzione +============ + +Il kernel fornisce un certo numero di primitive di blocco che possiamo dividere +in tre categorie: + + - blocchi ad attesa con sospensione + - blocchi locali per CPU + - blocchi ad attesa attiva + +Questo documento descrive questi tre tipi e fornisce istruzioni su come +annidarli, ed usarli su kernel PREEMPT_RT. + +Categorie di blocchi +==================== + +Blocchi ad attesa con sospensione +--------------------------------- + +I blocchi ad attesa con sospensione possono essere acquisiti solo in un contesti +dov'è possibile la prelazione. + +Diverse implementazioni permettono di usare try_lock() anche in altri contesti, +nonostante ciò è bene considerare anche la sicurezza dei corrispondenti +unlock(). Inoltre, vanno prese in considerazione anche le varianti di *debug* +di queste primitive. Insomma, non usate i blocchi ad attesa con sospensioni in +altri contesti a meno che proprio non vi siano alternative. + +In questa categoria troviamo: + + - mutex + - rt_mutex + - semaphore + - rw_semaphore + - ww_mutex + - percpu_rw_semaphore + +Nei kernel con PREEMPT_RT, i seguenti blocchi sono convertiti in blocchi ad +attesa con sospensione: + + - local_lock + - spinlock_t + - rwlock_t + +Blocchi locali per CPU +---------------------- + + - local_lock + +Su kernel non-PREEMPT_RT, le funzioni local_lock gestiscono le primitive di +disabilitazione di prelazione ed interruzioni. Al contrario di altri meccanismi, +la disabilitazione della prelazione o delle interruzioni sono puri meccanismi +per il controllo della concorrenza su una CPU e quindi non sono adatti per la +gestione della concorrenza inter-CPU. + +Blocchi ad attesa attiva +------------------------ + + - raw_spinlcok_t + - bit spinlocks + + Nei kernel non-PREEMPT_RT, i seguenti blocchi sono ad attesa attiva: + + - spinlock_t + - rwlock_t + +Implicitamente, i blocchi ad attesa attiva disabilitano la prelazione e le +funzioni lock/unlock hanno anche dei suffissi per gestire il livello di +protezione: + + =================== ========================================================================= + _bh() disabilita / abilita *bottom halves* (interruzioni software) + _irq() disabilita / abilita le interruzioni + _irqsave/restore() salva e disabilita le interruzioni / ripristina ed attiva le interruzioni + =================== ========================================================================= + +Semantica del proprietario +========================== + +Eccetto i semafori, i sopracitati tipi di blocchi hanno tutti una semantica +molto stringente riguardo al proprietario di un blocco: + + Il contesto (attività) che ha acquisito il blocco deve rilasciarlo + +I semafori rw_semaphores hanno un'interfaccia speciale che permette anche ai non +proprietari del blocco di rilasciarlo per i lettori. + +rtmutex +======= + +I blocchi a mutua esclusione RT (*rtmutex*) sono un sistema a mutua esclusione +con supporto all'ereditarietà della priorità (PI). + +Questo meccanismo ha delle limitazioni sui kernel non-PREEMPT_RT dovuti alla +prelazione e alle sezioni con interruzioni disabilitate. + +Chiaramente, questo meccanismo non può avvalersi della prelazione su una sezione +dove la prelazione o le interruzioni sono disabilitate; anche sui kernel +PREEMPT_RT. Tuttavia, i kernel PREEMPT_RT eseguono la maggior parte delle +sezioni in contesti dov'è possibile la prelazione, specialmente in contesti +d'interruzione (anche software). Questa conversione permette a spinlock_t e +rwlock_t di essere implementati usando rtmutex. + +semaphore +========= + +La primitiva semaphore implementa un semaforo con contatore. + +I semafori vengono spesso utilizzati per la serializzazione e l'attesa, ma per +nuovi casi d'uso si dovrebbero usare meccanismi diversi, come mutex e +completion. + +semaphore e PREEMPT_RT +---------------------- + +I kernel PREEMPT_RT non cambiano l'implementazione di semaphore perché non hanno +un concetto di proprietario, dunque impediscono a PREEMPT_RT d'avere +l'ereditarietà della priorità sui semafori. Un proprietario sconosciuto non può +ottenere una priorità superiore. Di consequenza, bloccarsi sui semafori porta +all'inversione di priorità. + + +rw_semaphore +============ + +Il blocco rw_semaphore è un meccanismo che permette più lettori ma un solo scrittore. + +Sui kernel non-PREEMPT_RT l'implementazione è imparziale, quindi previene +l'inedia dei processi scrittori. + +Questi blocchi hanno una semantica molto stringente riguardo il proprietario, ma +offre anche interfacce speciali che permettono ai processi non proprietari di +rilasciare un processo lettore. Queste interfacce funzionano indipendentemente +dalla configurazione del kernel. + +rw_semaphore e PREEMPT_RT +------------------------- + +I kernel PREEMPT_RT sostituiscono i rw_semaphore con un'implementazione basata +su rt_mutex, e questo ne modifica l'imparzialità: + + Dato che uno scrittore rw_semaphore non può assicurare la propria priorità ai + suoi lettori, un lettore con priorità più bassa che ha subito la prelazione + continuerà a trattenere il blocco, quindi porta all'inedia anche gli scrittori + con priorità più alta. Per contro, dato che i lettori possono garantire la + propria priorità agli scrittori, uno scrittore a bassa priorità che subisce la + prelazione vedrà la propria priorità alzata finché non rilascerà il blocco, e + questo preverrà l'inedia dei processi lettori a causa di uno scrittore. + + +local_lock +========== + +I local_lock forniscono nomi agli ambiti di visibilità delle sezioni critiche +protette tramite la disattivazione della prelazione o delle interruzioni. + +Sui kernel non-PREEMPT_RT le operazioni local_lock si traducono +nell'abilitazione o disabilitazione della prelazione o le interruzioni. + + =============================== ====================== + local_lock(&llock) preempt_disable() + local_unlock(&llock) preempt_enable() + local_lock_irq(&llock) local_irq_disable() + local_unlock_irq(&llock) local_irq_enable() + local_lock_irqsave(&llock) local_irq_save() + local_unlock_irqrestore(&llock) local_irq_restore() + =============================== ====================== + +Gli ambiti di visibilità con nome hanno due vantaggi rispetto alle primitive di +base: + + - Il nome del blocco permette di fare un'analisi statica, ed è anche chiaro su + cosa si applichi la protezione cosa che invece non si può fare con le + classiche primitive in quanto sono opache e senza alcun ambito di + visibilità. + + - Se viene abilitato lockdep, allora local_lock ottiene un lockmap che + permette di verificare la bontà della protezione. Per esempio, questo può + identificare i casi dove una funzione usa preempt_disable() come meccanismo + di protezione in un contesto d'interruzione (anche software). A parte + questo, lockdep_assert_held(&llock) funziona come tutte le altre primitive + di sincronizzazione. + +local_lock e PREEMPT_RT +------------------------- + +I kernel PREEMPT_RT sostituiscono local_lock con uno spinlock_t per CPU, quindi +ne cambia la semantica: + + - Tutte le modifiche a spinlock_t si applicano anche a local_lock + +L'uso di local_lock +------------------- + +I local_lock dovrebbero essere usati su kernel non-PREEMPT_RT quando la +disabilitazione della prelazione o delle interruzioni è il modo più adeguato per +gestire l'accesso concorrente a strutture dati per CPU. + +Questo meccanismo non è adatto alla protezione da prelazione o interruzione su +kernel PREEMPT_RT dato che verrà convertito in spinlock_t. + + +raw_spinlock_t e spinlock_t +=========================== + +raw_spinlock_t +-------------- + +I blocco raw_spinlock_t è un blocco ad attesa attiva su tutti i tipi di kernel, +incluso quello PREEMPT_RT. Usate raw_spinlock_t solo in sezioni critiche nel +cuore del codice, nella gestione delle interruzioni di basso livello, e in posti +dove è necessario disabilitare la prelazione o le interruzioni. Per esempio, per +accedere in modo sicuro lo stato dell'hardware. A volte, i raw_spinlock_t +possono essere usati quando la sezione critica è minuscola, per evitare gli +eccessi di un rtmutex. + +spinlock_t +---------- + +Il significato di spinlock_t cambia in base allo stato di PREEMPT_RT. + +Sui kernel non-PREEMPT_RT, spinlock_t si traduce in un raw_spinlock_t ed ha +esattamente lo stesso significato. + +spinlock_t e PREEMPT_RT +----------------------- + +Sui kernel PREEMPT_RT, spinlock_t ha un'implementazione dedicata che si basa +sull'uso di rt_mutex. Questo ne modifica il significato: + + - La prelazione non viene disabilitata. + + - I suffissi relativi alla interruzioni (_irq, _irqsave / _irqrestore) per le + operazioni spin_lock / spin_unlock non hanno alcun effetto sullo stato delle + interruzioni della CPU. + + - I suffissi relativi alle interruzioni software (_bh()) disabilitano i + relativi gestori d'interruzione. + + I kernel non-PREEMPT_RT disabilitano la prelazione per ottenere lo stesso effetto. + + I kernel PREEMPT_RT usano un blocco per CPU per la serializzazione, il che + permette di tenere attiva la prelazione. Il blocco disabilita i gestori + d'interruzione software e previene la rientranza vista la prelazione attiva. + +A parte quanto appena discusso, i kernel PREEMPT_RT preservano il significato +di tutti gli altri aspetti di spinlock_t: + + - Le attività che trattengono un blocco spinlock_t non migrano su altri + processori. Disabilitando la prelazione, i kernel non-PREEMPT_RT evitano la + migrazione. Invece, i kernel PREEMPT_RT disabilitano la migrazione per + assicurarsi che i puntatori a variabili per CPU rimangano validi anche + quando un'attività subisce la prelazione. + + - Lo stato di un'attività si mantiene durante le acquisizioni del blocco al + fine di garantire che le regole basate sullo stato delle attività si possano + applicare a tutte le configurazioni del kernel. I kernel non-PREEMPT_RT + lasciano lo stato immutato. Tuttavia, la funzionalità PREEMPT_RT deve + cambiare lo stato se l'attività si blocca durante l'acquisizione. Dunque, + salva lo stato attuale prima di bloccarsi ed il rispettivo risveglio lo + ripristinerà come nell'esempio seguente:: + + task->state = TASK_INTERRUPTIBLE + lock() + block() + task->saved_state = task->state + task->state = TASK_UNINTERRUPTIBLE + schedule() + lock wakeup + task->state = task->saved_state + + Altri tipi di risvegli avrebbero impostato direttamente lo stato a RUNNING, + ma in questo caso non avrebbe funzionato perché l'attività deve rimanere + bloccata fintanto che il blocco viene trattenuto. Quindi, lo stato salvato + viene messo a RUNNING quando il risveglio di un non-blocco cerca di + risvegliare un'attività bloccata in attesa del rilascio di uno spinlock. Poi, + quando viene completata l'acquisizione del blocco, il suo risveglio + ripristinerà lo stato salvato, in questo caso a RUNNING:: + + task->state = TASK_INTERRUPTIBLE + lock() + block() + task->saved_state = task->state + task->state = TASK_UNINTERRUPTIBLE + schedule() + non lock wakeup + task->saved_state = TASK_RUNNING + + lock wakeup + task->state = task->saved_state + + Questo garantisce che il vero risveglio non venga perso. + +rwlock_t +======== + +Il blocco rwlock_t è un meccanismo che permette più lettori ma un solo scrittore. + +Sui kernel non-PREEMPT_RT questo è un blocco ad attesa e per i suoi suffissi si +applicano le stesse regole per spinlock_t. La sua implementazione è imparziale, +quindi previene l'inedia dei processi scrittori. + +rwlock_t e PREEMPT_RT +--------------------- + +Sui kernel PREEMPT_RT rwlock_t ha un'implementazione dedicata che si basa +sull'uso di rt_mutex. Questo ne modifica il significato: + + - Tutte le modifiche fatte a spinlock_t si applicano anche a rwlock_t. + + - Dato che uno scrittore rw_semaphore non può assicurare la propria priorità ai + suoi lettori, un lettore con priorità più bassa che ha subito la prelazione + continuerà a trattenere il blocco, quindi porta all'inedia anche gli + scrittori con priorità più alta. Per contro, dato che i lettori possono + garantire la propria priorità agli scrittori, uno scrittore a bassa priorità + che subisce la prelazione vedrà la propria priorità alzata finché non + rilascerà il blocco, e questo preverrà l'inedia dei processi lettori a causa + di uno scrittore. + + +Precisazioni su PREEMPT_RT +========================== + +local_lock su RT +---------------- + +Sui kernel PREEMPT_RT Ci sono alcune implicazioni dovute alla conversione di +local_lock in un spinlock_t. Per esempio, su un kernel non-PREEMPT_RT il +seguente codice funzionerà come ci si aspetta:: + + local_lock_irq(&local_lock); + raw_spin_lock(&lock); + +ed è equivalente a:: + + raw_spin_lock_irq(&lock); + +Ma su un kernel PREEMPT_RT questo codice non funzionerà perché local_lock_irq() +si traduce in uno spinlock_t per CPU che non disabilita né le interruzioni né la +prelazione. Il seguente codice funzionerà su entrambe i kernel con o senza +PREEMPT_RT:: + + local_lock_irq(&local_lock); + spin_lock(&lock); + +Un altro dettaglio da tenere a mente con local_lock è che ognuno di loro ha un +ambito di protezione ben preciso. Dunque, la seguente sostituzione è errate:: + + + func1() + { + local_irq_save(flags); -> local_lock_irqsave(&local_lock_1, flags); + func3(); + local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_1, flags); + } + + func2() + { + local_irq_save(flags); -> local_lock_irqsave(&local_lock_2, flags); + func3(); + local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_2, flags); + } + + func3() + { + lockdep_assert_irqs_disabled(); + access_protected_data(); + } + +Questo funziona correttamente su un kernel non-PREEMPT_RT, ma su un kernel +PREEMPT_RT local_lock_1 e local_lock_2 sono distinti e non possono serializzare +i chiamanti di func3(). L'*assert* di lockdep verrà attivato su un kernel +PREEMPT_RT perché local_lock_irqsave() non disabilita le interruzione a casa +della specifica semantica di spinlock_t in PREEMPT_RT. La corretta sostituzione +è:: + + func1() + { + local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags); + func3(); + local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags); + } + + func2() + { + local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags); + func3(); + local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags); + } + + func3() + { + lockdep_assert_held(&local_lock); + access_protected_data(); + } + +spinlock_t e rwlock_t +--------------------- + +Ci sono alcune conseguenze di cui tener conto dal cambiamento di semantica di +spinlock_t e rwlock_t sui kernel PREEMPT_RT. Per esempio, sui kernel non +PREEMPT_RT il seguente codice funziona come ci si aspetta:: + + local_irq_disable(); + spin_lock(&lock); + +ed è equivalente a:: + + spin_lock_irq(&lock); + +Lo stesso vale per rwlock_t e le varianti con _irqsave(). + +Sui kernel PREEMPT_RT questo codice non funzionerà perché gli rtmutex richiedono +un contesto con la possibilità di prelazione. Al suo posto, usate +spin_lock_irq() o spin_lock_irqsave() e le loro controparti per il rilascio. I +kernel PREEMPT_RT offrono un meccanismo local_lock per i casi in cui la +disabilitazione delle interruzioni ed acquisizione di un blocco devono rimanere +separati. Acquisire un local_lock àncora un processo ad una CPU permettendo cose +come un'acquisizione di un blocco con interruzioni disabilitate per singola CPU. + +Il tipico scenario è quando si vuole proteggere una variabile di processore nel +contesto di un thread:: + + + struct foo *p = get_cpu_ptr(&var1); + + spin_lock(&p->lock); + p->count += this_cpu_read(var2); + +Questo codice è corretto su un kernel non-PREEMPT_RT, ma non lo è su un +PREEMPT_RT. La modifica della semantica di spinlock_t su PREEMPT_RT non permette +di acquisire p->lock perché, implicitamente, get_cpu_ptr() disabilita la +prelazione. La seguente sostituzione funzionerà su entrambe i kernel:: + + struct foo *p; + + migrate_disable(); + p = this_cpu_ptr(&var1); + spin_lock(&p->lock); + p->count += this_cpu_read(var2); + +La funzione migrate_disable() assicura che il processo venga tenuto sulla CPU +corrente, e di conseguenza garantisce che gli accessi per-CPU alle variabili var1 e +var2 rimangano sulla stessa CPU fintanto che il processo rimane prelabile. + +La sostituzione con migrate_disable() non funzionerà nel seguente scenario:: + + func() + { + struct foo *p; + + migrate_disable(); + p = this_cpu_ptr(&var1); + p->val = func2(); + +Questo non funziona perché migrate_disable() non protegge dal ritorno da un +processo che aveva avuto il diritto di prelazione. Una sostituzione più adatta +per questo caso è:: + + func() + { + struct foo *p; + + local_lock(&foo_lock); + p = this_cpu_ptr(&var1); + p->val = func2(); + +Su un kernel non-PREEMPT_RT, questo codice protegge dal rientro disabilitando la +prelazione. Su un kernel PREEMPT_RT si ottiene lo stesso risultato acquisendo lo +spinlock di CPU. + +raw_spinlock_t su RT +-------------------- + +Acquisire un raw_spinlock_t disabilita la prelazione e possibilmente anche le +interruzioni, quindi la sezione critica deve evitare di acquisire uno spinlock_t +o rwlock_t. Per esempio, la sezione critica non deve fare allocazioni di +memoria. Su un kernel non-PREEMPT_RT il seguente codice funziona perfettamente:: + + raw_spin_lock(&lock); + p = kmalloc(sizeof(*p), GFP_ATOMIC); + +Ma lo stesso codice non funziona su un kernel PREEMPT_RT perché l'allocatore di +memoria può essere oggetto di prelazione e quindi non può essere chiamato in un +contesto atomico. Tuttavia, si può chiamare l'allocatore di memoria quando si +trattiene un blocco *non-raw* perché non disabilitano la prelazione sui kernel +PREEMPT_RT:: + + spin_lock(&lock); + p = kmalloc(sizeof(*p), GFP_ATOMIC); + + +bit spinlocks +------------- + +I kernel PREEMPT_RT non possono sostituire i bit spinlock perché un singolo bit +è troppo piccolo per farci stare un rtmutex. Dunque, la semantica dei bit +spinlock è mantenuta anche sui kernel PREEMPT_RT. Quindi, le precisazioni fatte +per raw_spinlock_t valgono anche qui. + +In PREEMPT_RT, alcuni bit spinlock sono sostituiti con normali spinlock_t usando +condizioni di preprocessore in base a dove vengono usati. Per contro, questo non +serve quando si sostituiscono gli spinlock_t. Invece, le condizioni poste sui +file d'intestazione e sul cuore dell'implementazione della sincronizzazione +permettono al compilatore di effettuare la sostituzione in modo trasparente. + + +Regole d'annidamento dei tipi di blocchi +======================================== + +Le regole principali sono: + + - I tipi di blocco appartenenti alla stessa categoria possono essere annidati + liberamente a patto che si rispetti l'ordine di blocco al fine di evitare + stalli. + + - I blocchi con sospensione non possono essere annidati in blocchi del tipo + CPU locale o ad attesa attiva + + - I blocchi ad attesa attiva e su CPU locale possono essere annidati nei + blocchi ad attesa con sospensione. + + - I blocchi ad attesa attiva possono essere annidati in qualsiasi altro tipo. + +Queste limitazioni si applicano ad entrambe i kernel con o senza PREEMPT_RT. + +Il fatto che un kernel PREEMPT_RT cambi i blocchi spinlock_t e rwlock_t dal tipo +ad attesa attiva a quello con sospensione, e che sostituisca local_lock con uno +spinlock_t per CPU, significa che non possono essere acquisiti quando si è in un +blocco raw_spinlock. Ne consegue il seguente ordine d'annidamento: + + 1) blocchi ad attesa con sospensione + 2) spinlock_t, rwlock_t, local_lock + 3) raw_spinlock_t e bit spinlocks + +Se queste regole verranno violate, allora lockdep se ne accorgerà e questo sia +con o senza PREEMPT_RT. diff --git a/Documentation/translations/it_IT/networking/netdev-FAQ.rst b/Documentation/translations/it_IT/networking/netdev-FAQ.rst deleted file mode 100644 index 8a1e049585c0..000000000000 --- a/Documentation/translations/it_IT/networking/netdev-FAQ.rst +++ /dev/null @@ -1,13 +0,0 @@ -.. include:: ../disclaimer-ita.rst - -:Original: :ref:`Documentation/process/maintainer-netdev.rst <netdev-FAQ>` - -.. _it_netdev-FAQ: - -========== -netdev FAQ -========== - -.. warning:: - - TODO ancora da tradurre diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst index 5f244e16f511..284a75ac19f8 100644 --- a/Documentation/translations/it_IT/process/coding-style.rst +++ b/Documentation/translations/it_IT/process/coding-style.rst @@ -575,9 +575,9 @@ due parti ``err_free_bar:`` e ``err_free_foo:``: .. code-block:: c - err_free_bar: + err_free_bar: kfree(foo->bar); - err_free_foo: + err_free_foo: kfree(foo); return ret; @@ -671,7 +671,7 @@ segue nel vostro file .emacs: (c-offsets-alist . ( (arglist-close . c-lineup-arglist-tabs-only) (arglist-cont-nonempty . - (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) + (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) (arglist-intro . +) (brace-list-intro . +) (c . c-lineup-C-comments) diff --git a/Documentation/translations/it_IT/subsystem-apis.rst b/Documentation/translations/it_IT/subsystem-apis.rst new file mode 100644 index 000000000000..d179af60c26d --- /dev/null +++ b/Documentation/translations/it_IT/subsystem-apis.rst @@ -0,0 +1,47 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================================== +Documentazione dei sottosistemi del kernel +========================================== + +In questa parte della documentazione si entra nel dettaglio di come funzionano +i sottosistemi specifici del kernel dal punto di vista di uno sviluppatore del +kernel. Molte delle informazioni qui contenute provengono direttamente dai +sorgenti del kernel, con aggiunte di materiale dove è necessario (anche se +talora *non* è stato aggiunto tutto ciò che era necessario). + +Sottosistemi principali +----------------------- + +.. toctree:: + :maxdepth: 1 + + core-api/index + +Interfacce uomo-macchina +------------------------ + +.. toctree:: + :maxdepth: 1 + + +Interfacce di rete +------------------ + +.. toctree:: + :maxdepth: 1 + +Interfacce per l'archiviazione +------------------------------ + +.. toctree:: + :maxdepth: 1 + + +Interfacce varie +---------------- + +.. toctree:: + :maxdepth: 1 + + i2c/index diff --git a/Documentation/translations/ja_JP/index.rst b/Documentation/translations/ja_JP/index.rst index 43b9fb7246d3..0b476b429e3b 100644 --- a/Documentation/translations/ja_JP/index.rst +++ b/Documentation/translations/ja_JP/index.rst @@ -11,7 +11,7 @@ .. toctree:: :maxdepth: 1 - howto + process/howto .. raw:: latex diff --git a/Documentation/translations/ja_JP/howto.rst b/Documentation/translations/ja_JP/process/howto.rst index 8d856ebe873c..8d856ebe873c 100644 --- a/Documentation/translations/ja_JP/howto.rst +++ b/Documentation/translations/ja_JP/process/howto.rst diff --git a/Documentation/translations/sp_SP/process/coding-style.rst b/Documentation/translations/sp_SP/process/coding-style.rst index a0261ba5b902..a37274764371 100644 --- a/Documentation/translations/sp_SP/process/coding-style.rst +++ b/Documentation/translations/sp_SP/process/coding-style.rst @@ -604,9 +604,9 @@ Normalmente la solución para esto es dividirlo en dos etiquetas de error .. code-block:: c - err_free_bar: + err_free_bar: kfree(foo->bar); - err_free_foo: + err_free_foo: kfree(foo); return ret; @@ -698,7 +698,7 @@ sanos. Para hacer esto último, puede pegar lo siguiente en su archivo (c-offsets-alist . ( (arglist-close . c-lineup-arglist-tabs-only) (arglist-cont-nonempty . - (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) + (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) (arglist-intro . +) (brace-list-intro . +) (c . c-lineup-C-comments) diff --git a/Documentation/translations/sp_SP/process/embargoed-hardware-issues.rst b/Documentation/translations/sp_SP/process/embargoed-hardware-issues.rst index c261b428b3f0..7d4d694967c7 100644 --- a/Documentation/translations/sp_SP/process/embargoed-hardware-issues.rst +++ b/Documentation/translations/sp_SP/process/embargoed-hardware-issues.rst @@ -273,7 +273,7 @@ revelada involucrada. La lista de embajadores actuales: IBM Power Anton Blanchard <anton@linux.ibm.com> IBM Z Christian Borntraeger <borntraeger@de.ibm.com> Intel Tony Luck <tony.luck@intel.com> - Qualcomm Trilok Soni <tsoni@codeaurora.org> + Qualcomm Trilok Soni <quic_tsoni@quicinc.com> Samsung Javier González <javier.gonz@samsung.com> Microsoft James Morris <jamorris@linux.microsoft.com> diff --git a/Documentation/translations/sp_SP/process/researcher-guidelines.rst b/Documentation/translations/sp_SP/process/researcher-guidelines.rst index 462b3290b7b8..deccc908a68d 100644 --- a/Documentation/translations/sp_SP/process/researcher-guidelines.rst +++ b/Documentation/translations/sp_SP/process/researcher-guidelines.rst @@ -147,4 +147,4 @@ Si no se puede encontrar a nadie para revisar internamente los parches y necesit ayuda para encontrar a esa persona, o si tiene alguna otra pregunta relacionada con este documento y las expectativas de la comunidad de desarrolladores, por favor contacte con la lista de correo privada Technical Advisory Board: -<tech-board@lists.linux-foundation.org>. +<tech-board@groups.linuxfoundation.org>. diff --git a/Documentation/translations/zh_CN/power/opp.rst b/Documentation/translations/zh_CN/power/opp.rst index 8d6e3f6f6202..7470fa2d4c43 100644 --- a/Documentation/translations/zh_CN/power/opp.rst +++ b/Documentation/translations/zh_CN/power/opp.rst @@ -274,7 +274,7 @@ dev_pm_opp_get_opp_count { /* 做一些事情 */ num_available = dev_pm_opp_get_opp_count(dev); - speeds = kzalloc(sizeof(u32) * num_available, GFP_KERNEL); + speeds = kcalloc(num_available, sizeof(u32), GFP_KERNEL); /* 按升序填充表 */ freq = 0; while (!IS_ERR(opp = dev_pm_opp_find_freq_ceil(dev, &freq))) { diff --git a/Documentation/translations/zh_CN/process/coding-style.rst b/Documentation/translations/zh_CN/process/coding-style.rst index fa28ef0a7fee..3bc2810b151d 100644 --- a/Documentation/translations/zh_CN/process/coding-style.rst +++ b/Documentation/translations/zh_CN/process/coding-style.rst @@ -523,9 +523,9 @@ Linux 里这是提倡的做法,因为这样可以很简单的给读者提供 .. code-block:: c - err_free_bar: + err_free_bar: kfree(foo->bar); - err_free_foo: + err_free_foo: kfree(foo); return ret; diff --git a/Documentation/translations/zh_CN/process/embargoed-hardware-issues.rst b/Documentation/translations/zh_CN/process/embargoed-hardware-issues.rst index cf5f1fca3d92..c90ecb557811 100644 --- a/Documentation/translations/zh_CN/process/embargoed-hardware-issues.rst +++ b/Documentation/translations/zh_CN/process/embargoed-hardware-issues.rst @@ -177,7 +177,7 @@ CVE分配 AMD Tom Lendacky <thomas.lendacky@amd.com> IBM Intel Tony Luck <tony.luck@intel.com> - Qualcomm Trilok Soni <tsoni@codeaurora.org> + Qualcomm Trilok Soni <quic_tsoni@quicinc.com> Microsoft Sasha Levin <sashal@kernel.org> VMware diff --git a/Documentation/translations/zh_CN/userspace-api/accelerators/ocxl.rst b/Documentation/translations/zh_CN/userspace-api/accelerators/ocxl.rst index 845b932bf935..aefad87e9099 100644 --- a/Documentation/translations/zh_CN/userspace-api/accelerators/ocxl.rst +++ b/Documentation/translations/zh_CN/userspace-api/accelerators/ocxl.rst @@ -53,7 +53,7 @@ OpenCAPI定义了一个在物理链路层上实现的数据链路层(TL)和 Processor:处理器 Memory:内存 - Accelerated Function Unit:加速函数单元 + Accelerated Function Unit:加速功能单元 @@ -97,7 +97,7 @@ OpenCAPI拥有AFU向主机进程发送中断的可能性。它通过定义在传 ======== 驱动为每个在物理设备上发现的AFU创建一个字符设备。一个物理设备可能拥有多个 -函数,一个函数可以拥有多个AFU。不过编写这篇文档之时,只对导出一个AFU的设备 +功能,一个功能可以拥有多个AFU。不过编写这篇文档之时,只对导出一个AFU的设备 测试过。 字符设备可以在 /dev/ocxl/ 中被找到,其命名为: diff --git a/Documentation/translations/zh_TW/process/coding-style.rst b/Documentation/translations/zh_TW/process/coding-style.rst index f11dbb65ca21..c7ac504f6f40 100644 --- a/Documentation/translations/zh_TW/process/coding-style.rst +++ b/Documentation/translations/zh_TW/process/coding-style.rst @@ -526,9 +526,9 @@ Linux 裏這是提倡的做法,因爲這樣可以很簡單的給讀者提供 .. code-block:: c - err_free_bar: + err_free_bar: kfree(foo->bar); - err_free_foo: + err_free_foo: kfree(foo); return ret; diff --git a/Documentation/translations/zh_TW/process/embargoed-hardware-issues.rst b/Documentation/translations/zh_TW/process/embargoed-hardware-issues.rst index 3cce7db2ab7e..93d21fd88910 100644 --- a/Documentation/translations/zh_TW/process/embargoed-hardware-issues.rst +++ b/Documentation/translations/zh_TW/process/embargoed-hardware-issues.rst @@ -180,7 +180,7 @@ CVE分配 AMD Tom Lendacky <thomas.lendacky@amd.com> IBM Intel Tony Luck <tony.luck@intel.com> - Qualcomm Trilok Soni <tsoni@codeaurora.org> + Qualcomm Trilok Soni <quic_tsoni@quicinc.com> Microsoft Sasha Levin <sashal@kernel.org> VMware diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst index 8cd62c466d20..077dfac7ed98 100644 --- a/Documentation/usb/gadget-testing.rst +++ b/Documentation/usb/gadget-testing.rst @@ -448,17 +448,17 @@ Function-specific configfs interface The function name to use when creating the function directory is "ncm". The NCM function provides these attributes in its function directory: - =============== ================================================== - ifname network device interface name associated with this - function instance - qmult queue length multiplier for high and super speed - host_addr MAC address of host's end of this - Ethernet over USB link - dev_addr MAC address of device's end of this - Ethernet over USB link - max_segment_size Segment size required for P2P connections. This - will set MTU to (max_segment_size - 14 bytes) - =============== ================================================== + ======================= ================================================== + ifname network device interface name associated with this + function instance + qmult queue length multiplier for high and super speed + host_addr MAC address of host's end of this + Ethernet over USB link + dev_addr MAC address of device's end of this + Ethernet over USB link + max_segment_size Segment size required for P2P connections. This + will set MTU to 14 bytes + ======================= ================================================== and after creating the functions/ncm.<instance name> they contain default values: qmult is 5, dev_addr and host_addr are randomly selected. diff --git a/Documentation/userspace-api/gpio/chardev.rst b/Documentation/userspace-api/gpio/chardev.rst new file mode 100644 index 000000000000..c58dd9771ac9 --- /dev/null +++ b/Documentation/userspace-api/gpio/chardev.rst @@ -0,0 +1,116 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +GPIO Character Device Userspace API +=================================== + +This is latest version (v2) of the character device API, as defined in +``include/uapi/linux/gpio.h.`` + +First added in 5.10. + +.. note:: + Do NOT abuse userspace APIs to control hardware that has proper kernel + drivers. There may already be a driver for your use case, and an existing + kernel driver is sure to provide a superior solution to bitbashing + from userspace. + + Read Documentation/driver-api/gpio/drivers-on-gpio.rst to avoid reinventing + kernel wheels in userspace. + + Similarly, for multi-function lines there may be other subsystems, such as + Documentation/spi/index.rst, Documentation/i2c/index.rst, + Documentation/driver-api/pwm.rst, Documentation/w1/index.rst etc, that + provide suitable drivers and APIs for your hardware. + +Basic examples using the character device API can be found in ``tools/gpio/*``. + +The API is based around two major objects, the :ref:`gpio-v2-chip` and the +:ref:`gpio-v2-line-request`. + +.. _gpio-v2-chip: + +Chip +==== + +The Chip represents a single GPIO chip and is exposed to userspace using device +files of the form ``/dev/gpiochipX``. + +Each chip supports a number of GPIO lines, +:c:type:`chip.lines<gpiochip_info>`. Lines on the chip are identified by an +``offset`` in the range from 0 to ``chip.lines - 1``, i.e. `[0,chip.lines)`. + +Lines are requested from the chip using gpio-v2-get-line-ioctl.rst +and the resulting line request is used to access the GPIO chip's lines or +monitor the lines for edge events. + +Within this documentation, the file descriptor returned by calling `open()` +on the GPIO device file is referred to as ``chip_fd``. + +Operations +---------- + +The following operations may be performed on the chip: + +.. toctree:: + :titlesonly: + + Get Line <gpio-v2-get-line-ioctl> + Get Chip Info <gpio-get-chipinfo-ioctl> + Get Line Info <gpio-v2-get-lineinfo-ioctl> + Watch Line Info <gpio-v2-get-lineinfo-watch-ioctl> + Unwatch Line Info <gpio-get-lineinfo-unwatch-ioctl> + Read Line Info Changed Events <gpio-v2-lineinfo-changed-read> + +.. _gpio-v2-line-request: + +Line Request +============ + +Line requests are created by gpio-v2-get-line-ioctl.rst and provide +access to a set of requested lines. The line request is exposed to userspace +via the anonymous file descriptor returned in +:c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst. + +Within this documentation, the line request file descriptor is referred to +as ``req_fd``. + +Operations +---------- + +The following operations may be performed on the line request: + +.. toctree:: + :titlesonly: + + Get Line Values <gpio-v2-line-get-values-ioctl> + Set Line Values <gpio-v2-line-set-values-ioctl> + Read Line Edge Events <gpio-v2-line-event-read> + Reconfigure Lines <gpio-v2-line-set-config-ioctl> + +Types +===== + +This section contains the structs and enums that are referenced by the API v2, +as defined in ``include/uapi/linux/gpio.h``. + +.. kernel-doc:: include/uapi/linux/gpio.h + :identifiers: + gpio_v2_line_attr_id + gpio_v2_line_attribute + gpio_v2_line_changed_type + gpio_v2_line_config + gpio_v2_line_config_attribute + gpio_v2_line_event + gpio_v2_line_event_id + gpio_v2_line_flag + gpio_v2_line_info + gpio_v2_line_info_changed + gpio_v2_line_request + gpio_v2_line_values + gpiochip_info + +.. toctree:: + :hidden: + + error-codes diff --git a/Documentation/userspace-api/gpio/chardev_v1.rst b/Documentation/userspace-api/gpio/chardev_v1.rst new file mode 100644 index 000000000000..67124b1d0487 --- /dev/null +++ b/Documentation/userspace-api/gpio/chardev_v1.rst @@ -0,0 +1,131 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================================== +GPIO Character Device Userspace API (v1) +======================================== + +.. warning:: + This API is obsoleted by chardev.rst (v2). + + New developments should use the v2 API, and existing developments are + encouraged to migrate as soon as possible, as this API will be removed + in the future. The v2 API is a functional superset of the v1 API so any + v1 call can be directly translated to a v2 equivalent. + + This interface will continue to be maintained for the migration period, + but new features will only be added to the new API. + +First added in 4.8. + +The API is based around three major objects, the :ref:`gpio-v1-chip`, the +:ref:`gpio-v1-line-handle`, and the :ref:`gpio-v1-line-event`. + +Where "line event" is used in this document it refers to the request that can +monitor a line for edge events, not the edge events themselves. + +.. _gpio-v1-chip: + +Chip +==== + +The Chip represents a single GPIO chip and is exposed to userspace using device +files of the form ``/dev/gpiochipX``. + +Each chip supports a number of GPIO lines, +:c:type:`chip.lines<gpiochip_info>`. Lines on the chip are identified by an +``offset`` in the range from 0 to ``chip.lines - 1``, i.e. `[0,chip.lines)`. + +Lines are requested from the chip using either gpio-get-linehandle-ioctl.rst +and the resulting line handle is used to access the GPIO chip's lines, or +gpio-get-lineevent-ioctl.rst and the resulting line event is used to monitor +a GPIO line for edge events. + +Within this documentation, the file descriptor returned by calling `open()` +on the GPIO device file is referred to as ``chip_fd``. + +Operations +---------- + +The following operations may be performed on the chip: + +.. toctree:: + :titlesonly: + + Get Line Handle <gpio-get-linehandle-ioctl> + Get Line Event <gpio-get-lineevent-ioctl> + Get Chip Info <gpio-get-chipinfo-ioctl> + Get Line Info <gpio-get-lineinfo-ioctl> + Watch Line Info <gpio-get-lineinfo-watch-ioctl> + Unwatch Line Info <gpio-get-lineinfo-unwatch-ioctl> + Read Line Info Changed Events <gpio-lineinfo-changed-read> + +.. _gpio-v1-line-handle: + +Line Handle +=========== + +Line handles are created by gpio-get-linehandle-ioctl.rst and provide +access to a set of requested lines. The line handle is exposed to userspace +via the anonymous file descriptor returned in +:c:type:`request.fd<gpiohandle_request>` by gpio-get-linehandle-ioctl.rst. + +Within this documentation, the line handle file descriptor is referred to +as ``handle_fd``. + +Operations +---------- + +The following operations may be performed on the line handle: + +.. toctree:: + :titlesonly: + + Get Line Values <gpio-handle-get-line-values-ioctl> + Set Line Values <gpio-handle-set-line-values-ioctl> + Reconfigure Lines <gpio-handle-set-config-ioctl> + +.. _gpio-v1-line-event: + +Line Event +========== + +Line events are created by gpio-get-lineevent-ioctl.rst and provide +access to a requested line. The line event is exposed to userspace +via the anonymous file descriptor returned in +:c:type:`request.fd<gpioevent_request>` by gpio-get-lineevent-ioctl.rst. + +Within this documentation, the line event file descriptor is referred to +as ``event_fd``. + +Operations +---------- + +The following operations may be performed on the line event: + +.. toctree:: + :titlesonly: + + Get Line Value <gpio-handle-get-line-values-ioctl> + Read Line Edge Events <gpio-lineevent-data-read> + +Types +===== + +This section contains the structs that are referenced by the ABI v1. + +The :c:type:`struct gpiochip_info<gpiochip_info>` is common to ABI v1 and v2. + +.. kernel-doc:: include/uapi/linux/gpio.h + :identifiers: + gpioevent_data + gpioevent_request + gpiohandle_config + gpiohandle_data + gpiohandle_request + gpioline_info + gpioline_info_changed + +.. toctree:: + :hidden: + + error-codes diff --git a/Documentation/userspace-api/gpio/error-codes.rst b/Documentation/userspace-api/gpio/error-codes.rst new file mode 100644 index 000000000000..6bf2948990cd --- /dev/null +++ b/Documentation/userspace-api/gpio/error-codes.rst @@ -0,0 +1,79 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _gpio_errors: + +******************* +GPIO Error Codes +******************* + +.. _gpio-errors: + +.. tabularcolumns:: |p{2.5cm}|p{15.0cm}| + +.. flat-table:: Common GPIO error codes + :header-rows: 0 + :stub-columns: 0 + :widths: 1 16 + + - - ``EAGAIN`` (aka ``EWOULDBLOCK``) + + - The device was opened in non-blocking mode and a read can't + be performed as there is no data available. + + - - ``EBADF`` + + - The file descriptor is not valid. + + - - ``EBUSY`` + + - The ioctl can't be handled because the device is busy. Typically + returned when an ioctl attempts something that would require the + usage of a resource that was already allocated. The ioctl must not + be retried without performing another action to fix the problem + first. + + - - ``EFAULT`` + + - There was a failure while copying data from/to userspace, probably + caused by an invalid pointer reference. + + - - ``EINVAL`` + + - One or more of the ioctl parameters are invalid or out of the + allowed range. This is a widely used error code. + + - - ``ENODEV`` + + - Device not found or was removed. + + - - ``ENOMEM`` + + - There's not enough memory to handle the desired operation. + + - - ``EPERM`` + + - Permission denied. Typically returned in response to an attempt + to perform an action incompatible with the current line + configuration. + + - - ``EIO`` + + - I/O error. Typically returned when there are problems communicating + with a hardware device or requesting features that hardware does not + support. This could indicate broken or flaky hardware. + It's a 'Something is wrong, I give up!' type of error. + + - - ``ENXIO`` + + - Typically returned when a feature requiring interrupt support was + requested, but the line does not support interrupts. + +.. note:: + + #. This list is not exhaustive; ioctls may return other error codes. + Since errors may have side effects such as a driver reset, + applications should abort on unexpected errors, or otherwise + assume that the device is in a bad state. + + #. Request-specific error codes are listed in the individual + requests descriptions. diff --git a/Documentation/userspace-api/gpio/gpio-get-chipinfo-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-chipinfo-ioctl.rst new file mode 100644 index 000000000000..05f07fdefe2f --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-get-chipinfo-ioctl.rst @@ -0,0 +1,41 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_GET_CHIPINFO_IOCTL: + +*********************** +GPIO_GET_CHIPINFO_IOCTL +*********************** + +Name +==== + +GPIO_GET_CHIPINFO_IOCTL - Get the publicly available information for a chip. + +Synopsis +======== + +.. c:macro:: GPIO_GET_CHIPINFO_IOCTL + +``int ioctl(int chip_fd, GPIO_GET_CHIPINFO_IOCTL, struct gpiochip_info *info)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``info`` + The :c:type:`chip_info<gpiochip_info>` to be populated. + +Description +=========== + +Gets the publicly available information for a particular GPIO chip. + +Return Value +============ + +On success 0 and ``info`` is populated with the chip info. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-get-lineevent-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-lineevent-ioctl.rst new file mode 100644 index 000000000000..09a9254f38cf --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-get-lineevent-ioctl.rst @@ -0,0 +1,84 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_GET_LINEEVENT_IOCTL: + +************************ +GPIO_GET_LINEEVENT_IOCTL +************************ + +.. warning:: + This ioctl is part of chardev_v1.rst and is obsoleted by + gpio-v2-get-line-ioctl.rst. + +Name +==== + +GPIO_GET_LINEEVENT_IOCTL - Request a line with edge detection from the kernel. + +Synopsis +======== + +.. c:macro:: GPIO_GET_LINEEVENT_IOCTL + +``int ioctl(int chip_fd, GPIO_GET_LINEEVENT_IOCTL, struct gpioevent_request *request)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``request`` + The :c:type:`event_request<gpioevent_request>` specifying the line + to request and its configuration. + +Description +=========== + +Request a line with edge detection from the kernel. + +On success, the requesting process is granted exclusive access to the line +value and may receive events when edges are detected on the line, as +described in gpio-lineevent-data-read.rst. + +The state of a line is guaranteed to remain as requested until the returned +file descriptor is closed. Once the file descriptor is closed, the state of +the line becomes uncontrolled from the userspace perspective, and may revert +to its default state. + +Requesting a line already in use is an error (**EBUSY**). + +Requesting edge detection on a line that does not support interrupts is an +error (**ENXIO**). + +As with the :ref:`line handle<gpio-get-linehandle-config-support>`, the +bias configuration is best effort. + +Closing the ``chip_fd`` has no effect on existing line events. + +Configuration Rules +------------------- + +The following configuration rules apply: + +The line event is requested as an input, so no flags specific to output lines, +``GPIOHANDLE_REQUEST_OUTPUT``, ``GPIOHANDLE_REQUEST_OPEN_DRAIN``, or +``GPIOHANDLE_REQUEST_OPEN_SOURCE``, may be set. + +Only one bias flag, ``GPIOHANDLE_REQUEST_BIAS_xxx``, may be set. +If no bias flags are set then the bias configuration is not changed. + +The edge flags, ``GPIOEVENT_REQUEST_RISING_EDGE`` and +``GPIOEVENT_REQUEST_FALLING_EDGE``, may be combined to detect both rising +and falling edges. + +Requesting an invalid configuration is an error (**EINVAL**). + +Return Value +============ + +On success 0 and the :c:type:`request.fd<gpioevent_request>` contains the file +descriptor for the request. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-get-linehandle-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-linehandle-ioctl.rst new file mode 100644 index 000000000000..9112a9d31174 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-get-linehandle-ioctl.rst @@ -0,0 +1,125 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_GET_LINEHANDLE_IOCTL: + +************************* +GPIO_GET_LINEHANDLE_IOCTL +************************* + +.. warning:: + This ioctl is part of chardev_v1.rst and is obsoleted by + gpio-v2-get-line-ioctl.rst. + +Name +==== + +GPIO_GET_LINEHANDLE_IOCTL - Request a line or lines from the kernel. + +Synopsis +======== + +.. c:macro:: GPIO_GET_LINEHANDLE_IOCTL + +``int ioctl(int chip_fd, GPIO_GET_LINEHANDLE_IOCTL, struct gpiohandle_request *request)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``request`` + The :c:type:`handle_request<gpiohandle_request>` specifying the lines to + request and their configuration. + +Description +=========== + +Request a line or lines from the kernel. + +While multiple lines may be requested, the same configuration applies to all +lines in the request. + +On success, the requesting process is granted exclusive access to the line +value and write access to the line configuration. + +The state of a line, including the value of output lines, is guaranteed to +remain as requested until the returned file descriptor is closed. Once the +file descriptor is closed, the state of the line becomes uncontrolled from +the userspace perspective, and may revert to its default state. + +Requesting a line already in use is an error (**EBUSY**). + +Closing the ``chip_fd`` has no effect on existing line handles. + +.. _gpio-get-linehandle-config-rules: + +Configuration Rules +------------------- + +The following configuration rules apply: + +The direction flags, ``GPIOHANDLE_REQUEST_INPUT`` and +``GPIOHANDLE_REQUEST_OUTPUT``, cannot be combined. If neither are set then the +only other flag that may be set is ``GPIOHANDLE_REQUEST_ACTIVE_LOW`` and the +line is requested "as-is" to allow reading of the line value without altering +the electrical configuration. + +The drive flags, ``GPIOHANDLE_REQUEST_OPEN_xxx``, require the +``GPIOHANDLE_REQUEST_OUTPUT`` to be set. +Only one drive flag may be set. +If none are set then the line is assumed push-pull. + +Only one bias flag, ``GPIOHANDLE_REQUEST_BIAS_xxx``, may be set, and +it requires a direction flag to also be set. +If no bias flags are set then the bias configuration is not changed. + +Requesting an invalid configuration is an error (**EINVAL**). + + +.. _gpio-get-linehandle-config-support: + +Configuration Support +--------------------- + +Where the requested configuration is not directly supported by the underlying +hardware and driver, the kernel applies one of these approaches: + + - reject the request + - emulate the feature in software + - treat the feature as best effort + +The approach applied depends on whether the feature can reasonably be emulated +in software, and the impact on the hardware and userspace if the feature is not +supported. +The approach applied for each feature is as follows: + +============== =========== +Feature Approach +============== =========== +Bias best effort +Direction reject +Drive emulate +============== =========== + +Bias is treated as best effort to allow userspace to apply the same +configuration for platforms that support internal bias as those that require +external bias. +Worst case the line floats rather than being biased as expected. + +Drive is emulated by switching the line to an input when the line should not +be driven. + +In all cases, the configuration reported by gpio-get-lineinfo-ioctl.rst +is the requested configuration, not the resulting hardware configuration. +Userspace cannot determine if a feature is supported in hardware, is +emulated, or is best effort. + +Return Value +============ + +On success 0 and the :c:type:`request.fd<gpiohandle_request>` contains the +file descriptor for the request. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-get-lineinfo-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-lineinfo-ioctl.rst new file mode 100644 index 000000000000..c895b8910b25 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-get-lineinfo-ioctl.rst @@ -0,0 +1,54 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_GET_LINEINFO_IOCTL: + +*********************** +GPIO_GET_LINEINFO_IOCTL +*********************** + +.. warning:: + This ioctl is part of chardev_v1.rst and is obsoleted by + gpio-v2-get-lineinfo-ioctl.rst. + +Name +==== + +GPIO_GET_LINEINFO_IOCTL - Get the publicly available information for a line. + +Synopsis +======== + +.. c:macro:: GPIO_GET_LINEINFO_IOCTL + +``int ioctl(int chip_fd, GPIO_GET_LINEINFO_IOCTL, struct gpioline_info *info)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``info`` + The :c:type:`line_info<gpioline_info>` to be populated, with the + ``offset`` field set to indicate the line to be collected. + +Description +=========== + +Get the publicly available information for a line. + +This information is available independent of whether the line is in use. + +.. note:: + The line info does not include the line value. + + The line must be requested using gpio-get-linehandle-ioctl.rst or + gpio-get-lineevent-ioctl.rst to access its value. + +Return Value +============ + +On success 0 and ``info`` is populated with the chip info. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-get-lineinfo-unwatch-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-lineinfo-unwatch-ioctl.rst new file mode 100644 index 000000000000..a82d0161daf8 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-get-lineinfo-unwatch-ioctl.rst @@ -0,0 +1,49 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_GET_LINEINFO_UNWATCH_IOCTL: + +******************************* +GPIO_GET_LINEINFO_UNWATCH_IOCTL +******************************* + +Name +==== + +GPIO_GET_LINEINFO_UNWATCH_IOCTL - Disable watching a line for changes to its +requested state and configuration information. + +Synopsis +======== + +.. c:macro:: GPIO_GET_LINEINFO_UNWATCH_IOCTL + +``int ioctl(int chip_fd, GPIO_GET_LINEINFO_UNWATCH_IOCTL, u32 *offset)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``offset`` + The offset of the line to no longer watch. + +Description +=========== + +Remove the line from the list of lines being watched on this ``chip_fd``. + +This is the reverse of gpio-v2-get-lineinfo-watch-ioctl.rst (v2) and +gpio-get-lineinfo-watch-ioctl.rst (v1). + +Unwatching a line that is not watched is an error (**EBUSY**). + +First added in 5.7. + +Return Value +============ + +On success 0. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-get-lineinfo-watch-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-lineinfo-watch-ioctl.rst new file mode 100644 index 000000000000..f5c92b69a496 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-get-lineinfo-watch-ioctl.rst @@ -0,0 +1,74 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_GET_LINEINFO_WATCH_IOCTL: + +***************************** +GPIO_GET_LINEINFO_WATCH_IOCTL +***************************** + +.. warning:: + This ioctl is part of chardev_v1.rst and is obsoleted by + gpio-v2-get-lineinfo-watch-ioctl.rst. + +Name +==== + +GPIO_GET_LINEINFO_WATCH_IOCTL - Enable watching a line for changes to its +request state and configuration information. + +Synopsis +======== + +.. c:macro:: GPIO_GET_LINEINFO_WATCH_IOCTL + +``int ioctl(int chip_fd, GPIO_GET_LINEINFO_WATCH_IOCTL, struct gpioline_info *info)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``info`` + The :c:type:`line_info<gpioline_info>` struct to be populated, with + the ``offset`` set to indicate the line to watch + +Description +=========== + +Enable watching a line for changes to its request state and configuration +information. Changes to line info include a line being requested, released +or reconfigured. + +.. note:: + Watching line info is not generally required, and would typically only be + used by a system monitoring component. + + The line info does NOT include the line value. + + The line must be requested using gpio-get-linehandle-ioctl.rst or + gpio-get-lineevent-ioctl.rst to access its value, and the line event can + monitor a line for events using gpio-lineevent-data-read.rst. + +By default all lines are unwatched when the GPIO chip is opened. + +Multiple lines may be watched simultaneously by adding a watch for each. + +Once a watch is set, any changes to line info will generate events which can be +read from the ``chip_fd`` as described in +gpio-lineinfo-changed-read.rst. + +Adding a watch to a line that is already watched is an error (**EBUSY**). + +Watches are specific to the ``chip_fd`` and are independent of watches +on the same GPIO chip opened with a separate call to `open()`. + +First added in 5.7. + +Return Value +============ + +On success 0 and ``info`` is populated with the current line info. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-handle-get-line-values-ioctl.rst b/Documentation/userspace-api/gpio/gpio-handle-get-line-values-ioctl.rst new file mode 100644 index 000000000000..25263b8f0588 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-handle-get-line-values-ioctl.rst @@ -0,0 +1,56 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIOHANDLE_GET_LINE_VALUES_IOCTL: + +******************************** +GPIOHANDLE_GET_LINE_VALUES_IOCTL +******************************** +.. warning:: + This ioctl is part of chardev_v1.rst and is obsoleted by + gpio-v2-line-get-values-ioctl.rst. + +Name +==== + +GPIOHANDLE_GET_LINE_VALUES_IOCTL - Get the values of all requested lines. + +Synopsis +======== + +.. c:macro:: GPIOHANDLE_GET_LINE_VALUES_IOCTL + +``int ioctl(int handle_fd, GPIOHANDLE_GET_LINE_VALUES_IOCTL, struct gpiohandle_data *values)`` + +Arguments +========= + +``handle_fd`` + The file descriptor of the GPIO character device, as returned in the + :c:type:`request.fd<gpiohandle_request>` by gpio-get-linehandle-ioctl.rst. + +``values`` + The :c:type:`line_values<gpiohandle_data>` to be populated. + +Description +=========== + +Get the values of all requested lines. + +The values of both input and output lines may be read. + +For output lines, the value returned is driver and configuration dependent and +may be either the output buffer (the last requested value set) or the input +buffer (the actual level of the line), and depending on the hardware and +configuration these may differ. + +This ioctl can also be used to read the line value for line events, +substituting the ``event_fd`` for the ``handle_fd``. As there is only +one line requested in that case, only the one value is returned in ``values``. + +Return Value +============ + +On success 0 and ``values`` populated with the values read. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-handle-set-config-ioctl.rst b/Documentation/userspace-api/gpio/gpio-handle-set-config-ioctl.rst new file mode 100644 index 000000000000..d002a84681ac --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-handle-set-config-ioctl.rst @@ -0,0 +1,63 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIOHANDLE_SET_CONFIG_IOCTL: + +*************************** +GPIOHANDLE_SET_CONFIG_IOCTL +*************************** + +.. warning:: + This ioctl is part of chardev_v1.rst and is obsoleted by + gpio-v2-line-set-config-ioctl.rst. + +Name +==== + +GPIOHANDLE_SET_CONFIG_IOCTL - Update the configuration of previously requested lines. + +Synopsis +======== + +.. c:macro:: GPIOHANDLE_SET_CONFIG_IOCTL + +``int ioctl(int handle_fd, GPIOHANDLE_SET_CONFIG_IOCTL, struct gpiohandle_config *config)`` + +Arguments +========= + +``handle_fd`` + The file descriptor of the GPIO character device, as returned in the + :c:type:`request.fd<gpiohandle_request>` by gpio-get-linehandle-ioctl.rst. + +``config`` + The new :c:type:`configuration<gpiohandle_config>` to apply to the + requested lines. + +Description +=========== + +Update the configuration of previously requested lines, without releasing the +line or introducing potential glitches. + +The configuration applies to all requested lines. + +The same :ref:`gpio-get-linehandle-config-rules` and +:ref:`gpio-get-linehandle-config-support` that apply when requesting the +lines also apply when updating the line configuration. + +The motivating use case for this command is changing direction of +bi-directional lines between input and output, but it may be used more +generally to move lines seamlessly from one configuration state to another. + +To only change the value of output lines, use +gpio-handle-set-line-values-ioctl.rst. + +First added in 5.5. + +Return Value +============ + +On success 0. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-handle-set-line-values-ioctl.rst b/Documentation/userspace-api/gpio/gpio-handle-set-line-values-ioctl.rst new file mode 100644 index 000000000000..0aa05e623a6c --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-handle-set-line-values-ioctl.rst @@ -0,0 +1,48 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_HANDLE_SET_LINE_VALUES_IOCTL: + +********************************* +GPIO_HANDLE_SET_LINE_VALUES_IOCTL +********************************* +.. warning:: + This ioctl is part of chardev_v1.rst and is obsoleted by + gpio-v2-line-set-values-ioctl.rst. + +Name +==== + +GPIO_HANDLE_SET_LINE_VALUES_IOCTL - Set the values of all requested output lines. + +Synopsis +======== + +.. c:macro:: GPIO_HANDLE_SET_LINE_VALUES_IOCTL + +``int ioctl(int handle_fd, GPIO_HANDLE_SET_LINE_VALUES_IOCTL, struct gpiohandle_data *values)`` + +Arguments +========= + +``handle_fd`` + The file descriptor of the GPIO character device, as returned in the + :c:type:`request.fd<gpiohandle_request>` by gpio-get-linehandle-ioctl.rst. + +``values`` + The :c:type:`line_values<gpiohandle_data>` to set. + +Description +=========== + +Set the values of all requested output lines. + +Only the values of output lines may be set. +Attempting to set the value of input lines is an error (**EPERM**). + +Return Value +============ + +On success 0. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-lineevent-data-read.rst b/Documentation/userspace-api/gpio/gpio-lineevent-data-read.rst new file mode 100644 index 000000000000..68b8d4f9f604 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-lineevent-data-read.rst @@ -0,0 +1,84 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_LINEEVENT_DATA_READ: + +************************ +GPIO_LINEEVENT_DATA_READ +************************ + +.. warning:: + This ioctl is part of chardev_v1.rst and is obsoleted by + gpio-v2-line-event-read.rst. + +Name +==== + +GPIO_LINEEVENT_DATA_READ - Read edge detection events from a line event. + +Synopsis +======== + +``int read(int event_fd, void *buf, size_t count)`` + +Arguments +========= + +``event_fd`` + The file descriptor of the GPIO character device, as returned in the + :c:type:`request.fd<gpioevent_request>` by gpio-get-lineevent-ioctl.rst. + +``buf`` + The buffer to contain the :c:type:`events<gpioevent_data>`. + +``count`` + The number of bytes available in ``buf``, which must be at + least the size of a :c:type:`gpioevent_data`. + +Description +=========== + +Read edge detection events for a line from a line event. + +Edge detection must be enabled for the input line using either +``GPIOEVENT_REQUEST_RISING_EDGE`` or ``GPIOEVENT_REQUEST_FALLING_EDGE``, or +both. Edge events are then generated whenever edge interrupts are detected on +the input line. + +The kernel captures and timestamps edge events as close as possible to their +occurrence and stores them in a buffer from where they can be read by +userspace at its convenience using `read()`. + +The source of the clock for :c:type:`event.timestamp<gpioevent_data>` is +``CLOCK_MONOTONIC``, except for kernels earlier than Linux 5.7 when it was +``CLOCK_REALTIME``. There is no indication in the :c:type:`gpioevent_data` +as to which clock source is used, it must be determined from either the kernel +version or sanity checks on the timestamp itself. + +Events read from the buffer are always in the same order that they were +detected by the kernel. + +The size of the kernel event buffer is fixed at 16 events. + +The buffer may overflow if bursts of events occur quicker than they are read +by userspace. If an overflow occurs then the most recent event is discarded. +Overflow cannot be detected from userspace. + +To minimize the number of calls required to copy events from the kernel to +userspace, `read()` supports copying multiple events. The number of events +copied is the lower of the number available in the kernel buffer and the +number that will fit in the userspace buffer (``buf``). + +The `read()` will block if no event is available and the ``event_fd`` has not +been set **O_NONBLOCK**. + +The presence of an event can be tested for by checking that the ``event_fd`` is +readable using `poll()` or an equivalent. + +Return Value +============ + +On success the number of bytes read, which will be a multiple of the size of +a :c:type:`gpio_lineevent_data` event. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-lineinfo-changed-read.rst b/Documentation/userspace-api/gpio/gpio-lineinfo-changed-read.rst new file mode 100644 index 000000000000..c4f5e1787a9d --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-lineinfo-changed-read.rst @@ -0,0 +1,87 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_LINEINFO_CHANGED_READ: + +************************** +GPIO_LINEINFO_CHANGED_READ +************************** + +.. warning:: + This ioctl is part of chardev_v1.rst and is obsoleted by + gpio-v2-lineinfo-changed-read.rst. + +Name +==== + +GPIO_LINEINFO_CHANGED_READ - Read line info change events for watched lines +from the chip. + +Synopsis +======== + +``int read(int chip_fd, void *buf, size_t count)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``buf`` + The buffer to contain the :c:type:`events<gpioline_info_changed>`. + +``count`` + The number of bytes available in ``buf``, which must be at least the size + of a :c:type:`gpioline_info_changed` event. + +Description +=========== + +Read line info change events for watched lines from the chip. + +.. note:: + Monitoring line info changes is not generally required, and would typically + only be performed by a system monitoring component. + + These events relate to changes in a line's request state or configuration, + not its value. Use gpio-lineevent-data-read.rst to receive events when a + line changes value. + +A line must be watched using gpio-get-lineinfo-watch-ioctl.rst to generate +info changed events. Subsequently, a request, release, or reconfiguration +of the line will generate an info changed event. + +The kernel timestamps events when they occur and stores them in a buffer +from where they can be read by userspace at its convenience using `read()`. + +The size of the kernel event buffer is fixed at 32 events per ``chip_fd``. + +The buffer may overflow if bursts of events occur quicker than they are read +by userspace. If an overflow occurs then the most recent event is discarded. +Overflow cannot be detected from userspace. + +Events read from the buffer are always in the same order that they were +detected by the kernel, including when multiple lines are being monitored by +the one ``chip_fd``. + +To minimize the number of calls required to copy events from the kernel to +userspace, `read()` supports copying multiple events. The number of events +copied is the lower of the number available in the kernel buffer and the +number that will fit in the userspace buffer (``buf``). + +A `read()` will block if no event is available and the ``chip_fd`` has not +been set **O_NONBLOCK**. + +The presence of an event can be tested for by checking that the ``chip_fd`` is +readable using `poll()` or an equivalent. + +First added in 5.7. + +Return Value +============ + +On success the number of bytes read, which will be a multiple of the size of +a :c:type:`gpioline_info_changed` event. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-v2-get-line-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-get-line-ioctl.rst new file mode 100644 index 000000000000..56b975801b6a --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-v2-get-line-ioctl.rst @@ -0,0 +1,152 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_V2_GET_LINE_IOCTL: + +********************** +GPIO_V2_GET_LINE_IOCTL +********************** + +Name +==== + +GPIO_V2_GET_LINE_IOCTL - Request a line or lines from the kernel. + +Synopsis +======== + +.. c:macro:: GPIO_V2_GET_LINE_IOCTL + +``int ioctl(int chip_fd, GPIO_V2_GET_LINE_IOCTL, struct gpio_v2_line_request *request)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``request`` + The :c:type:`line_request<gpio_v2_line_request>` specifying the lines + to request and their configuration. + +Description +=========== + +On success, the requesting process is granted exclusive access to the line +value, write access to the line configuration, and may receive events when +edges are detected on the line, all of which are described in more detail in +:ref:`gpio-v2-line-request`. + +A number of lines may be requested in the one line request, and request +operations are performed on the requested lines by the kernel as atomically +as possible. e.g. gpio-v2-line-get-values-ioctl.rst will read all the +requested lines at once. + +The state of a line, including the value of output lines, is guaranteed to +remain as requested until the returned file descriptor is closed. Once the +file descriptor is closed, the state of the line becomes uncontrolled from +the userspace perspective, and may revert to its default state. + +Requesting a line already in use is an error (**EBUSY**). + +Closing the ``chip_fd`` has no effect on existing line requests. + +.. _gpio-v2-get-line-config-rules: + +Configuration Rules +------------------- + +For any given requested line, the following configuration rules apply: + +The direction flags, ``GPIO_V2_LINE_FLAG_INPUT`` and +``GPIO_V2_LINE_FLAG_OUTPUT``, cannot be combined. If neither are set then +the only other flag that may be set is ``GPIO_V2_LINE_FLAG_ACTIVE_LOW`` +and the line is requested "as-is" to allow reading of the line value +without altering the electrical configuration. + +The drive flags, ``GPIO_V2_LINE_FLAG_OPEN_xxx``, require the +``GPIO_V2_LINE_FLAG_OUTPUT`` to be set. +Only one drive flag may be set. +If none are set then the line is assumed push-pull. + +Only one bias flag, ``GPIO_V2_LINE_FLAG_BIAS_xxx``, may be set, and it +requires a direction flag to also be set. +If no bias flags are set then the bias configuration is not changed. + +The edge flags, ``GPIO_V2_LINE_FLAG_EDGE_xxx``, require +``GPIO_V2_LINE_FLAG_INPUT`` to be set and may be combined to detect both rising +and falling edges. Requesting edge detection from a line that does not support +it is an error (**ENXIO**). + +Only one event clock flag, ``GPIO_V2_LINE_FLAG_EVENT_CLOCK_xxx``, may be set. +If none are set then the event clock defaults to ``CLOCK_MONOTONIC``. +The ``GPIO_V2_LINE_FLAG_EVENT_CLOCK_HTE`` flag requires supporting hardware +and a kernel with ``CONFIG_HTE`` set. Requesting HTE from a device that +doesn't support it is an error (**EOPNOTSUP**). + +The :c:type:`debounce_period_us<gpio_v2_line_attribute>` attribute may only +be applied to lines with ``GPIO_V2_LINE_FLAG_INPUT`` set. When set, debounce +applies to both the values returned by gpio-v2-line-get-values-ioctl.rst and +the edges returned by gpio-v2-line-event-read.rst. If not +supported directly by hardware, debouncing is emulated in software by the +kernel. Requesting debounce on a line that supports neither debounce in +hardware nor interrupts, as required for software emulation, is an error +(**ENXIO**). + +Requesting an invalid configuration is an error (**EINVAL**). + +.. _gpio-v2-get-line-config-support: + +Configuration Support +--------------------- + +Where the requested configuration is not directly supported by the underlying +hardware and driver, the kernel applies one of these approaches: + + - reject the request + - emulate the feature in software + - treat the feature as best effort + +The approach applied depends on whether the feature can reasonably be emulated +in software, and the impact on the hardware and userspace if the feature is not +supported. +The approach applied for each feature is as follows: + +============== =========== +Feature Approach +============== =========== +Bias best effort +Debounce emulate +Direction reject +Drive emulate +Edge Detection reject +============== =========== + +Bias is treated as best effort to allow userspace to apply the same +configuration for platforms that support internal bias as those that require +external bias. +Worst case the line floats rather than being biased as expected. + +Debounce is emulated by applying a filter to hardware interrupts on the line. +An edge event is generated after an edge is detected and the line remains +stable for the debounce period. +The event timestamp corresponds to the end of the debounce period. + +Drive is emulated by switching the line to an input when the line should not +be actively driven. + +Edge detection requires interrupt support, and is rejected if that is not +supported. Emulation by polling can still be performed from userspace. + +In all cases, the configuration reported by gpio-v2-get-lineinfo-ioctl.rst +is the requested configuration, not the resulting hardware configuration. +Userspace cannot determine if a feature is supported in hardware, is +emulated, or is best effort. + +Return Value +============ + +On success 0 and the :c:type:`request.fd<gpio_v2_line_request>` contains the +file descriptor for the request. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-ioctl.rst new file mode 100644 index 000000000000..bc4d8df887d4 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-ioctl.rst @@ -0,0 +1,50 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_V2_GET_LINEINFO_IOCTL: + +************************** +GPIO_V2_GET_LINEINFO_IOCTL +************************** + +Name +==== + +GPIO_V2_GET_LINEINFO_IOCTL - Get the publicly available information for a line. + +Synopsis +======== + +.. c:macro:: GPIO_V2_GET_LINEINFO_IOCTL + +``int ioctl(int chip_fd, GPIO_V2_GET_LINEINFO_IOCTL, struct gpio_v2_line_info *info)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``info`` + The :c:type:`line_info<gpio_v2_line_info>` to be populated, with the + ``offset`` field set to indicate the line to be collected. + +Description +=========== + +Get the publicly available information for a line. + +This information is available independent of whether the line is in use. + +.. note:: + The line info does not include the line value. + + The line must be requested using gpio-v2-get-line-ioctl.rst to access its + value. + +Return Value +============ + +On success 0 and ``info`` is populated with the chip info. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-watch-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-watch-ioctl.rst new file mode 100644 index 000000000000..938ff85a9322 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-watch-ioctl.rst @@ -0,0 +1,67 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_V2_GET_LINEINFO_WATCH_IOCTL: + +******************************** +GPIO_V2_GET_LINEINFO_WATCH_IOCTL +******************************** + +Name +==== + +GPIO_V2_GET_LINEINFO_WATCH_IOCTL - Enable watching a line for changes to its +request state and configuration information. + +Synopsis +======== + +.. c:macro:: GPIO_V2_GET_LINEINFO_WATCH_IOCTL + +``int ioctl(int chip_fd, GPIO_V2_GET_LINEINFO_WATCH_IOCTL, struct gpio_v2_line_info *info)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``info`` + The :c:type:`line_info<gpio_v2_line_info>` struct to be populated, with + the ``offset`` set to indicate the line to watch + +Description +=========== + +Enable watching a line for changes to its request state and configuration +information. Changes to line info include a line being requested, released +or reconfigured. + +.. note:: + Watching line info is not generally required, and would typically only be + used by a system monitoring component. + + The line info does NOT include the line value. + The line must be requested using gpio-v2-get-line-ioctl.rst to access + its value, and the line request can monitor a line for events using + gpio-v2-line-event-read.rst. + +By default all lines are unwatched when the GPIO chip is opened. + +Multiple lines may be watched simultaneously by adding a watch for each. + +Once a watch is set, any changes to line info will generate events which can be +read from the ``chip_fd`` as described in +gpio-v2-lineinfo-changed-read.rst. + +Adding a watch to a line that is already watched is an error (**EBUSY**). + +Watches are specific to the ``chip_fd`` and are independent of watches +on the same GPIO chip opened with a separate call to `open()`. + +Return Value +============ + +On success 0 and ``info`` is populated with the current line info. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-v2-line-event-read.rst b/Documentation/userspace-api/gpio/gpio-v2-line-event-read.rst new file mode 100644 index 000000000000..6513c23fb7ca --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-v2-line-event-read.rst @@ -0,0 +1,83 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_V2_LINE_EVENT_READ: + +*********************** +GPIO_V2_LINE_EVENT_READ +*********************** + +Name +==== + +GPIO_V2_LINE_EVENT_READ - Read edge detection events for lines from a request. + +Synopsis +======== + +``int read(int req_fd, void *buf, size_t count)`` + +Arguments +========= + +``req_fd`` + The file descriptor of the GPIO character device, as returned in the + :c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst. + +``buf`` + The buffer to contain the :c:type:`events<gpio_v2_line_event>`. + +``count`` + The number of bytes available in ``buf``, which must be at + least the size of a :c:type:`gpio_v2_line_event`. + +Description +=========== + +Read edge detection events for lines from a request. + +Edge detection must be enabled for the input line using either +``GPIO_V2_LINE_FLAG_EDGE_RISING`` or ``GPIO_V2_LINE_FLAG_EDGE_FALLING``, or +both. Edge events are then generated whenever edge interrupts are detected on +the input line. + +The kernel captures and timestamps edge events as close as possible to their +occurrence and stores them in a buffer from where they can be read by +userspace at its convenience using `read()`. + +Events read from the buffer are always in the same order that they were +detected by the kernel, including when multiple lines are being monitored by +the one request. + +The size of the kernel event buffer is fixed at the time of line request +creation, and can be influenced by the +:c:type:`request.event_buffer_size<gpio_v2_line_request>`. +The default size is 16 times the number of lines requested. + +The buffer may overflow if bursts of events occur quicker than they are read +by userspace. If an overflow occurs then the oldest buffered event is +discarded. Overflow can be detected from userspace by monitoring the event +sequence numbers. + +To minimize the number of calls required to copy events from the kernel to +userspace, `read()` supports copying multiple events. The number of events +copied is the lower of the number available in the kernel buffer and the +number that will fit in the userspace buffer (``buf``). + +Changing the edge detection flags using gpio-v2-line-set-config-ioctl.rst +does not remove or modify the events already contained in the kernel event +buffer. + +The `read()` will block if no event is available and the ``req_fd`` has not +been set **O_NONBLOCK**. + +The presence of an event can be tested for by checking that the ``req_fd`` is +readable using `poll()` or an equivalent. + +Return Value +============ + +On success the number of bytes read, which will be a multiple of the size of a +:c:type:`gpio_v2_line_event` event. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-v2-line-get-values-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-line-get-values-ioctl.rst new file mode 100644 index 000000000000..e4e74a1926d8 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-v2-line-get-values-ioctl.rst @@ -0,0 +1,51 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_V2_LINE_GET_VALUES_IOCTL: + +***************************** +GPIO_V2_LINE_GET_VALUES_IOCTL +***************************** + +Name +==== + +GPIO_V2_LINE_GET_VALUES_IOCTL - Get the values of requested lines. + +Synopsis +======== + +.. c:macro:: GPIO_V2_LINE_GET_VALUES_IOCTL + +``int ioctl(int req_fd, GPIO_V2_LINE_GET_VALUES_IOCTL, struct gpio_v2_line_values *values)`` + +Arguments +========= + +``req_fd`` + The file descriptor of the GPIO character device, as returned in the + :c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst. + +``values`` + The :c:type:`line_values<gpio_v2_line_values>` to get with the ``mask`` set + to indicate the subset of requested lines to get. + +Description +=========== + +Get the values of requested lines. + +The values of both input and output lines may be read. + +For output lines, the value returned is driver and configuration dependent and +may be either the output buffer (the last requested value set) or the input +buffer (the actual level of the line), and depending on the hardware and +configuration these may differ. + +Return Value +============ + +On success 0 and the corresponding :c:type:`values.bits<gpio_v2_line_values>` +contain the value read. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-v2-line-set-config-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-line-set-config-ioctl.rst new file mode 100644 index 000000000000..9b942a8a53ca --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-v2-line-set-config-ioctl.rst @@ -0,0 +1,58 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_V2_LINE_SET_CONFIG_IOCTL: + +***************************** +GPIO_V2_LINE_SET_CONFIG_IOCTL +***************************** + +Name +==== + +GPIO_V2_LINE_SET_CONFIG_IOCTL - Update the configuration of previously requested lines. + +Synopsis +======== + +.. c:macro:: GPIO_V2_LINE_SET_CONFIG_IOCTL + +``int ioctl(int req_fd, GPIO_V2_LINE_SET_CONFIG_IOCTL, struct gpio_v2_line_config *config)`` + +Arguments +========= + +``req_fd`` + The file descriptor of the GPIO character device, as returned in the + :c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst. + +``config`` + The new :c:type:`configuration<gpio_v2_line_config>` to apply to the + requested lines. + +Description +=========== + +Update the configuration of previously requested lines, without releasing the +line or introducing potential glitches. + +The new configuration must specify the configuration of all requested lines. + +The same :ref:`gpio-v2-get-line-config-rules` and +:ref:`gpio-v2-get-line-config-support` that apply when requesting the lines +also apply when updating the line configuration. + +The motivating use case for this command is changing direction of +bi-directional lines between input and output, but it may also be used to +dynamically control edge detection, or more generally move lines seamlessly +from one configuration state to another. + +To only change the value of output lines, use +gpio-v2-line-set-values-ioctl.rst. + +Return Value +============ + +On success 0. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-v2-line-set-values-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-line-set-values-ioctl.rst new file mode 100644 index 000000000000..6d2d1886950b --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-v2-line-set-values-ioctl.rst @@ -0,0 +1,47 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_V2_LINE_SET_VALUES_IOCTL: + +***************************** +GPIO_V2_LINE_SET_VALUES_IOCTL +***************************** + +Name +==== + +GPIO_V2_LINE_SET_VALUES_IOCTL - Set the values of requested output lines. + +Synopsis +======== + +.. c:macro:: GPIO_V2_LINE_SET_VALUES_IOCTL + +``int ioctl(int req_fd, GPIO_V2_LINE_SET_VALUES_IOCTL, struct gpio_v2_line_values *values)`` + +Arguments +========= + +``req_fd`` + The file descriptor of the GPIO character device, as returned in the + :c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst. + +``values`` + The :c:type:`line_values<gpio_v2_line_values>` to set with the ``mask`` set + to indicate the subset of requested lines to set and ``bits`` set to + indicate the new value. + +Description +=========== + +Set the values of requested output lines. + +Only the values of output lines may be set. +Attempting to set the value of an input line is an error (**EPERM**). + +Return Value +============ + +On success 0. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/gpio-v2-lineinfo-changed-read.rst b/Documentation/userspace-api/gpio/gpio-v2-lineinfo-changed-read.rst new file mode 100644 index 000000000000..24ad325e7253 --- /dev/null +++ b/Documentation/userspace-api/gpio/gpio-v2-lineinfo-changed-read.rst @@ -0,0 +1,81 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _GPIO_V2_LINEINFO_CHANGED_READ: + +***************************** +GPIO_V2_LINEINFO_CHANGED_READ +***************************** + +Name +==== + +GPIO_V2_LINEINFO_CHANGED_READ - Read line info changed events for watched +lines from the chip. + +Synopsis +======== + +``int read(int chip_fd, void *buf, size_t count)`` + +Arguments +========= + +``chip_fd`` + The file descriptor of the GPIO character device returned by `open()`. + +``buf`` + The buffer to contain the :c:type:`events<gpio_v2_line_info_changed>`. + +``count`` + The number of bytes available in ``buf``, which must be at least the size + of a :c:type:`gpio_v2_line_info_changed` event. + +Description +=========== + +Read line info changed events for watched lines from the chip. + +.. note:: + Monitoring line info changes is not generally required, and would typically + only be performed by a system monitoring component. + + These events relate to changes in a line's request state or configuration, + not its value. Use gpio-v2-line-event-read.rst to receive events when a + line changes value. + +A line must be watched using gpio-v2-get-lineinfo-watch-ioctl.rst to generate +info changed events. Subsequently, a request, release, or reconfiguration +of the line will generate an info changed event. + +The kernel timestamps events when they occur and stores them in a buffer +from where they can be read by userspace at its convenience using `read()`. + +The size of the kernel event buffer is fixed at 32 events per ``chip_fd``. + +The buffer may overflow if bursts of events occur quicker than they are read +by userspace. If an overflow occurs then the most recent event is discarded. +Overflow cannot be detected from userspace. + +Events read from the buffer are always in the same order that they were +detected by the kernel, including when multiple lines are being monitored by +the one ``chip_fd``. + +To minimize the number of calls required to copy events from the kernel to +userspace, `read()` supports copying multiple events. The number of events +copied is the lower of the number available in the kernel buffer and the +number that will fit in the userspace buffer (``buf``). + +A `read()` will block if no event is available and the ``chip_fd`` has not +been set **O_NONBLOCK**. + +The presence of an event can be tested for by checking that the ``chip_fd`` is +readable using `poll()` or an equivalent. + +Return Value +============ + +On success the number of bytes read, which will be a multiple of the size +of a :c:type:`gpio_v2_line_info_changed` event. + +On error -1 and the ``errno`` variable is set appropriately. +Common error codes are described in error-codes.rst. diff --git a/Documentation/userspace-api/gpio/index.rst b/Documentation/userspace-api/gpio/index.rst new file mode 100644 index 000000000000..f258de4ef370 --- /dev/null +++ b/Documentation/userspace-api/gpio/index.rst @@ -0,0 +1,18 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==== +GPIO +==== + +.. toctree:: + :maxdepth: 1 + + Character Device Userspace API <chardev> + Obsolete Userspace APIs <obsolete> + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/userspace-api/gpio/obsolete.rst b/Documentation/userspace-api/gpio/obsolete.rst new file mode 100644 index 000000000000..c42538b44ec8 --- /dev/null +++ b/Documentation/userspace-api/gpio/obsolete.rst @@ -0,0 +1,11 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================ +Obsolete GPIO Userspace APIs +============================ + +.. toctree:: + :maxdepth: 1 + + Character Device Userspace API (v1) <chardev_v1> + Sysfs Interface <sysfs> diff --git a/Documentation/admin-guide/gpio/sysfs.rst b/Documentation/userspace-api/gpio/sysfs.rst index 35171d15f78d..116921048b18 100644 --- a/Documentation/admin-guide/gpio/sysfs.rst +++ b/Documentation/userspace-api/gpio/sysfs.rst @@ -2,18 +2,18 @@ GPIO Sysfs Interface for Userspace ================================== .. warning:: + This API is obsoleted by the chardev.rst and the ABI documentation has + been moved to Documentation/ABI/obsolete/sysfs-gpio. - THIS ABI IS DEPRECATED, THE ABI DOCUMENTATION HAS BEEN MOVED TO - Documentation/ABI/obsolete/sysfs-gpio AND NEW USERSPACE CONSUMERS - ARE SUPPOSED TO USE THE CHARACTER DEVICE ABI. THIS OLD SYSFS ABI WILL - NOT BE DEVELOPED (NO NEW FEATURES), IT WILL JUST BE MAINTAINED. + New developments should use the chardev.rst, and existing developments are + encouraged to migrate as soon as possible, as this API will be removed + in the future. -Refer to the examples in tools/gpio/* for an introduction to the new -character device ABI. Also see the userspace header in -include/uapi/linux/gpio.h + This interface will continue to be maintained for the migration period, + but new features will only be added to the new API. -The deprecated sysfs ABI ------------------------- +The obsolete sysfs ABI +---------------------- Platforms which use the "gpiolib" implementors framework may choose to configure a sysfs user interface to GPIOs. This is different from the debugfs interface, since it provides control over GPIO direction and @@ -33,9 +33,12 @@ userspace GPIO can be used to determine system configuration data that standard kernels won't know about. And for some tasks, simple userspace GPIO drivers could be all that the system really needs. -DO NOT ABUSE SYSFS TO CONTROL HARDWARE THAT HAS PROPER KERNEL DRIVERS. -PLEASE READ THE DOCUMENT AT Documentation/driver-api/gpio/drivers-on-gpio.rst -TO AVOID REINVENTING KERNEL WHEELS IN USERSPACE. I MEAN IT. REALLY. +.. note:: + Do NOT abuse sysfs to control hardware that has proper kernel drivers. + Please read Documentation/driver-api/gpio/drivers-on-gpio.rst + to avoid reinventing kernel wheels in userspace. + + I MEAN IT. REALLY. Paths in Sysfs -------------- @@ -84,9 +87,9 @@ and have the following read/write attributes: allow userspace to reconfigure this GPIO's direction. "value" ... - reads as either 0 (low) or 1 (high). If the GPIO + reads as either 0 (inactive) or 1 (active). If the GPIO is configured as an output, this value may be written; - any nonzero value is treated as high. + any nonzero value is treated as active. If the pin can be configured as interrupt-generating interrupt and if it has been configured to generate interrupts (see the diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index 09f61bd2ac2e..afecfe3cc4a8 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -9,31 +9,59 @@ While much of the kernel's user-space API is documented elsewhere also be found in the kernel tree itself. This manual is intended to be the place where this information is gathered. + +System calls +============ + +.. toctree:: + :maxdepth: 1 + + unshare + futex2 + ebpf/index + ioctl/index + +Security-related interfaces +=========================== + .. toctree:: - :caption: Table of contents - :maxdepth: 2 + :maxdepth: 1 no_new_privs seccomp_filter landlock - unshare + lsm spec_ctrl + tee + +Devices and I/O +=============== + +.. toctree:: + :maxdepth: 1 + accelerators/ocxl dma-buf-alloc-exchange - ebpf/index - ELF - ioctl/index + gpio/index iommu iommufd media/index + dcdbas + vduse + isapnp + +Everything else +=============== + +.. toctree:: + :maxdepth: 1 + + ELF netlink/index sysfs-platform_profile vduse futex2 - lsm - tee - isapnp - dcdbas + perf_ring_buffer .. only:: subproject and html diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index 457e16f06e04..c472423412bf 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -82,8 +82,9 @@ Code Seq# Include File Comments 0x10 00-0F drivers/char/s390/vmcp.h 0x10 10-1F arch/s390/include/uapi/sclp_ctl.h 0x10 20-2F arch/s390/include/uapi/asm/hypfs.h -0x12 all linux/fs.h +0x12 all linux/fs.h BLK* ioctls linux/blkpg.h +0x15 all linux/fs.h FS_IOC_* ioctls 0x1b all InfiniBand Subsystem <http://infiniband.sourceforge.net/> 0x20 all drivers/cdrom/cm206.h @@ -309,6 +310,7 @@ Code Seq# Include File Comments 0x89 0B-DF linux/sockios.h 0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range 0x89 F0-FF linux/sockios.h SIOCDEVPRIVATE range +0x8A 00-1F linux/eventpoll.h 0x8B all linux/wireless.h 0x8C 00-3F WiNRADiO driver <http://www.winradio.com.au/> diff --git a/Documentation/userspace-api/netlink/netlink-raw.rst b/Documentation/userspace-api/netlink/netlink-raw.rst index 1e14f5f22b8e..1990eea772d0 100644 --- a/Documentation/userspace-api/netlink/netlink-raw.rst +++ b/Documentation/userspace-api/netlink/netlink-raw.rst @@ -150,3 +150,45 @@ attributes from an ``attribute-set``. For example the following Note that a selector attribute must appear in a netlink message before any sub-message attributes that depend on it. + +If an attribute such as ``kind`` is defined at more than one nest level, then a +sub-message selector will be resolved using the value 'closest' to the selector. +For example, if the same attribute name is defined in a nested ``attribute-set`` +alongside a sub-message selector and also in a top level ``attribute-set``, then +the selector will be resolved using the value 'closest' to the selector. If the +value is not present in the message at the same level as defined in the spec +then this is an error. + +Nested struct definitions +------------------------- + +Many raw netlink families such as :doc:`tc<../../networking/netlink_spec/tc>` +make use of nested struct definitions. The ``netlink-raw`` schema makes it +possible to embed a struct within a struct definition using the ``struct`` +property. For example, the following struct definition embeds the +``tc-ratespec`` struct definition for both the ``rate`` and the ``peakrate`` +members of ``struct tc-tbf-qopt``. + +.. code-block:: yaml + + - + name: tc-tbf-qopt + type: struct + members: + - + name: rate + type: binary + struct: tc-ratespec + - + name: peakrate + type: binary + struct: tc-ratespec + - + name: limit + type: u32 + - + name: buffer + type: u32 + - + name: mtu + type: u32 diff --git a/Documentation/userspace-api/perf_ring_buffer.rst b/Documentation/userspace-api/perf_ring_buffer.rst new file mode 100644 index 000000000000..bde9d8cbc106 --- /dev/null +++ b/Documentation/userspace-api/perf_ring_buffer.rst @@ -0,0 +1,830 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================ +Perf ring buffer +================ + +.. CONTENTS + + 1. Introduction + + 2. Ring buffer implementation + 2.1 Basic algorithm + 2.2 Ring buffer for different tracing modes + 2.2.1 Default mode + 2.2.2 Per-thread mode + 2.2.3 Per-CPU mode + 2.2.4 System wide mode + 2.3 Accessing buffer + 2.3.1 Producer-consumer model + 2.3.2 Properties of the ring buffers + 2.3.3 Writing samples into buffer + 2.3.4 Reading samples from buffer + 2.3.5 Memory synchronization + + 3. The mechanism of AUX ring buffer + 3.1 The relationship between AUX and regular ring buffers + 3.2 AUX events + 3.3 Snapshot mode + + +1. Introduction +=============== + +The ring buffer is a fundamental mechanism for data transfer. perf uses +ring buffers to transfer event data from kernel to user space, another +kind of ring buffer which is so called auxiliary (AUX) ring buffer also +plays an important role for hardware tracing with Intel PT, Arm +CoreSight, etc. + +The ring buffer implementation is critical but it's also a very +challenging work. On the one hand, the kernel and perf tool in the user +space use the ring buffer to exchange data and stores data into data +file, thus the ring buffer needs to transfer data with high throughput; +on the other hand, the ring buffer management should avoid significant +overload to distract profiling results. + +This documentation dives into the details for perf ring buffer with two +parts: firstly it explains the perf ring buffer implementation, then the +second part discusses the AUX ring buffer mechanism. + +2. Ring buffer implementation +============================= + +2.1 Basic algorithm +------------------- + +That said, a typical ring buffer is managed by a head pointer and a tail +pointer; the head pointer is manipulated by a writer and the tail +pointer is updated by a reader respectively. + +:: + + +---------------------------+ + | | |***|***|***| | | + +---------------------------+ + `-> Tail `-> Head + + * : the data is filled by the writer. + + Figure 1. Ring buffer + +Perf uses the same way to manage its ring buffer. In the implementation +there are two key data structures held together in a set of consecutive +pages, the control structure and then the ring buffer itself. The page +with the control structure in is known as the "user page". Being held +in continuous virtual addresses simplifies locating the ring buffer +address, it is in the pages after the page with the user page. + +The control structure is named as ``perf_event_mmap_page``, it contains a +head pointer ``data_head`` and a tail pointer ``data_tail``. When the +kernel starts to fill records into the ring buffer, it updates the head +pointer to reserve the memory so later it can safely store events into +the buffer. On the other side, when the user page is a writable mapping, +the perf tool has the permission to update the tail pointer after consuming +data from the ring buffer. Yet another case is for the user page's +read-only mapping, which is to be addressed in the section +:ref:`writing_samples_into_buffer`. + +:: + + user page ring buffer + +---------+---------+ +---------------------------------------+ + |data_head|data_tail|...| | |***|***|***|***|***| | | | + +---------+---------+ +---------------------------------------+ + ` `----------------^ ^ + `----------------------------------------------| + + * : the data is filled by the writer. + + Figure 2. Perf ring buffer + +When using the ``perf record`` tool, we can specify the ring buffer size +with option ``-m`` or ``--mmap-pages=``, the given size will be rounded up +to a power of two that is a multiple of a page size. Though the kernel +allocates at once for all memory pages, it's deferred to map the pages +to VMA area until the perf tool accesses the buffer from the user space. +In other words, at the first time accesses the buffer's page from user +space in the perf tool, a data abort exception for page fault is taken +and the kernel uses this occasion to map the page into process VMA +(see ``perf_mmap_fault()``), thus the perf tool can continue to access +the page after returning from the exception. + +2.2 Ring buffer for different tracing modes +------------------------------------------- + +The perf profiles programs with different modes: default mode, per thread +mode, per cpu mode, and system wide mode. This section describes these +modes and how the ring buffer meets requirements for them. At last we +will review the race conditions caused by these modes. + +2.2.1 Default mode +^^^^^^^^^^^^^^^^^^ + +Usually we execute ``perf record`` command followed by a profiling program +name, like below command:: + + perf record test_program + +This command doesn't specify any options for CPU and thread modes, the +perf tool applies the default mode on the perf event. It maps all the +CPUs in the system and the profiled program's PID on the perf event, and +it enables inheritance mode on the event so that child tasks inherits +the events. As a result, the perf event is attributed as:: + + evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 } + evsel::threads::map[] = { pid } + evsel::attr::inherit = 1 + +These attributions finally will be reflected on the deployment of ring +buffers. As shown below, the perf tool allocates individual ring buffer +for each CPU, but it only enables events for the profiled program rather +than for all threads in the system. The *T1* thread represents the +thread context of the 'test_program', whereas *T2* and *T3* are irrelevant +threads in the system. The perf samples are exclusively collected for +the *T1* thread and stored in the ring buffer associated with the CPU on +which the *T1* thread is running. + +:: + + T1 T2 T1 + +----+ +-----------+ +----+ + CPU0 |xxxx| |xxxxxxxxxxx| |xxxx| + +----+--------------+-----------+----------+----+--------> + | | + v v + +-----------------------------------------------------+ + | Ring buffer 0 | + +-----------------------------------------------------+ + + T1 + +-----+ + CPU1 |xxxxx| + -----+-----+---------------------------------------------> + | + v + +-----------------------------------------------------+ + | Ring buffer 1 | + +-----------------------------------------------------+ + + T1 T3 + +----+ +-------+ + CPU2 |xxxx| |xxxxxxx| + --------------------------+----+--------+-------+--------> + | + v + +-----------------------------------------------------+ + | Ring buffer 2 | + +-----------------------------------------------------+ + + T1 + +--------------+ + CPU3 |xxxxxxxxxxxxxx| + -----------+--------------+------------------------------> + | + v + +-----------------------------------------------------+ + | Ring buffer 3 | + +-----------------------------------------------------+ + + T1: Thread 1; T2: Thread 2; T3: Thread 3 + x: Thread is in running state + + Figure 3. Ring buffer for default mode + +2.2.2 Per-thread mode +^^^^^^^^^^^^^^^^^^^^^ + +By specifying option ``--per-thread`` in perf command, e.g. + +:: + + perf record --per-thread test_program + +The perf event doesn't map to any CPUs and is only bound to the +profiled process, thus, the perf event's attributions are:: + + evsel::cpus::map[0] = { -1 } + evsel::threads::map[] = { pid } + evsel::attr::inherit = 0 + +In this mode, a single ring buffer is allocated for the profiled thread; +if the thread is scheduled on a CPU, the events on that CPU will be +enabled; and if the thread is scheduled out from the CPU, the events on +the CPU will be disabled. When the thread is migrated from one CPU to +another, the events are to be disabled on the previous CPU and enabled +on the next CPU correspondingly. + +:: + + T1 T2 T1 + +----+ +-----------+ +----+ + CPU0 |xxxx| |xxxxxxxxxxx| |xxxx| + +----+--------------+-----------+----------+----+--------> + | | + | T1 | + | +-----+ | + CPU1 | |xxxxx| | + --|--+-----+----------------------------------|----------> + | | | + | | T1 T3 | + | | +----+ +---+ | + CPU2 | | |xxxx| |xxx| | + --|-----|-----------------+----+--------+---+-|----------> + | | | | + | | T1 | | + | | +--------------+ | | + CPU3 | | |xxxxxxxxxxxxxx| | | + --|-----|--+--------------+-|-----------------|----------> + | | | | | + v v v v v + +-----------------------------------------------------+ + | Ring buffer | + +-----------------------------------------------------+ + + T1: Thread 1 + x: Thread is in running state + + Figure 4. Ring buffer for per-thread mode + +When perf runs in per-thread mode, a ring buffer is allocated for the +profiled thread *T1*. The ring buffer is dedicated for thread *T1*, if the +thread *T1* is running, the perf events will be recorded into the ring +buffer; when the thread is sleeping, all associated events will be +disabled, thus no trace data will be recorded into the ring buffer. + +2.2.3 Per-CPU mode +^^^^^^^^^^^^^^^^^^ + +The option ``-C`` is used to collect samples on the list of CPUs, for +example the below perf command receives option ``-C 0,2``:: + + perf record -C 0,2 test_program + +It maps the perf event to CPUs 0 and 2, and the event is not associated to any +PID. Thus the perf event attributions are set as:: + + evsel::cpus::map[0] = { 0, 2 } + evsel::threads::map[] = { -1 } + evsel::attr::inherit = 0 + +This results in the session of ``perf record`` will sample all threads on CPU0 +and CPU2, and be terminated until test_program exits. Even there have tasks +running on CPU1 and CPU3, since the ring buffer is absent for them, any +activities on these two CPUs will be ignored. A usage case is to combine the +options for per-thread mode and per-CPU mode, e.g. the options ``–C 0,2`` and +``––per–thread`` are specified together, the samples are recorded only when +the profiled thread is scheduled on any of the listed CPUs. + +:: + + T1 T2 T1 + +----+ +-----------+ +----+ + CPU0 |xxxx| |xxxxxxxxxxx| |xxxx| + +----+--------------+-----------+----------+----+--------> + | | | + v v v + +-----------------------------------------------------+ + | Ring buffer 0 | + +-----------------------------------------------------+ + + T1 + +-----+ + CPU1 |xxxxx| + -----+-----+---------------------------------------------> + + T1 T3 + +----+ +-------+ + CPU2 |xxxx| |xxxxxxx| + --------------------------+----+--------+-------+--------> + | | + v v + +-----------------------------------------------------+ + | Ring buffer 1 | + +-----------------------------------------------------+ + + T1 + +--------------+ + CPU3 |xxxxxxxxxxxxxx| + -----------+--------------+------------------------------> + + T1: Thread 1; T2: Thread 2; T3: Thread 3 + x: Thread is in running state + + Figure 5. Ring buffer for per-CPU mode + +2.2.4 System wide mode +^^^^^^^^^^^^^^^^^^^^^^ + +By using option ``–a`` or ``––all–cpus``, perf collects samples on all CPUs +for all tasks, we call it as the system wide mode, the command is:: + + perf record -a test_program + +Similar to the per-CPU mode, the perf event doesn't bind to any PID, and +it maps to all CPUs in the system:: + + evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 } + evsel::threads::map[] = { -1 } + evsel::attr::inherit = 0 + +In the system wide mode, every CPU has its own ring buffer, all threads +are monitored during the running state and the samples are recorded into +the ring buffer belonging to the CPU which the events occurred on. + +:: + + T1 T2 T1 + +----+ +-----------+ +----+ + CPU0 |xxxx| |xxxxxxxxxxx| |xxxx| + +----+--------------+-----------+----------+----+--------> + | | | + v v v + +-----------------------------------------------------+ + | Ring buffer 0 | + +-----------------------------------------------------+ + + T1 + +-----+ + CPU1 |xxxxx| + -----+-----+---------------------------------------------> + | + v + +-----------------------------------------------------+ + | Ring buffer 1 | + +-----------------------------------------------------+ + + T1 T3 + +----+ +-------+ + CPU2 |xxxx| |xxxxxxx| + --------------------------+----+--------+-------+--------> + | | + v v + +-----------------------------------------------------+ + | Ring buffer 2 | + +-----------------------------------------------------+ + + T1 + +--------------+ + CPU3 |xxxxxxxxxxxxxx| + -----------+--------------+------------------------------> + | + v + +-----------------------------------------------------+ + | Ring buffer 3 | + +-----------------------------------------------------+ + + T1: Thread 1; T2: Thread 2; T3: Thread 3 + x: Thread is in running state + + Figure 6. Ring buffer for system wide mode + +2.3 Accessing buffer +-------------------- + +Based on the understanding of how the ring buffer is allocated in +various modes, this section explains access the ring buffer. + +2.3.1 Producer-consumer model +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In the Linux kernel, the PMU events can produce samples which are stored +into the ring buffer; the perf command in user space consumes the +samples by reading out data from the ring buffer and finally saves the +data into the file for post analysis. It’s a typical producer-consumer +model for using the ring buffer. + +The perf process polls on the PMU events and sleeps when no events are +incoming. To prevent frequent exchanges between the kernel and user +space, the kernel event core layer introduces a watermark, which is +stored in the ``perf_buffer::watermark``. When a sample is recorded into +the ring buffer, and if the used buffer exceeds the watermark, the +kernel wakes up the perf process to read samples from the ring buffer. + +:: + + Perf + / | Read samples + Polling / `--------------| Ring buffer + v v ;---------------------v + +----------------+ +---------+---------+ +-------------------+ + |Event wait queue| |data_head|data_tail| |***|***| | |***| + +----------------+ +---------+---------+ +-------------------+ + ^ ^ `------------------------^ + | Wake up tasks | Store samples + +-----------------------------+ + | Kernel event core layer | + +-----------------------------+ + + * : the data is filled by the writer. + + Figure 7. Writing and reading the ring buffer + +When the kernel event core layer notifies the user space, because +multiple events might share the same ring buffer for recording samples, +the core layer iterates every event associated with the ring buffer and +wakes up tasks waiting on the event. This is fulfilled by the kernel +function ``ring_buffer_wakeup()``. + +After the perf process is woken up, it starts to check the ring buffers +one by one, if it finds any ring buffer containing samples it will read +out the samples for statistics or saving into the data file. Given the +perf process is able to run on any CPU, this leads to the ring buffer +potentially being accessed from multiple CPUs simultaneously, which +causes race conditions. The race condition handling is described in the +section :ref:`memory_synchronization`. + +2.3.2 Properties of the ring buffers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Linux kernel supports two write directions for the ring buffer: forward and +backward. The forward writing saves samples from the beginning of the ring +buffer, the backward writing stores data from the end of the ring buffer with +the reversed direction. The perf tool determines the writing direction. + +Additionally, the tool can map buffers in either read-write mode or read-only +mode to the user space. + +The ring buffer in the read-write mode is mapped with the property +``PROT_READ | PROT_WRITE``. With the write permission, the perf tool +updates the ``data_tail`` to indicate the data start position. Combining +with the head pointer ``data_head``, which works as the end position of +the current data, the perf tool can easily know where read out the data +from. + +Alternatively, in the read-only mode, only the kernel keeps to update +the ``data_head`` while the user space cannot access the ``data_tail`` due +to the mapping property ``PROT_READ``. + +As a result, the matrix below illustrates the various combinations of +direction and mapping characteristics. The perf tool employs two of these +combinations to support buffer types: the non-overwrite buffer and the +overwritable buffer. + +.. list-table:: + :widths: 1 1 1 + :header-rows: 1 + + * - Mapping mode + - Forward + - Backward + * - read-write + - Non-overwrite ring buffer + - Not used + * - read-only + - Not used + - Overwritable ring buffer + +The non-overwrite ring buffer uses the read-write mapping with forward +writing. It starts to save data from the beginning of the ring buffer +and wrap around when overflow, which is used with the read-write mode in +the normal ring buffer. When the consumer doesn't keep up with the +producer, it would lose some data, the kernel keeps how many records it +lost and generates the ``PERF_RECORD_LOST`` records in the next time +when it finds a space in the ring buffer. + +The overwritable ring buffer uses the backward writing with the +read-only mode. It saves the data from the end of the ring buffer and +the ``data_head`` keeps the position of current data, the perf always +knows where it starts to read and until the end of the ring buffer, thus +it don't need the ``data_tail``. In this mode, it will not generate the +``PERF_RECORD_LOST`` records. + +.. _writing_samples_into_buffer: + +2.3.3 Writing samples into buffer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When a sample is taken and saved into the ring buffer, the kernel +prepares sample fields based on the sample type; then it prepares the +info for writing ring buffer which is stored in the structure +``perf_output_handle``. In the end, the kernel outputs the sample into +the ring buffer and updates the head pointer in the user page so the +perf tool can see the latest value. + +The structure ``perf_output_handle`` serves as a temporary context for +tracking the information related to the buffer. The advantages of it is +that it enables concurrent writing to the buffer by different events. +For example, a software event and a hardware PMU event both are enabled +for profiling, two instances of ``perf_output_handle`` serve as separate +contexts for the software event and the hardware event respectively. +This allows each event to reserve its own memory space for populating +the record data. + +2.3.4 Reading samples from buffer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In the user space, the perf tool utilizes the ``perf_event_mmap_page`` +structure to handle the head and tail of the buffer. It also uses +``perf_mmap`` structure to keep track of a context for the ring buffer, this +context includes information about the buffer's starting and ending +addresses. Additionally, the mask value can be utilized to compute the +circular buffer pointer even for an overflow. + +Similar to the kernel, the perf tool in the user space first reads out +the recorded data from the ring buffer, and then updates the buffer's +tail pointer ``perf_event_mmap_page::data_tail``. + +.. _memory_synchronization: + +2.3.5 Memory synchronization +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The modern CPUs with relaxed memory model cannot promise the memory +ordering, this means it’s possible to access the ring buffer and the +``perf_event_mmap_page`` structure out of order. To assure the specific +sequence for memory accessing perf ring buffer, memory barriers are +used to assure the data dependency. The rationale for the memory +synchronization is as below:: + + Kernel User space + + if (LOAD ->data_tail) { LOAD ->data_head + (A) smp_rmb() (C) + STORE $data LOAD $data + smp_wmb() (B) smp_mb() (D) + STORE ->data_head STORE ->data_tail + } + +The comments in tools/include/linux/ring_buffer.h gives nice description +for why and how to use memory barriers, here we will just provide an +alternative explanation: + +(A) is a control dependency so that CPU assures order between checking +pointer ``perf_event_mmap_page::data_tail`` and filling sample into ring +buffer; + +(D) pairs with (A). (D) separates the ring buffer data reading from +writing the pointer ``data_tail``, perf tool first consumes samples and then +tells the kernel that the data chunk has been released. Since a reading +operation is followed by a writing operation, thus (D) is a full memory +barrier. + +(B) is a writing barrier in the middle of two writing operations, which +makes sure that recording a sample must be prior to updating the head +pointer. + +(C) pairs with (B). (C) is a read memory barrier to ensure the head +pointer is fetched before reading samples. + +To implement the above algorithm, the ``perf_output_put_handle()`` function +in the kernel and two helpers ``ring_buffer_read_head()`` and +``ring_buffer_write_tail()`` in the user space are introduced, they rely +on memory barriers as described above to ensure the data dependency. + +Some architectures support one-way permeable barrier with load-acquire +and store-release operations, these barriers are more relaxed with less +performance penalty, so (C) and (D) can be optimized to use barriers +``smp_load_acquire()`` and ``smp_store_release()`` respectively. + +If an architecture doesn’t support load-acquire and store-release in its +memory model, it will roll back to the old fashion of memory barrier +operations. In this case, ``smp_load_acquire()`` encapsulates +``READ_ONCE()`` + ``smp_mb()``, since ``smp_mb()`` is costly, +``ring_buffer_read_head()`` doesn't invoke ``smp_load_acquire()`` and it uses +the barriers ``READ_ONCE()`` + ``smp_rmb()`` instead. + +3. The mechanism of AUX ring buffer +=================================== + +In this chapter, we will explain the implementation of the AUX ring +buffer. In the first part it will discuss the connection between the +AUX ring buffer and the regular ring buffer, then the second part will +examine how the AUX ring buffer co-works with the regular ring buffer, +as well as the additional features introduced by the AUX ring buffer for +the sampling mechanism. + +3.1 The relationship between AUX and regular ring buffers +--------------------------------------------------------- + +Generally, the AUX ring buffer is an auxiliary for the regular ring +buffer. The regular ring buffer is primarily used to store the event +samples and every event format complies with the definition in the +union ``perf_event``; the AUX ring buffer is for recording the hardware +trace data and the trace data format is hardware IP dependent. + +The general use and advantage of the AUX ring buffer is that it is +written directly by hardware rather than by the kernel. For example, +regular profile samples that write to the regular ring buffer cause an +interrupt. Tracing execution requires a high number of samples and +using interrupts would be overwhelming for the regular ring buffer +mechanism. Having an AUX buffer allows for a region of memory more +decoupled from the kernel and written to directly by hardware tracing. + +The AUX ring buffer reuses the same algorithm with the regular ring +buffer for the buffer management. The control structure +``perf_event_mmap_page`` extends the new fields ``aux_head`` and ``aux_tail`` +for the head and tail pointers of the AUX ring buffer. + +During the initialisation phase, besides the mmap()-ed regular ring +buffer, the perf tool invokes a second syscall in the +``auxtrace_mmap__mmap()`` function for the mmap of the AUX buffer with +non-zero file offset; ``rb_alloc_aux()`` in the kernel allocates pages +correspondingly, these pages will be deferred to map into VMA when +handling the page fault, which is the same lazy mechanism with the +regular ring buffer. + +AUX events and AUX trace data are two different things. Let's see an +example:: + + perf record -a -e cycles -e cs_etm/@tmc_etr0/ -- sleep 2 + +The above command enables two events: one is the event *cycles* from PMU +and another is the AUX event *cs_etm* from Arm CoreSight, both are saved +into the regular ring buffer while the CoreSight's AUX trace data is +stored in the AUX ring buffer. + +As a result, we can see the regular ring buffer and the AUX ring buffer +are allocated in pairs. The perf in default mode allocates the regular +ring buffer and the AUX ring buffer per CPU-wise, which is the same as +the system wide mode, however, the default mode records samples only for +the profiled program, whereas the latter mode profiles for all programs +in the system. For per-thread mode, the perf tool allocates only one +regular ring buffer and one AUX ring buffer for the whole session. For +the per-CPU mode, the perf allocates two kinds of ring buffers for +selected CPUs specified by the option ``-C``. + +The below figure demonstrates the buffers' layout in the system wide +mode; if there are any activities on one CPU, the AUX event samples and +the hardware trace data will be recorded into the dedicated buffers for +the CPU. + +:: + + T1 T2 T1 + +----+ +-----------+ +----+ + CPU0 |xxxx| |xxxxxxxxxxx| |xxxx| + +----+--------------+-----------+----------+----+--------> + | | | + v v v + +-----------------------------------------------------+ + | Ring buffer 0 | + +-----------------------------------------------------+ + | | | + v v v + +-----------------------------------------------------+ + | AUX Ring buffer 0 | + +-----------------------------------------------------+ + + T1 + +-----+ + CPU1 |xxxxx| + -----+-----+---------------------------------------------> + | + v + +-----------------------------------------------------+ + | Ring buffer 1 | + +-----------------------------------------------------+ + | + v + +-----------------------------------------------------+ + | AUX Ring buffer 1 | + +-----------------------------------------------------+ + + T1 T3 + +----+ +-------+ + CPU2 |xxxx| |xxxxxxx| + --------------------------+----+--------+-------+--------> + | | + v v + +-----------------------------------------------------+ + | Ring buffer 2 | + +-----------------------------------------------------+ + | | + v v + +-----------------------------------------------------+ + | AUX Ring buffer 2 | + +-----------------------------------------------------+ + + T1 + +--------------+ + CPU3 |xxxxxxxxxxxxxx| + -----------+--------------+------------------------------> + | + v + +-----------------------------------------------------+ + | Ring buffer 3 | + +-----------------------------------------------------+ + | + v + +-----------------------------------------------------+ + | AUX Ring buffer 3 | + +-----------------------------------------------------+ + + T1: Thread 1; T2: Thread 2; T3: Thread 3 + x: Thread is in running state + + Figure 8. AUX ring buffer for system wide mode + +3.2 AUX events +-------------- + +Similar to ``perf_output_begin()`` and ``perf_output_end()``'s working for the +regular ring buffer, ``perf_aux_output_begin()`` and ``perf_aux_output_end()`` +serve for the AUX ring buffer for processing the hardware trace data. + +Once the hardware trace data is stored into the AUX ring buffer, the PMU +driver will stop hardware tracing by calling the ``pmu::stop()`` callback. +Similar to the regular ring buffer, the AUX ring buffer needs to apply +the memory synchronization mechanism as discussed in the section +:ref:`memory_synchronization`. Since the AUX ring buffer is managed by the +PMU driver, the barrier (B), which is a writing barrier to ensure the trace +data is externally visible prior to updating the head pointer, is asked +to be implemented in the PMU driver. + +Then ``pmu::stop()`` can safely call the ``perf_aux_output_end()`` function to +finish two things: + +- It fills an event ``PERF_RECORD_AUX`` into the regular ring buffer, this + event delivers the information of the start address and data size for a + chunk of hardware trace data has been stored into the AUX ring buffer; + +- Since the hardware trace driver has stored new trace data into the AUX + ring buffer, the argument *size* indicates how many bytes have been + consumed by the hardware tracing, thus ``perf_aux_output_end()`` updates the + header pointer ``perf_buffer::aux_head`` to reflect the latest buffer usage. + +At the end, the PMU driver will restart hardware tracing. During this +temporary suspending period, it will lose hardware trace data, which +will introduce a discontinuity during decoding phase. + +The event ``PERF_RECORD_AUX`` presents an AUX event which is handled in the +kernel, but it lacks the information for saving the AUX trace data in +the perf file. When the perf tool copies the trace data from AUX ring +buffer to the perf data file, it synthesizes a ``PERF_RECORD_AUXTRACE`` +event which is not a kernel ABI, it's defined by the perf tool to describe +which portion of data in the AUX ring buffer is saved. Afterwards, the perf +tool reads out the AUX trace data from the perf file based on the +``PERF_RECORD_AUXTRACE`` events, and the ``PERF_RECORD_AUX`` event is used to +decode a chunk of data by correlating with time order. + +3.3 Snapshot mode +----------------- + +Perf supports snapshot mode for AUX ring buffer, in this mode, users +only record AUX trace data at a specific time point which users are +interested in. E.g. below gives an example of how to take snapshots +with 1 second interval with Arm CoreSight:: + + perf record -e cs_etm/@tmc_etr0/u -S -a program & + PERFPID=$! + while true; do + kill -USR2 $PERFPID + sleep 1 + done + +The main flow for snapshot mode is: + +- Before a snapshot is taken, the AUX ring buffer acts in free run mode. + During free run mode the perf doesn't record any of the AUX events and + trace data; + +- Once the perf tool receives the *USR2* signal, it triggers the callback + function ``auxtrace_record::snapshot_start()`` to deactivate hardware + tracing. The kernel driver then populates the AUX ring buffer with the + hardware trace data, and the event ``PERF_RECORD_AUX`` is stored in the + regular ring buffer; + +- Then perf tool takes a snapshot, ``record__read_auxtrace_snapshot()`` + reads out the hardware trace data from the AUX ring buffer and saves it + into perf data file; + +- After the snapshot is finished, ``auxtrace_record::snapshot_finish()`` + restarts the PMU event for AUX tracing. + +The perf only accesses the head pointer ``perf_event_mmap_page::aux_head`` +in snapshot mode and doesn’t touch tail pointer ``aux_tail``, this is +because the AUX ring buffer can overflow in free run mode, the tail +pointer is useless in this case. Alternatively, the callback +``auxtrace_record::find_snapshot()`` is introduced for making the decision +of whether the AUX ring buffer has been wrapped around or not, at the +end it fixes up the AUX buffer's head which are used to calculate the +trace data size. + +As we know, the buffers' deployment can be per-thread mode, per-CPU +mode, or system wide mode, and the snapshot can be applied to any of +these modes. Below is an example of taking snapshot with system wide +mode. + +:: + + Snapshot is taken + | + v + +------------------------+ + | AUX Ring buffer 0 | <- aux_head + +------------------------+ + v + +--------------------------------+ + | AUX Ring buffer 1 | <- aux_head + +--------------------------------+ + v + +--------------------------------------------+ + | AUX Ring buffer 2 | <- aux_head + +--------------------------------------------+ + v + +---------------------------------------+ + | AUX Ring buffer 3 | <- aux_head + +---------------------------------------+ + + Figure 9. Snapshot with system wide mode diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst index 68b0d2363af8..e1eaf6a830ce 100644 --- a/Documentation/virt/coco/sev-guest.rst +++ b/Documentation/virt/coco/sev-guest.rst @@ -67,6 +67,23 @@ counter (e.g. counter overflow), then -EIO will be returned. }; }; +The host ioctls are issued to a file descriptor of the /dev/sev device. +The ioctl accepts the command ID/input structure documented below. + +:: + + struct sev_issue_cmd { + /* Command ID */ + __u32 cmd; + + /* Command request structure */ + __u64 data; + + /* Firmware error code on failure (see psp-sev.h) */ + __u32 error; + }; + + 2.1 SNP_GET_REPORT ------------------ @@ -124,6 +141,41 @@ be updated with the expected value. See GHCB specification for further detail on how to parse the certificate blob. +2.4 SNP_PLATFORM_STATUS +----------------------- +:Technology: sev-snp +:Type: hypervisor ioctl cmd +:Parameters (out): struct sev_user_data_snp_status +:Returns (out): 0 on success, -negative on error + +The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The +status includes API major, minor version and more. See the SEV-SNP +specification for further details. + +2.5 SNP_COMMIT +-------------- +:Technology: sev-snp +:Type: hypervisor ioctl cmd +:Returns (out): 0 on success, -negative on error + +SNP_COMMIT is used to commit the currently installed firmware using the +SEV-SNP firmware SNP_COMMIT command. This prevents roll-back to a previously +committed firmware version. This will also update the reported TCB to match +that of the currently installed firmware. + +2.6 SNP_SET_CONFIG +------------------ +:Technology: sev-snp +:Type: hypervisor ioctl cmd +:Parameters (in): struct sev_user_data_snp_config +:Returns (out): 0 on success, -negative on error + +SNP_SET_CONFIG is used to set the system-wide configuration such as +reported TCB version in the attestation report. The command is similar +to SNP_CONFIG command defined in the SEV-SNP spec. The current values of +the firmware parameters affected by this command can be queried via +SNP_PLATFORM_STATUS. + 3. SEV-SNP CPUID Enforcement ============================ diff --git a/Documentation/virt/hyperv/index.rst b/Documentation/virt/hyperv/index.rst index 4a7a1b738bbe..de447e11b4a5 100644 --- a/Documentation/virt/hyperv/index.rst +++ b/Documentation/virt/hyperv/index.rst @@ -10,3 +10,4 @@ Hyper-V Enlightenments overview vmbus clocks + vpci diff --git a/Documentation/virt/hyperv/vpci.rst b/Documentation/virt/hyperv/vpci.rst new file mode 100644 index 000000000000..b65b2126ede3 --- /dev/null +++ b/Documentation/virt/hyperv/vpci.rst @@ -0,0 +1,316 @@ +.. SPDX-License-Identifier: GPL-2.0 + +PCI pass-thru devices +========================= +In a Hyper-V guest VM, PCI pass-thru devices (also called +virtual PCI devices, or vPCI devices) are physical PCI devices +that are mapped directly into the VM's physical address space. +Guest device drivers can interact directly with the hardware +without intermediation by the host hypervisor. This approach +provides higher bandwidth access to the device with lower +latency, compared with devices that are virtualized by the +hypervisor. The device should appear to the guest just as it +would when running on bare metal, so no changes are required +to the Linux device drivers for the device. + +Hyper-V terminology for vPCI devices is "Discrete Device +Assignment" (DDA). Public documentation for Hyper-V DDA is +available here: `DDA`_ + +.. _DDA: https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment + +DDA is typically used for storage controllers, such as NVMe, +and for GPUs. A similar mechanism for NICs is called SR-IOV +and produces the same benefits by allowing a guest device +driver to interact directly with the hardware. See Hyper-V +public documentation here: `SR-IOV`_ + +.. _SR-IOV: https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov- + +This discussion of vPCI devices includes DDA and SR-IOV +devices. + +Device Presentation +------------------- +Hyper-V provides full PCI functionality for a vPCI device when +it is operating, so the Linux device driver for the device can +be used unchanged, provided it uses the correct Linux kernel +APIs for accessing PCI config space and for other integration +with Linux. But the initial detection of the PCI device and +its integration with the Linux PCI subsystem must use Hyper-V +specific mechanisms. Consequently, vPCI devices on Hyper-V +have a dual identity. They are initially presented to Linux +guests as VMBus devices via the standard VMBus "offer" +mechanism, so they have a VMBus identity and appear under +/sys/bus/vmbus/devices. The VMBus vPCI driver in Linux at +drivers/pci/controller/pci-hyperv.c handles a newly introduced +vPCI device by fabricating a PCI bus topology and creating all +the normal PCI device data structures in Linux that would +exist if the PCI device were discovered via ACPI on a bare- +metal system. Once those data structures are set up, the +device also has a normal PCI identity in Linux, and the normal +Linux device driver for the vPCI device can function as if it +were running in Linux on bare-metal. Because vPCI devices are +presented dynamically through the VMBus offer mechanism, they +do not appear in the Linux guest's ACPI tables. vPCI devices +may be added to a VM or removed from a VM at any time during +the life of the VM, and not just during initial boot. + +With this approach, the vPCI device is a VMBus device and a +PCI device at the same time. In response to the VMBus offer +message, the hv_pci_probe() function runs and establishes a +VMBus connection to the vPCI VSP on the Hyper-V host. That +connection has a single VMBus channel. The channel is used to +exchange messages with the vPCI VSP for the purpose of setting +up and configuring the vPCI device in Linux. Once the device +is fully configured in Linux as a PCI device, the VMBus +channel is used only if Linux changes the vCPU to be interrupted +in the guest, or if the vPCI device is removed from +the VM while the VM is running. The ongoing operation of the +device happens directly between the Linux device driver for +the device and the hardware, with VMBus and the VMBus channel +playing no role. + +PCI Device Setup +---------------- +PCI device setup follows a sequence that Hyper-V originally +created for Windows guests, and that can be ill-suited for +Linux guests due to differences in the overall structure of +the Linux PCI subsystem compared with Windows. Nonetheless, +with a bit of hackery in the Hyper-V virtual PCI driver for +Linux, the virtual PCI device is setup in Linux so that +generic Linux PCI subsystem code and the Linux driver for the +device "just work". + +Each vPCI device is set up in Linux to be in its own PCI +domain with a host bridge. The PCI domainID is derived from +bytes 4 and 5 of the instance GUID assigned to the VMBus vPCI +device. The Hyper-V host does not guarantee that these bytes +are unique, so hv_pci_probe() has an algorithm to resolve +collisions. The collision resolution is intended to be stable +across reboots of the same VM so that the PCI domainIDs don't +change, as the domainID appears in the user space +configuration of some devices. + +hv_pci_probe() allocates a guest MMIO range to be used as PCI +config space for the device. This MMIO range is communicated +to the Hyper-V host over the VMBus channel as part of telling +the host that the device is ready to enter d0. See +hv_pci_enter_d0(). When the guest subsequently accesses this +MMIO range, the Hyper-V host intercepts the accesses and maps +them to the physical device PCI config space. + +hv_pci_probe() also gets BAR information for the device from +the Hyper-V host, and uses this information to allocate MMIO +space for the BARs. That MMIO space is then setup to be +associated with the host bridge so that it works when generic +PCI subsystem code in Linux processes the BARs. + +Finally, hv_pci_probe() creates the root PCI bus. At this +point the Hyper-V virtual PCI driver hackery is done, and the +normal Linux PCI machinery for scanning the root bus works to +detect the device, to perform driver matching, and to +initialize the driver and device. + +PCI Device Removal +------------------ +A Hyper-V host may initiate removal of a vPCI device from a +guest VM at any time during the life of the VM. The removal +is instigated by an admin action taken on the Hyper-V host and +is not under the control of the guest OS. + +A guest VM is notified of the removal by an unsolicited +"Eject" message sent from the host to the guest over the VMBus +channel associated with the vPCI device. Upon receipt of such +a message, the Hyper-V virtual PCI driver in Linux +asynchronously invokes Linux kernel PCI subsystem calls to +shutdown and remove the device. When those calls are +complete, an "Ejection Complete" message is sent back to +Hyper-V over the VMBus channel indicating that the device has +been removed. At this point, Hyper-V sends a VMBus rescind +message to the Linux guest, which the VMBus driver in Linux +processes by removing the VMBus identity for the device. Once +that processing is complete, all vestiges of the device having +been present are gone from the Linux kernel. The rescind +message also indicates to the guest that Hyper-V has stopped +providing support for the vPCI device in the guest. If the +guest were to attempt to access that device's MMIO space, it +would be an invalid reference. Hypercalls affecting the device +return errors, and any further messages sent in the VMBus +channel are ignored. + +After sending the Eject message, Hyper-V allows the guest VM +60 seconds to cleanly shutdown the device and respond with +Ejection Complete before sending the VMBus rescind +message. If for any reason the Eject steps don't complete +within the allowed 60 seconds, the Hyper-V host forcibly +performs the rescind steps, which will likely result in +cascading errors in the guest because the device is now no +longer present from the guest standpoint and accessing the +device MMIO space will fail. + +Because ejection is asynchronous and can happen at any point +during the guest VM lifecycle, proper synchronization in the +Hyper-V virtual PCI driver is very tricky. Ejection has been +observed even before a newly offered vPCI device has been +fully setup. The Hyper-V virtual PCI driver has been updated +several times over the years to fix race conditions when +ejections happen at inopportune times. Care must be taken when +modifying this code to prevent re-introducing such problems. +See comments in the code. + +Interrupt Assignment +-------------------- +The Hyper-V virtual PCI driver supports vPCI devices using +MSI, multi-MSI, or MSI-X. Assigning the guest vCPU that will +receive the interrupt for a particular MSI or MSI-X message is +complex because of the way the Linux setup of IRQs maps onto +the Hyper-V interfaces. For the single-MSI and MSI-X cases, +Linux calls hv_compse_msi_msg() twice, with the first call +containing a dummy vCPU and the second call containing the +real vCPU. Furthermore, hv_irq_unmask() is finally called +(on x86) or the GICD registers are set (on arm64) to specify +the real vCPU again. Each of these three calls interact +with Hyper-V, which must decide which physical CPU should +receive the interrupt before it is forwarded to the guest VM. +Unfortunately, the Hyper-V decision-making process is a bit +limited, and can result in concentrating the physical +interrupts on a single CPU, causing a performance bottleneck. +See details about how this is resolved in the extensive +comment above the function hv_compose_msi_req_get_cpu(). + +The Hyper-V virtual PCI driver implements the +irq_chip.irq_compose_msi_msg function as hv_compose_msi_msg(). +Unfortunately, on Hyper-V the implementation requires sending +a VMBus message to the Hyper-V host and awaiting an interrupt +indicating receipt of a reply message. Since +irq_chip.irq_compose_msi_msg can be called with IRQ locks +held, it doesn't work to do the normal sleep until awakened by +the interrupt. Instead hv_compose_msi_msg() must send the +VMBus message, and then poll for the completion message. As +further complexity, the vPCI device could be ejected/rescinded +while the polling is in progress, so this scenario must be +detected as well. See comments in the code regarding this +very tricky area. + +Most of the code in the Hyper-V virtual PCI driver (pci- +hyperv.c) applies to Hyper-V and Linux guests running on x86 +and on arm64 architectures. But there are differences in how +interrupt assignments are managed. On x86, the Hyper-V +virtual PCI driver in the guest must make a hypercall to tell +Hyper-V which guest vCPU should be interrupted by each +MSI/MSI-X interrupt, and the x86 interrupt vector number that +the x86_vector IRQ domain has picked for the interrupt. This +hypercall is made by hv_arch_irq_unmask(). On arm64, the +Hyper-V virtual PCI driver manages the allocation of an SPI +for each MSI/MSI-X interrupt. The Hyper-V virtual PCI driver +stores the allocated SPI in the architectural GICD registers, +which Hyper-V emulates, so no hypercall is necessary as with +x86. Hyper-V does not support using LPIs for vPCI devices in +arm64 guest VMs because it does not emulate a GICv3 ITS. + +The Hyper-V virtual PCI driver in Linux supports vPCI devices +whose drivers create managed or unmanaged Linux IRQs. If the +smp_affinity for an unmanaged IRQ is updated via the /proc/irq +interface, the Hyper-V virtual PCI driver is called to tell +the Hyper-V host to change the interrupt targeting and +everything works properly. However, on x86 if the x86_vector +IRQ domain needs to reassign an interrupt vector due to +running out of vectors on a CPU, there's no path to inform the +Hyper-V host of the change, and things break. Fortunately, +guest VMs operate in a constrained device environment where +using all the vectors on a CPU doesn't happen. Since such a +problem is only a theoretical concern rather than a practical +concern, it has been left unaddressed. + +DMA +--- +By default, Hyper-V pins all guest VM memory in the host +when the VM is created, and programs the physical IOMMU to +allow the VM to have DMA access to all its memory. Hence +it is safe to assign PCI devices to the VM, and allow the +guest operating system to program the DMA transfers. The +physical IOMMU prevents a malicious guest from initiating +DMA to memory belonging to the host or to other VMs on the +host. From the Linux guest standpoint, such DMA transfers +are in "direct" mode since Hyper-V does not provide a virtual +IOMMU in the guest. + +Hyper-V assumes that physical PCI devices always perform +cache-coherent DMA. When running on x86, this behavior is +required by the architecture. When running on arm64, the +architecture allows for both cache-coherent and +non-cache-coherent devices, with the behavior of each device +specified in the ACPI DSDT. But when a PCI device is assigned +to a guest VM, that device does not appear in the DSDT, so the +Hyper-V VMBus driver propagates cache-coherency information +from the VMBus node in the ACPI DSDT to all VMBus devices, +including vPCI devices (since they have a dual identity as a VMBus +device and as a PCI device). See vmbus_dma_configure(). +Current Hyper-V versions always indicate that the VMBus is +cache coherent, so vPCI devices on arm64 always get marked as +cache coherent and the CPU does not perform any sync +operations as part of dma_map/unmap_*() calls. + +vPCI protocol versions +---------------------- +As previously described, during vPCI device setup and teardown +messages are passed over a VMBus channel between the Hyper-V +host and the Hyper-v vPCI driver in the Linux guest. Some +messages have been revised in newer versions of Hyper-V, so +the guest and host must agree on the vPCI protocol version to +be used. The version is negotiated when communication over +the VMBus channel is first established. See +hv_pci_protocol_negotiation(). Newer versions of the protocol +extend support to VMs with more than 64 vCPUs, and provide +additional information about the vPCI device, such as the +guest virtual NUMA node to which it is most closely affined in +the underlying hardware. + +Guest NUMA node affinity +------------------------ +When the vPCI protocol version provides it, the guest NUMA +node affinity of the vPCI device is stored as part of the Linux +device information for subsequent use by the Linux driver. See +hv_pci_assign_numa_node(). If the negotiated protocol version +does not support the host providing NUMA affinity information, +the Linux guest defaults the device NUMA node to 0. But even +when the negotiated protocol version includes NUMA affinity +information, the ability of the host to provide such +information depends on certain host configuration options. If +the guest receives NUMA node value "0", it could mean NUMA +node 0, or it could mean "no information is available". +Unfortunately it is not possible to distinguish the two cases +from the guest side. + +PCI config space access in a CoCo VM +------------------------------------ +Linux PCI device drivers access PCI config space using a +standard set of functions provided by the Linux PCI subsystem. +In Hyper-V guests these standard functions map to functions +hv_pcifront_read_config() and hv_pcifront_write_config() +in the Hyper-V virtual PCI driver. In normal VMs, +these hv_pcifront_*() functions directly access the PCI config +space, and the accesses trap to Hyper-V to be handled. +But in CoCo VMs, memory encryption prevents Hyper-V +from reading the guest instruction stream to emulate the +access, so the hv_pcifront_*() functions must invoke +hypercalls with explicit arguments describing the access to be +made. + +Config Block back-channel +------------------------- +The Hyper-V host and Hyper-V virtual PCI driver in Linux +together implement a non-standard back-channel communication +path between the host and guest. The back-channel path uses +messages sent over the VMBus channel associated with the vPCI +device. The functions hyperv_read_cfg_blk() and +hyperv_write_cfg_blk() are the primary interfaces provided to +other parts of the Linux kernel. As of this writing, these +interfaces are used only by the Mellanox mlx5 driver to pass +diagnostic data to a Hyper-V host running in the Azure public +cloud. The functions hyperv_read_cfg_blk() and +hyperv_write_cfg_blk() are implemented in a separate module +(pci-hyperv-intf.c, under CONFIG_PCI_HYPERV_INTERFACE) that +effectively stubs them out when running in non-Hyper-V +environments. diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 3ec0b7a455a0..09c7e585ff58 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8791,6 +8791,11 @@ means the VM type with value @n is supported. Possible values of @n are:: #define KVM_X86_DEFAULT_VM 0 #define KVM_X86_SW_PROTECTED_VM 1 +Note, KVM_X86_SW_PROTECTED_VM is currently only for development and testing. +Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in +production. The behavior and effective ABI for software-protected VMs is +unstable. + 9. Known KVM API problems ========================= |