diff options
Diffstat (limited to 'Documentation')
54 files changed, 2657 insertions, 380 deletions
diff --git a/Documentation/ABI/testing/debugfs-hisi-hpre b/Documentation/ABI/testing/debugfs-hisi-hpre index ec4a79e3a807..b4be5f1db4b7 100644 --- a/Documentation/ABI/testing/debugfs-hisi-hpre +++ b/Documentation/ABI/testing/debugfs-hisi-hpre @@ -33,7 +33,7 @@ Contact: linux-crypto@vger.kernel.org Description: Dump debug registers from the HPRE. Only available for PF. -What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/qm_regs +What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/regs Date: Sep 2019 Contact: linux-crypto@vger.kernel.org Description: Dump debug registers from the QM. @@ -44,14 +44,97 @@ What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/current_q Date: Sep 2019 Contact: linux-crypto@vger.kernel.org Description: One QM may contain multiple queues. Select specific queue to - show its debug registers in above qm_regs. + show its debug registers in above regs. Only available for PF. What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/clear_enable Date: Sep 2019 Contact: linux-crypto@vger.kernel.org -Description: QM debug registers(qm_regs) read clear control. 1 means enable +Description: QM debug registers(regs) read clear control. 1 means enable register read clear, otherwise 0. Writing to this file has no functional effect, only enable or disable counters clear after reading of these registers. Only available for PF. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/err_irq +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of invalid interrupts for + QM task completion. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/aeq_irq +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of QM async event queue interrupts. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/abnormal_irq +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of interrupts for QM abnormal event. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/create_qp_err +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of queue allocation errors. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/mb_err +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of failed QM mailbox commands. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/status +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the status of the QM. + Four states: initiated, started, stopped and closed. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/send_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of sent requests. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/recv_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of received requests. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/send_busy_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of requests sent + with returning busy. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/send_fail_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of completed but error requests. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/invalid_req_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of invalid requests being received. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/overtime_thrhld +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Set the threshold time for counting the request which is + processed longer than the threshold. + 0: disable(default), 1: 1 microsecond. + Available for both PF and VF, and take no other effect on HPRE. + +What: /sys/kernel/debug/hisi_hpre/<bdf>/hpre_dfx/over_thrhld_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of time out requests. + Available for both PF and VF, and take no other effect on HPRE. diff --git a/Documentation/ABI/testing/debugfs-hisi-sec b/Documentation/ABI/testing/debugfs-hisi-sec index 06adb899495e..85feb4408e0f 100644 --- a/Documentation/ABI/testing/debugfs-hisi-sec +++ b/Documentation/ABI/testing/debugfs-hisi-sec @@ -1,10 +1,4 @@ -What: /sys/kernel/debug/hisi_sec/<bdf>/sec_dfx -Date: Oct 2019 -Contact: linux-crypto@vger.kernel.org -Description: Dump the debug registers of SEC cores. - Only available for PF. - -What: /sys/kernel/debug/hisi_sec/<bdf>/clear_enable +What: /sys/kernel/debug/hisi_sec2/<bdf>/clear_enable Date: Oct 2019 Contact: linux-crypto@vger.kernel.org Description: Enabling/disabling of clear action after reading @@ -12,7 +6,7 @@ Description: Enabling/disabling of clear action after reading 0: disable, 1: enable. Only available for PF, and take no other effect on SEC. -What: /sys/kernel/debug/hisi_sec/<bdf>/current_qm +What: /sys/kernel/debug/hisi_sec2/<bdf>/current_qm Date: Oct 2019 Contact: linux-crypto@vger.kernel.org Description: One SEC controller has one PF and multiple VFs, each function @@ -20,24 +14,100 @@ Description: One SEC controller has one PF and multiple VFs, each function qm refers to. Only available for PF. -What: /sys/kernel/debug/hisi_sec/<bdf>/qm/qm_regs +What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/qm_regs Date: Oct 2019 Contact: linux-crypto@vger.kernel.org Description: Dump of QM related debug registers. Available for PF and VF in host. VF in guest currently only has one debug register. -What: /sys/kernel/debug/hisi_sec/<bdf>/qm/current_q +What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/current_q Date: Oct 2019 Contact: linux-crypto@vger.kernel.org Description: One QM of SEC may contain multiple queues. Select specific - queue to show its debug registers in above 'qm_regs'. + queue to show its debug registers in above 'regs'. Only available for PF. -What: /sys/kernel/debug/hisi_sec/<bdf>/qm/clear_enable +What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/clear_enable Date: Oct 2019 Contact: linux-crypto@vger.kernel.org Description: Enabling/disabling of clear action after reading the SEC's QM debug registers. 0: disable, 1: enable. Only available for PF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/err_irq +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of invalid interrupts for + QM task completion. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/aeq_irq +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of QM async event queue interrupts. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/abnormal_irq +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of interrupts for QM abnormal event. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/create_qp_err +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of queue allocation errors. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/mb_err +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of failed QM mailbox commands. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/status +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the status of the QM. + Four states: initiated, started, stopped and closed. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/send_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of sent requests. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/recv_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of received requests. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/send_busy_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of requests sent with returning busy. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/err_bd_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of BD type error requests + to be received. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/invalid_req_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of invalid requests being received. + Available for both PF and VF, and take no other effect on SEC. + +What: /sys/kernel/debug/hisi_sec2/<bdf>/sec_dfx/done_flag_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of completed but marked error requests + to be received. + Available for both PF and VF, and take no other effect on SEC. diff --git a/Documentation/ABI/testing/debugfs-hisi-zip b/Documentation/ABI/testing/debugfs-hisi-zip index a7c63e6c4bc3..3034a2bf99ca 100644 --- a/Documentation/ABI/testing/debugfs-hisi-zip +++ b/Documentation/ABI/testing/debugfs-hisi-zip @@ -26,7 +26,7 @@ Description: One ZIP controller has one PF and multiple VFs, each function has a QM. Select the QM which below qm refers to. Only available for PF. -What: /sys/kernel/debug/hisi_zip/<bdf>/qm/qm_regs +What: /sys/kernel/debug/hisi_zip/<bdf>/qm/regs Date: Nov 2018 Contact: linux-crypto@vger.kernel.org Description: Dump of QM related debug registers. @@ -37,14 +37,78 @@ What: /sys/kernel/debug/hisi_zip/<bdf>/qm/current_q Date: Nov 2018 Contact: linux-crypto@vger.kernel.org Description: One QM may contain multiple queues. Select specific queue to - show its debug registers in above qm_regs. + show its debug registers in above regs. Only available for PF. What: /sys/kernel/debug/hisi_zip/<bdf>/qm/clear_enable Date: Nov 2018 Contact: linux-crypto@vger.kernel.org -Description: QM debug registers(qm_regs) read clear control. 1 means enable +Description: QM debug registers(regs) read clear control. 1 means enable register read clear, otherwise 0. Writing to this file has no functional effect, only enable or disable counters clear after reading of these registers. Only available for PF. + +What: /sys/kernel/debug/hisi_zip/<bdf>/qm/err_irq +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of invalid interrupts for + QM task completion. + Available for both PF and VF, and take no other effect on ZIP. + +What: /sys/kernel/debug/hisi_zip/<bdf>/qm/aeq_irq +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of QM async event queue interrupts. + Available for both PF and VF, and take no other effect on ZIP. + +What: /sys/kernel/debug/hisi_zip/<bdf>/qm/abnormal_irq +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of interrupts for QM abnormal event. + Available for both PF and VF, and take no other effect on ZIP. + +What: /sys/kernel/debug/hisi_zip/<bdf>/qm/create_qp_err +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of queue allocation errors. + Available for both PF and VF, and take no other effect on ZIP. + +What: /sys/kernel/debug/hisi_zip/<bdf>/qm/mb_err +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the number of failed QM mailbox commands. + Available for both PF and VF, and take no other effect on ZIP. + +What: /sys/kernel/debug/hisi_zip/<bdf>/qm/status +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the status of the QM. + Four states: initiated, started, stopped and closed. + Available for both PF and VF, and take no other effect on ZIP. + +What: /sys/kernel/debug/hisi_zip/<bdf>/zip_dfx/send_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of sent requests. + Available for both PF and VF, and take no other effect on ZIP. + +What: /sys/kernel/debug/hisi_zip/<bdf>/zip_dfx/recv_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of received requests. + Available for both PF and VF, and take no other effect on ZIP. + +What: /sys/kernel/debug/hisi_zip/<bdf>/zip_dfx/send_busy_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of requests received + with returning busy. + Available for both PF and VF, and take no other effect on ZIP. + +What: /sys/kernel/debug/hisi_zip/<bdf>/zip_dfx/err_bd_cnt +Date: Apr 2020 +Contact: linux-crypto@vger.kernel.org +Description: Dump the total number of BD type error requests + to be received. + Available for both PF and VF, and take no other effect on ZIP. diff --git a/Documentation/ABI/testing/dev-kmsg b/Documentation/ABI/testing/dev-kmsg index f307506eb54c..1e6c28b1942b 100644 --- a/Documentation/ABI/testing/dev-kmsg +++ b/Documentation/ABI/testing/dev-kmsg @@ -56,6 +56,11 @@ Description: The /dev/kmsg character device node provides userspace access seek after the last record available at the time the last SYSLOG_ACTION_CLEAR was issued. + Due to the record nature of this interface with a "read all" + behavior and the specific positions each seek operation sets, + SEEK_CUR is not supported, returning -ESPIPE (invalid seek) to + errno whenever requested. + The output format consists of a prefix carrying the syslog prefix including priority and facility, the 64 bit message sequence number and the monotonic timestamp in microseconds, diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst index fd5e2cbc4935..75b8ca007a11 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.rst +++ b/Documentation/RCU/Design/Requirements/Requirements.rst @@ -1943,56 +1943,27 @@ invoked from a CPU-hotplug notifier. Scheduler and RCU ~~~~~~~~~~~~~~~~~ -RCU depends on the scheduler, and the scheduler uses RCU to protect some -of its data structures. The preemptible-RCU ``rcu_read_unlock()`` -implementation must therefore be written carefully to avoid deadlocks -involving the scheduler's runqueue and priority-inheritance locks. In -particular, ``rcu_read_unlock()`` must tolerate an interrupt where the -interrupt handler invokes both ``rcu_read_lock()`` and -``rcu_read_unlock()``. This possibility requires ``rcu_read_unlock()`` -to use negative nesting levels to avoid destructive recursion via -interrupt handler's use of RCU. - -This scheduler-RCU requirement came as a `complete -surprise <https://lwn.net/Articles/453002/>`__. - -As noted above, RCU makes use of kthreads, and it is necessary to avoid -excessive CPU-time accumulation by these kthreads. This requirement was -no surprise, but RCU's violation of it when running context-switch-heavy -workloads when built with ``CONFIG_NO_HZ_FULL=y`` `did come as a -surprise +RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time +accumulation by these kthreads. This requirement was no surprise, but +RCU's violation of it when running context-switch-heavy workloads when +built with ``CONFIG_NO_HZ_FULL=y`` `did come as a surprise [PDF] <http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf>`__. RCU has made good progress towards meeting this requirement, even for context-switch-heavy ``CONFIG_NO_HZ_FULL=y`` workloads, but there is room for further improvement. -It is forbidden to hold any of scheduler's runqueue or -priority-inheritance spinlocks across an ``rcu_read_unlock()`` unless -interrupts have been disabled across the entire RCU read-side critical -section, that is, up to and including the matching ``rcu_read_lock()``. -Violating this restriction can result in deadlocks involving these -scheduler spinlocks. There was hope that this restriction might be -lifted when interrupt-disabled calls to ``rcu_read_unlock()`` started -deferring the reporting of the resulting RCU-preempt quiescent state -until the end of the corresponding interrupts-disabled region. -Unfortunately, timely reporting of the corresponding quiescent state to -expedited grace periods requires a call to ``raise_softirq()``, which -can acquire these scheduler spinlocks. In addition, real-time systems -using RCU priority boosting need this restriction to remain in effect -because deferred quiescent-state reporting would also defer deboosting, -which in turn would degrade real-time latencies. - -In theory, if a given RCU read-side critical section could be guaranteed -to be less than one second in duration, holding a scheduler spinlock -across that critical section's ``rcu_read_unlock()`` would require only -that preemption be disabled across the entire RCU read-side critical -section, not interrupts. Unfortunately, given the possibility of vCPU -preemption, long-running interrupts, and so on, it is not possible in -practice to guarantee that a given RCU read-side critical section will -complete in less than one second. Therefore, as noted above, if -scheduler spinlocks are held across a given call to -``rcu_read_unlock()``, interrupts must be disabled across the entire RCU -read-side critical section. +There is no longer any prohibition against holding any of +scheduler's runqueue or priority-inheritance spinlocks across an +``rcu_read_unlock()``, even if interrupts and preemption were enabled +somewhere within the corresponding RCU read-side critical section. +Therefore, it is now perfectly legal to execute ``rcu_read_lock()`` +with preemption enabled, acquire one of the scheduler locks, and hold +that lock across the matching ``rcu_read_unlock()``. + +Similarly, the RCU flavor consolidation has removed the need for negative +nesting. The fact that interrupt-disabled regions of code act as RCU +read-side critical sections implicitly avoids earlier issues that used +to result in destructive recursion via interrupt handler's use of RCU. Tracing and RCU ~~~~~~~~~~~~~~~ diff --git a/Documentation/admin-guide/device-mapper/dm-integrity.rst b/Documentation/admin-guide/device-mapper/dm-integrity.rst index c00f9f11e3f3..8439d2ae689b 100644 --- a/Documentation/admin-guide/device-mapper/dm-integrity.rst +++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst @@ -182,12 +182,15 @@ fix_padding space-efficient. If this option is not present, large padding is used - that is for compatibility with older kernels. - -The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can -be changed when reloading the target (load an inactive table and swap the -tables with suspend and resume). The other arguments should not be changed -when reloading the target because the layout of disk data depend on them -and the reloaded target would be non-functional. +allow_discards + Allow block discard requests (a.k.a. TRIM) for the integrity device. + Discards are only allowed to devices using internal hash. + +The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and +allow_discards can be changed when reloading the target (load an inactive +table and swap the tables with suspend and resume). The other arguments +should not be changed when reloading the target because the layout of disk +data depend on them and the reloaded target would be non-functional. The layout of the formatted block device: diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 7bc83f3d9bdf..69ff5c4e539d 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1748,6 +1748,13 @@ initrd= [BOOT] Specify the location of the initial ramdisk + initrdmem= [KNL] Specify a physical address and size from which to + load the initrd. If an initrd is compiled in or + specified in the bootparams, it takes priority over this + setting. + Format: ss[KMG],nn[KMG] + Default is 0, 0 + init_on_alloc= [MM] Fill newly allocated pages and heap objects with zeroes. Format: 0 | 1 @@ -4210,12 +4217,24 @@ Duration of CPU stall (s) to test RCU CPU stall warnings, zero to disable. + rcutorture.stall_cpu_block= [KNL] + Sleep while stalling if set. This will result + in warnings from preemptible RCU in addition + to any other stall-related activity. + rcutorture.stall_cpu_holdoff= [KNL] Time to wait (s) after boot before inducing stall. rcutorture.stall_cpu_irqsoff= [KNL] Disable interrupts while stalling if set. + rcutorture.stall_gp_kthread= [KNL] + Duration (s) of forced sleep within RCU + grace-period kthread to test RCU CPU stall + warnings, zero to disable. If both stall_cpu + and stall_gp_kthread are specified, the + kthread is starved first, then the CPU. + rcutorture.stat_interval= [KNL] Time (s) between statistics printk()s. @@ -4286,6 +4305,13 @@ only normal grace-period primitives. No effect on CONFIG_TINY_RCU kernels. + rcupdate.rcu_task_ipi_delay= [KNL] + Set time in jiffies during which RCU tasks will + avoid sending IPIs, starting with the beginning + of a given grace period. Setting a large + number avoids disturbing real-time workloads, + but lengthens grace periods. + rcupdate.rcu_task_stall_timeout= [KNL] Set timeout in jiffies for RCU task stall warning messages. Disable with a value less than or equal diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index 72effa7c23b9..1307b5274a0f 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -1,6 +1,6 @@ .. _perf_security: -Perf Events and tool security +Perf events and tool security ============================= Overview @@ -42,11 +42,11 @@ categories: Data that belong to the fourth category can potentially contain sensitive process data. If PMUs in some monitoring modes capture values of execution context registers or data from process memory then access -to such monitoring capabilities requires to be ordered and secured -properly. So, perf_events/Perf performance monitoring is the subject for -security access control management [5]_ . +to such monitoring modes requires to be ordered and secured properly. +So, perf_events performance monitoring and observability operations are +the subject for security access control management [5]_ . -perf_events/Perf access control +perf_events access control ------------------------------- To perform security checks, the Linux implementation splits processes @@ -66,11 +66,25 @@ into distinct units, known as capabilities [6]_ , which can be independently enabled and disabled on per-thread basis for processes and files of unprivileged users. -Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated +Unprivileged processes with enabled CAP_PERFMON capability are treated as privileged processes with respect to perf_events performance -monitoring and bypass *scope* permissions checks in the kernel. - -Unprivileged processes using perf_events system call API is also subject +monitoring and observability operations, thus, bypass *scope* permissions +checks in the kernel. CAP_PERFMON implements the principle of least +privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance monitoring and +observability operations in the kernel and provides a secure approach to +perfomance monitoring and observability in the system. + +For backward compatibility reasons the access to perf_events monitoring and +observability operations is also open for CAP_SYS_ADMIN privileged +processes but CAP_SYS_ADMIN usage for secure monitoring and observability +use cases is discouraged with respect to the CAP_PERFMON capability. +If system audit records [14]_ for a process using perf_events system call +API contain denial records of acquiring both CAP_PERFMON and CAP_SYS_ADMIN +capabilities then providing the process with CAP_PERFMON capability singly +is recommended as the preferred secure approach to resolve double access +denial logging related to usage of performance monitoring and observability. + +Unprivileged processes using perf_events system call are also subject for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose outcome determines whether monitoring is permitted. So unprivileged processes provided with CAP_SYS_PTRACE capability are effectively @@ -82,14 +96,14 @@ performance analysis of monitored processes or a system. For example, CAP_SYSLOG capability permits reading kernel space memory addresses from /proc/kallsyms file. -perf_events/Perf privileged users +Privileged Perf users groups --------------------------------- Mechanisms of capabilities, privileged capability-dumb files [6]_ and -file system ACLs [10]_ can be used to create a dedicated group of -perf_events/Perf privileged users who are permitted to execute -performance monitoring without scope limits. The following steps can be -taken to create such a group of privileged Perf users. +file system ACLs [10]_ can be used to create dedicated groups of +privileged Perf users who are permitted to execute performance monitoring +and observability without scope limits. The following steps can be +taken to create such groups of privileged Perf users. 1. Create perf_users group of privileged Perf users, assign perf_users group to Perf tool executable and limit access to the executable for @@ -108,30 +122,51 @@ taken to create such a group of privileged Perf users. -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf 2. Assign the required capabilities to the Perf tool executable file and - enable members of perf_users group with performance monitoring + enable members of perf_users group with monitoring and observability privileges [6]_ : :: - # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf - # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf + # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf + # setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf perf: OK # getcap perf - perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep + perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep + +If the libcap installed doesn't yet support "cap_perfmon", use "38" instead, +i.e.: + +:: + + # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf + +Note that you may need to have 'cap_ipc_lock' in the mix for tools such as +'perf top', alternatively use 'perf top -m N', to reduce the memory that +it uses for the perf ring buffer, see the memory allocation section below. + +Using a libcap without support for CAP_PERFMON will make cap_get_flag(caps, 38, +CAP_EFFECTIVE, &val) fail, which will lead the default event to be 'cycles:u', +so as a workaround explicitly ask for the 'cycles' event, i.e.: + +:: + + # perf top -e cycles + +To get kernel and user samples with a perf binary with just CAP_PERFMON. As a result, members of perf_users group are capable of conducting -performance monitoring by using functionality of the configured Perf -tool executable that, when executes, passes perf_events subsystem scope -checks. +performance monitoring and observability by using functionality of the +configured Perf tool executable that, when executes, passes perf_events +subsystem scope checks. This specific access control management is only available to superuser or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_ capabilities. -perf_events/Perf unprivileged users +Unprivileged users ----------------------------------- -perf_events/Perf *scope* and *access* control for unprivileged processes +perf_events *scope* and *access* control for unprivileged processes is governed by perf_event_paranoid [2]_ setting: -1: @@ -166,7 +201,7 @@ is governed by perf_event_paranoid [2]_ setting: perf_event_mlock_kb locking limit is imposed but ignored for unprivileged processes with CAP_IPC_LOCK capability. -perf_events/Perf resource control +Resource control --------------------------------- Open file descriptors @@ -227,4 +262,5 @@ Bibliography .. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_ .. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_ .. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_ - +.. [13] `<https://sites.google.com/site/fullycapable>`_ +.. [14] `<http://man7.org/linux/man-pages/man8/auditd.8.html>`_ diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst new file mode 100644 index 000000000000..296d5027787a --- /dev/null +++ b/Documentation/admin-guide/pstore-blk.rst @@ -0,0 +1,243 @@ +.. SPDX-License-Identifier: GPL-2.0 + +pstore block oops/panic logger +============================== + +Introduction +------------ + +pstore block (pstore/blk) is an oops/panic logger that writes its logs to a +block device and non-block device before the system crashes. You can get +these log files by mounting pstore filesystem like:: + + mount -t pstore pstore /sys/fs/pstore + + +pstore block concepts +--------------------- + +pstore/blk provides efficient configuration method for pstore/blk, which +divides all configurations into two parts, configurations for user and +configurations for driver. + +Configurations for user determine how pstore/blk works, such as pmsg_size, +kmsg_size and so on. All of them support both Kconfig and module parameters, +but module parameters have priority over Kconfig. + +Configurations for driver are all about block device and non-block device, +such as total_size of block device and read/write operations. + +Configurations for user +----------------------- + +All of these configurations support both Kconfig and module parameters, but +module parameters have priority over Kconfig. + +Here is an example for module parameters:: + + pstore_blk.blkdev=179:7 pstore_blk.kmsg_size=64 + +The detail of each configurations may be of interest to you. + +blkdev +~~~~~~ + +The block device to use. Most of the time, it is a partition of block device. +It's required for pstore/blk. It is also used for MTD device. + +It accepts the following variants for block device: + +1. <hex_major><hex_minor> device number in hexadecimal represents itself; no + leading 0x, for example b302. +#. /dev/<disk_name> represents the device number of disk +#. /dev/<disk_name><decimal> represents the device number of partition - device + number of disk plus the partition number +#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk + name of partitioned disk ends with a digit. +#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of + a partition if the partition table provides it. The UUID may be either an + EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP, + where SSSSSSSS is a zero-filled hex representation of the 32-bit + "NT disk signature", and PP is a zero-filled hex representation of the + 1-based partition number. +#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a + partition with a known unique id. +#. <major>:<minor> major and minor number of the device separated by a colon. + +It accepts the following variants for MTD device: + +1. <device name> MTD device name. "pstore" is recommended. +#. <device number> MTD device number. + +kmsg_size +~~~~~~~~~ + +The chunk size in KB for oops/panic front-end. It **MUST** be a multiple of 4. +It's optional if you do not care oops/panic log. + +There are multiple chunks for oops/panic front-end depending on the remaining +space except other pstore front-ends. + +pstore/blk will log to oops/panic chunks one by one, and always overwrite the +oldest chunk if there is no more free chunk. + +pmsg_size +~~~~~~~~~ + +The chunk size in KB for pmsg front-end. It **MUST** be a multiple of 4. +It's optional if you do not care pmsg log. + +Unlike oops/panic front-end, there is only one chunk for pmsg front-end. + +Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are +appended to the chunk. On reboot the contents are available in +*/sys/fs/pstore/pmsg-pstore-blk-0*. + +console_size +~~~~~~~~~~~~ + +The chunk size in KB for console front-end. It **MUST** be a multiple of 4. +It's optional if you do not care console log. + +Similar to pmsg front-end, there is only one chunk for console front-end. + +All log of console will be appended to the chunk. On reboot the contents are +available in */sys/fs/pstore/console-pstore-blk-0*. + +ftrace_size +~~~~~~~~~~~ + +The chunk size in KB for ftrace front-end. It **MUST** be a multiple of 4. +It's optional if you do not care console log. + +Similar to oops front-end, there are multiple chunks for ftrace front-end +depending on the count of cpu processors. Each chunk size is equal to +ftrace_size / processors_count. + +All log of ftrace will be appended to the chunk. On reboot the contents are +combined and available in */sys/fs/pstore/ftrace-pstore-blk-0*. + +Persistent function tracing might be useful for debugging software or hardware +related hangs. Here is an example of usage:: + + # mount -t pstore pstore /sys/fs/pstore + # mount -t debugfs debugfs /sys/kernel/debug/ + # echo 1 > /sys/kernel/debug/pstore/record_ftrace + # reboot -f + [...] + # mount -t pstore pstore /sys/fs/pstore + # tail /sys/fs/pstore/ftrace-pstore-blk-0 + CPU:0 ts:5914676 c0063828 c0063b94 call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0 + CPU:0 ts:5914678 c039ecdc c006385c cpuidle_enter_state <- call_cpuidle+0x44/0x48 + CPU:0 ts:5914680 c039e9a0 c039ecf0 cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314 + CPU:0 ts:5914681 c0063870 c039ea30 sched_idle_set_state <- cpuidle_enter_state+0x44/0x314 + CPU:1 ts:5916720 c0160f59 c015ee04 kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204 + CPU:1 ts:5916721 c05ca625 c015ee0c __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204 + CPU:1 ts:5916723 c05c813d c05ca630 yield_to <- __mutex_lock_slowpath+0x314/0x358 + CPU:1 ts:5916724 c05ca2d1 c05ca638 __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358 + +max_reason +~~~~~~~~~~ + +Limiting which kinds of kmsg dumps are stored can be controlled via +the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's +``enum kmsg_dump_reason``. For example, to store both Oopses and Panics, +``max_reason`` should be set to 2 (KMSG_DUMP_OOPS), to store only Panics +``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0 +(KMSG_DUMP_UNDEF), means the reason filtering will be controlled by the +``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS, +otherwise KMSG_DUMP_MAX. + +Configurations for driver +------------------------- + +Only a block device driver cares about these configurations. A block device +driver uses ``register_pstore_blk`` to register to pstore/blk. + +.. kernel-doc:: fs/pstore/blk.c + :identifiers: register_pstore_blk + +A non-block device driver uses ``register_pstore_device`` with +``struct pstore_device_info`` to register to pstore/blk. + +.. kernel-doc:: fs/pstore/blk.c + :identifiers: register_pstore_device + +.. kernel-doc:: include/linux/pstore_blk.h + :identifiers: pstore_device_info + +Compression and header +---------------------- + +Block device is large enough for uncompressed oops data. Actually we do not +recommend data compression because pstore/blk will insert some information into +the first line of oops/panic data. For example:: + + Panic: Total 16 times + +It means that it's OOPS|Panic for the 16th time since the first booting. +Sometimes the number of occurrences of oops|panic since the first booting is +important to judge whether the system is stable. + +The following line is inserted by pstore filesystem. For example:: + + Oops#2 Part1 + +It means that it's OOPS for the 2nd time on the last boot. + +Reading the data +---------------- + +The dump data can be read from the pstore filesystem. The format for these +files is ``dmesg-pstore-blk-[N]`` for oops/panic front-end, +``pmsg-pstore-blk-0`` for pmsg front-end and so on. The timestamp of the +dump file records the trigger time. To delete a stored record from block +device, simply unlink the respective pstore file. + +Attentions in panic read/write APIs +----------------------------------- + +If on panic, the kernel is not going to run for much longer, the tasks will not +be scheduled and most kernel resources will be out of service. It +looks like a single-threaded program running on a single-core computer. + +The following points require special attention for panic read/write APIs: + +1. Can **NOT** allocate any memory. + If you need memory, just allocate while the block driver is initializing + rather than waiting until the panic. +#. Must be polled, **NOT** interrupt driven. + No task schedule any more. The block driver should delay to ensure the write + succeeds, but NOT sleep. +#. Can **NOT** take any lock. + There is no other task, nor any shared resource; you are safe to break all + locks. +#. Just use CPU to transfer. + Do not use DMA to transfer unless you are sure that DMA will not keep lock. +#. Control registers directly. + Please control registers directly rather than use Linux kernel resources. + Do I/O map while initializing rather than wait until a panic occurs. +#. Reset your block device and controller if necessary. + If you are not sure of the state of your block device and controller when + a panic occurs, you are safe to stop and reset them. + +pstore/blk supports psblk_blkdev_info(), which is defined in +*linux/pstore_blk.h*, to get information of using block device, such as the +device number, sector count and start sector of the whole disk. + +pstore block internals +---------------------- + +For developer reference, here are all the important structures and APIs: + +.. kernel-doc:: fs/pstore/zone.c + :internal: + +.. kernel-doc:: include/linux/pstore_zone.h + :internal: + +.. kernel-doc:: fs/pstore/blk.c + :export: + +.. kernel-doc:: include/linux/pstore_blk.h + :internal: diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst index 6dbcc5481000..a60a96218ba9 100644 --- a/Documentation/admin-guide/ramoops.rst +++ b/Documentation/admin-guide/ramoops.rst @@ -32,11 +32,17 @@ memory to be mapped strongly ordered, and atomic operations on strongly ordered memory are implementation defined, and won't work on many ARMs such as omaps. The memory area is divided into ``record_size`` chunks (also rounded down to -power of two) and each oops/panic writes a ``record_size`` chunk of +power of two) and each kmesg dump writes a ``record_size`` chunk of information. -Dumping both oopses and panics can be done by setting 1 in the ``dump_oops`` -variable while setting 0 in that variable dumps only the panics. +Limiting which kinds of kmsg dumps are stored can be controlled via +the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's +``enum kmsg_dump_reason``. For example, to store both Oopses and Panics, +``max_reason`` should be set to 2 (KMSG_DUMP_OOPS), to store only Panics +``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0 +(KMSG_DUMP_UNDEF), means the reason filtering will be controlled by the +``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS, +otherwise KMSG_DUMP_MAX. The module uses a counter to record multiple dumps but the counter gets reset on restart (i.e. new dumps after the restart will overwrite old ones). @@ -90,7 +96,7 @@ Setting the ramoops parameters can be done in several different manners: .mem_address = <...>, .mem_type = <...>, .record_size = <...>, - .dump_oops = <...>, + .max_reason = <...>, .ecc = <...>, }; diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 0d427fd10941..8d25892a18f8 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -721,7 +721,13 @@ perf_event_paranoid =================== Controls use of the performance events system by unprivileged -users (without CAP_SYS_ADMIN). The default value is 2. +users (without CAP_PERFMON). The default value is 2. + +For backward compatibility reasons access to system performance +monitoring and observability remains open for CAP_SYS_ADMIN +privileged processes but CAP_SYS_ADMIN usage for secure system +performance monitoring and observability operations is discouraged +with respect to CAP_PERFMON use cases. === ================================================================== -1 Allow use of (almost) all events by all users. @@ -730,13 +736,13 @@ users (without CAP_SYS_ADMIN). The default value is 2. ``CAP_IPC_LOCK``. >=0 Disallow ftrace function tracepoint by users without - ``CAP_SYS_ADMIN``. + ``CAP_PERFMON``. - Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN``. + Disallow raw tracepoint access by users without ``CAP_PERFMON``. ->=1 Disallow CPU event access by users without ``CAP_SYS_ADMIN``. +>=1 Disallow CPU event access by users without ``CAP_PERFMON``. ->=2 Disallow kernel profiling by users without ``CAP_SYS_ADMIN``. +>=2 Disallow kernel profiling by users without ``CAP_PERFMON``. === ================================================================== diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst index 8ebe46b1af39..5d8f1e84dd90 100644 --- a/Documentation/core-api/printk-formats.rst +++ b/Documentation/core-api/printk-formats.rst @@ -112,6 +112,20 @@ used when printing stack backtraces. The specifier takes into consideration the effect of compiler optimisations which may occur when tail-calls are used and marked with the noreturn GCC attribute. +Probed Pointers from BPF / tracing +---------------------------------- + +:: + + %pks kernel string + %pus user string + +The ``k`` and ``u`` specifiers are used for printing prior probed memory from +either kernel memory (k) or user memory (u). The subsequent ``s`` specifier +results in printing a string. For direct use in regular vsnprintf() the (k) +and (u) annotation is ignored, however, when used out of BPF's bpf_trace_printk(), +for example, it reads the memory it is pointing to without faulting. + Kernel Pointers --------------- @@ -468,21 +482,23 @@ Examples (OF):: %pfwf /ocp@68000000/i2c@48072000/camera@10/port/endpoint - Full name %pfwP endpoint - Node name -Time and date (struct rtc_time) -------------------------------- +Time and date +------------- :: - %ptR YYYY-mm-ddTHH:MM:SS - %ptRd YYYY-mm-dd - %ptRt HH:MM:SS - %ptR[dt][r] + %pt[RT] YYYY-mm-ddTHH:MM:SS + %pt[RT]d YYYY-mm-dd + %pt[RT]t HH:MM:SS + %pt[RT][dt][r] -For printing date and time as represented by struct rtc_time structure in -human readable format. +For printing date and time as represented by + R struct rtc_time structure + T time64_t type +in human readable format. -By default year will be incremented by 1900 and month by 1. Use %ptRr (raw) -to suppress this behaviour. +By default year will be incremented by 1900 and month by 1. +Use %pt[RT]r (raw) to suppress this behaviour. Passed by reference. diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index 49d9833af871..ec575e72d0b2 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -5,8 +5,9 @@ Memory Protection Keys ====================== Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature -which is found on Intel's Skylake "Scalable Processor" Server CPUs. -It will be avalable in future non-server parts. +which is found on Intel's Skylake (and later) "Scalable Processor" +Server CPUs. It will be available in future non-server Intel parts +and future AMD processors. For anyone wishing to test or use this feature, it is available in Amazon's EC2 C5 instances and is known to work there using an Ubuntu diff --git a/Documentation/devicetree/bindings/dma/fsl-edma.txt b/Documentation/devicetree/bindings/dma/fsl-edma.txt index e77b08ebcd06..ee1754739b4b 100644 --- a/Documentation/devicetree/bindings/dma/fsl-edma.txt +++ b/Documentation/devicetree/bindings/dma/fsl-edma.txt @@ -10,7 +10,8 @@ Required properties: - compatible : - "fsl,vf610-edma" for eDMA used similar to that on Vybrid vf610 SoC - "fsl,imx7ulp-edma" for eDMA2 used similar to that on i.mx7ulp - - "fsl,fsl,ls1028a-edma" for eDMA used similar to that on Vybrid vf610 SoC + - "fsl,ls1028a-edma" followed by "fsl,vf610-edma" for eDMA used on the + LS1028A SoC. - reg : Specifies base physical address(s) and size of the eDMA registers. The 1st region is eDMA control register's address and size. The 2nd and the 3rd regions are programmable channel multiplexing diff --git a/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml b/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml index 86cfb599256e..371f18773198 100644 --- a/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml +++ b/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml @@ -22,9 +22,7 @@ properties: const: socionext,uniphier-xdmac reg: - items: - - description: XDMAC base register region (offset and length) - - description: XDMAC extension register region (offset and length) + maxItems: 1 interrupts: maxItems: 1 @@ -49,12 +47,13 @@ required: - reg - interrupts - "#dma-cells" + - dma-channels examples: - | xdmac: dma-controller@5fc10000 { compatible = "socionext,uniphier-xdmac"; - reg = <0x5fc10000 0x1000>, <0x5fc20000 0x800>; + reg = <0x5fc10000 0x5300>; interrupts = <0 188 4>; #dma-cells = <2>; dma-channels = <16>; diff --git a/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml b/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml new file mode 100644 index 000000000000..84ae4cdd08ed --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml @@ -0,0 +1,107 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +# Copyright (C) 2020 BAIKAL ELECTRONICS, JSC +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/hwmon/baikal,bt1-pvt.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Baikal-T1 PVT Sensor + +maintainers: + - Serge Semin <fancer.lancer@gmail.com> + +description: | + Baikal-T1 SoC provides an embedded process, voltage and temperature + sensor to monitor an internal SoC environment (chip temperature, supply + voltage and process monitor) and on time detect critical situations, + which may cause the system instability and even damages. The IP-block + is based on the Analog Bits PVT sensor, but is equipped with a dedicated + control wrapper, which provides a MMIO registers-based access to the + sensor core functionality (APB3-bus based) and exposes an additional + functions like thresholds/data ready interrupts, its status and masks, + measurements timeout. Its internal structure is depicted on the next + diagram: + + Analog Bits core Bakal-T1 PVT control block + +--------------------+ +------------------------+ + | Temperature sensor |-+ +------| Sensors control | + |--------------------| |<---En---| |------------------------| + | Voltage sensor |-|<--Mode--| +--->| Sampled data | + |--------------------| |<--Trim--+ | |------------------------| + | Low-Vt sensor |-| | +--| Thresholds comparator | + |--------------------| |---Data----| | |------------------------| + | High-Vt sensor |-| | +->| Interrupts status | + |--------------------| |--Valid--+-+ | |------------------------| + | Standard-Vt sensor |-+ +---+--| Interrupts mask | + +--------------------+ |------------------------| + ^ | Interrupts timeout | + | +------------------------+ + | ^ ^ + Rclk-----+----------------------------------------+ | + APB3-------------------------------------------------+ + + This bindings describes the external Baikal-T1 PVT control interfaces + like MMIO registers space, interrupt request number and clocks source. + These are then used by the corresponding hwmon device driver to + implement the sysfs files-based access to the sensors functionality. + +properties: + compatible: + const: baikal,bt1-pvt + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + items: + - description: PVT reference clock + - description: APB3 interface clock + + clock-names: + items: + - const: ref + - const: pclk + + "#thermal-sensor-cells": + description: Baikal-T1 can be referenced as the CPU thermal-sensor + const: 0 + + baikal,pvt-temp-offset-millicelsius: + description: | + Temperature sensor trimming factor. It can be used to manually adjust the + temperature measurements within 7.130 degrees Celsius. + maxItems: 1 + items: + default: 0 + minimum: 0 + maximum: 7130 + +unevaluatedProperties: false + +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + +examples: + - | + #include <dt-bindings/interrupt-controller/mips-gic.h> + + pvt@1f200000 { + compatible = "baikal,bt1-pvt"; + reg = <0x1f200000 0x1000>; + #thermal-sensor-cells = <0>; + + interrupts = <GIC_SHARED 31 IRQ_TYPE_LEVEL_HIGH>; + + baikal,pvt-temp-trim-millicelsius = <1000>; + + clocks = <&ccu_sys>, <&ccu_sys>; + clock-names = "ref", "pclk"; + }; +... diff --git a/Documentation/devicetree/bindings/mfd/gateworks-gsc.yaml b/Documentation/devicetree/bindings/mfd/gateworks-gsc.yaml new file mode 100644 index 000000000000..487a8445722e --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/gateworks-gsc.yaml @@ -0,0 +1,196 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/mfd/gateworks-gsc.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Gateworks System Controller + +description: | + The Gateworks System Controller (GSC) is a device present across various + Gateworks product families that provides a set of system related features + such as the following (refer to the board hardware user manuals to see what + features are present) + - Watchdog Timer + - GPIO + - Pushbutton controller + - Hardware monitor with ADC's for temperature and voltage rails and + fan controller + +maintainers: + - Tim Harvey <tharvey@gateworks.com> + - Robert Jones <rjones@gateworks.com> + +properties: + $nodename: + pattern: "gsc@[0-9a-f]{1,2}" + compatible: + const: gw,gsc + + reg: + description: I2C device address + maxItems: 1 + + interrupts: + maxItems: 1 + + interrupt-controller: true + + "#interrupt-cells": + const: 1 + + "#address-cells": + const: 1 + + "#size-cells": + const: 0 + + adc: + type: object + description: Optional hardware monitoring module + + properties: + compatible: + const: gw,gsc-adc + + "#address-cells": + const: 1 + + "#size-cells": + const: 0 + + patternProperties: + "^channel@[0-9]+$": + type: object + description: | + Properties for a single ADC which can report cooked values + (i.e. temperature sensor based on thermister), raw values + (i.e. voltage rail with a pre-scaling resistor divider). + + properties: + reg: + description: Register of the ADC + maxItems: 1 + + label: + description: Name of the ADC input + + gw,mode: + description: | + conversion mode: + 0 - temperature, in C*10 + 1 - pre-scaled voltage value + 2 - scaled voltage based on an optional resistor divider + and optional offset + $ref: /schemas/types.yaml#/definitions/uint32 + enum: [0, 1, 2] + + gw,voltage-divider-ohms: + description: Values of resistors for divider on raw ADC input + maxItems: 2 + items: + minimum: 1000 + maximum: 1000000 + + gw,voltage-offset-microvolt: + description: | + A positive voltage offset to apply to a raw ADC + (i.e. to compensate for a diode drop). + minimum: 0 + maximum: 1000000 + + required: + - gw,mode + - reg + - label + + required: + - compatible + - "#address-cells" + - "#size-cells" + +patternProperties: + "^fan-controller@[0-9a-f]+$": + type: object + description: Optional fan controller + + properties: + compatible: + const: gw,gsc-fan + + "#address-cells": + const: 1 + + "#size-cells": + const: 0 + + reg: + description: The fan controller base address + maxItems: 1 + + required: + - compatible + - reg + - "#address-cells" + - "#size-cells" + +required: + - compatible + - reg + - interrupts + - interrupt-controller + - "#interrupt-cells" + - "#address-cells" + - "#size-cells" + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + i2c { + #address-cells = <1>; + #size-cells = <0>; + + gsc@20 { + compatible = "gw,gsc"; + reg = <0x20>; + interrupt-parent = <&gpio1>; + interrupts = <4 GPIO_ACTIVE_LOW>; + interrupt-controller; + #interrupt-cells = <1>; + #address-cells = <1>; + #size-cells = <0>; + + adc { + compatible = "gw,gsc-adc"; + #address-cells = <1>; + #size-cells = <0>; + + channel@0 { /* A0: Board Temperature */ + reg = <0x00>; + label = "temp"; + gw,mode = <0>; + }; + + channel@2 { /* A1: Input Voltage (raw ADC) */ + reg = <0x02>; + label = "vdd_vin"; + gw,mode = <1>; + gw,voltage-divider-ohms = <22100 1000>; + gw,voltage-offset-microvolt = <800000>; + }; + + channel@b { /* A2: Battery voltage */ + reg = <0x0b>; + label = "vdd_bat"; + gw,mode = <1>; + }; + }; + + fan-controller@2c { + #address-cells = <1>; + #size-cells = <0>; + compatible = "gw,gsc-fan"; + reg = <0x2c>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/mfd/max8998.txt b/Documentation/devicetree/bindings/mfd/max8998.txt index 5f2f07c09c90..4ed52184d081 100644 --- a/Documentation/devicetree/bindings/mfd/max8998.txt +++ b/Documentation/devicetree/bindings/mfd/max8998.txt @@ -73,6 +73,8 @@ number as described in MAX8998 datasheet. - ESAFEOUT1: (ldo19) - ESAFEOUT2: (ld020) + - CHARGER: main battery charger current control + Standard regulator bindings are used inside regulator subnodes. Check Documentation/devicetree/bindings/regulator/regulator.txt for more details. @@ -113,5 +115,11 @@ Example: regulator-always-on; regulator-boot-on; }; + + charger_reg: CHARGER { + regulator-name = "CHARGER"; + regulator-min-microamp = <90000>; + regulator-max-microamp = <800000>; + }; }; }; diff --git a/Documentation/devicetree/bindings/net/dsa/b53.txt b/Documentation/devicetree/bindings/net/dsa/b53.txt index 5201bc15fdd6..cfd1afdc6e94 100644 --- a/Documentation/devicetree/bindings/net/dsa/b53.txt +++ b/Documentation/devicetree/bindings/net/dsa/b53.txt @@ -110,6 +110,9 @@ Ethernet switch connected via MDIO to the host, CPU port wired to eth0: #size-cells = <0>; ports { + #address-cells = <1>; + #size-cells = <0>; + port0@0 { reg = <0>; label = "lan1"; diff --git a/Documentation/devicetree/bindings/regulator/anatop-regulator.txt b/Documentation/devicetree/bindings/regulator/anatop-regulator.txt deleted file mode 100644 index a3106c72fbea..000000000000 --- a/Documentation/devicetree/bindings/regulator/anatop-regulator.txt +++ /dev/null @@ -1,40 +0,0 @@ -Anatop Voltage regulators - -Required properties: -- compatible: Must be "fsl,anatop-regulator" -- regulator-name: A string used as a descriptive name for regulator outputs -- anatop-reg-offset: Anatop MFD register offset -- anatop-vol-bit-shift: Bit shift for the register -- anatop-vol-bit-width: Number of bits used in the register -- anatop-min-bit-val: Minimum value of this register -- anatop-min-voltage: Minimum voltage of this regulator -- anatop-max-voltage: Maximum voltage of this regulator - -Optional properties: -- anatop-delay-reg-offset: Anatop MFD step time register offset -- anatop-delay-bit-shift: Bit shift for the step time register -- anatop-delay-bit-width: Number of bits used in the step time register -- vin-supply: The supply for this regulator -- anatop-enable-bit: Regulator enable bit offset - -Any property defined as part of the core regulator -binding, defined in regulator.txt, can also be used. - -Example: - - regulator-vddpu { - compatible = "fsl,anatop-regulator"; - regulator-name = "vddpu"; - regulator-min-microvolt = <725000>; - regulator-max-microvolt = <1300000>; - regulator-always-on; - anatop-reg-offset = <0x140>; - anatop-vol-bit-shift = <9>; - anatop-vol-bit-width = <5>; - anatop-delay-reg-offset = <0x170>; - anatop-delay-bit-shift = <24>; - anatop-delay-bit-width = <2>; - anatop-min-bit-val = <1>; - anatop-min-voltage = <725000>; - anatop-max-voltage = <1300000>; - }; diff --git a/Documentation/devicetree/bindings/regulator/anatop-regulator.yaml b/Documentation/devicetree/bindings/regulator/anatop-regulator.yaml new file mode 100644 index 000000000000..e7b3abe30363 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/anatop-regulator.yaml @@ -0,0 +1,94 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/anatop-regulator.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Freescale Anatop Voltage Regulators + +maintainers: + - Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org> + +allOf: + - $ref: "regulator.yaml#" + +properties: + compatible: + const: fsl,anatop-regulator + + regulator-name: true + + anatop-reg-offset: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing the anatop MFD register offset. + + anatop-vol-bit-shift: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing the bit shift for the register. + + anatop-vol-bit-width: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing the number of bits used in the register. + + anatop-min-bit-val: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing the minimum value of this register. + + anatop-min-voltage: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing the minimum voltage of this regulator. + + anatop-max-voltage: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing the maximum voltage of this regulator. + + anatop-delay-reg-offset: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing the anatop MFD step time register offset. + + anatop-delay-bit-shift: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing the bit shift for the step time register. + + anatop-delay-bit-width: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing the number of bits used in the step time register. + + anatop-enable-bit: + $ref: '/schemas/types.yaml#/definitions/uint32' + description: u32 value representing regulator enable bit offset. + + vin-supply: + $ref: '/schemas/types.yaml#/definitions/phandle' + description: input supply phandle. + +required: + - compatible + - regulator-name + - anatop-reg-offset + - anatop-vol-bit-shift + - anatop-vol-bit-width + - anatop-min-bit-val + - anatop-min-voltage + - anatop-max-voltage + +unevaluatedProperties: false + +examples: + - | + regulator-vddpu { + compatible = "fsl,anatop-regulator"; + regulator-name = "vddpu"; + regulator-min-microvolt = <725000>; + regulator-max-microvolt = <1300000>; + regulator-always-on; + anatop-reg-offset = <0x140>; + anatop-vol-bit-shift = <9>; + anatop-vol-bit-width = <5>; + anatop-delay-reg-offset = <0x170>; + anatop-delay-bit-shift = <24>; + anatop-delay-bit-width = <2>; + anatop-min-bit-val = <1>; + anatop-min-voltage = <725000>; + anatop-max-voltage = <1300000>; + }; diff --git a/Documentation/devicetree/bindings/regulator/maxim,max77826.yaml b/Documentation/devicetree/bindings/regulator/maxim,max77826.yaml new file mode 100644 index 000000000000..19cbd5eb2897 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/maxim,max77826.yaml @@ -0,0 +1,68 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/maxim,max77826.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Maxim Integrated MAX77826 PMIC + +maintainers: + - Iskren Chernev <iskren.chernev@gmail.com> + +properties: + $nodename: + pattern: "pmic@[0-9a-f]{1,2}" + compatible: + enum: + - maxim,max77826 + + reg: + maxItems: 1 + + regulators: + type: object + allOf: + - $ref: regulator.yaml# + description: | + list of regulators provided by this controller, must be named + after their hardware counterparts LDO[1-15], BUCK and BUCKBOOST + + patternProperties: + "^LDO([1-9]|1[0-5])$": + type: object + allOf: + - $ref: regulator.yaml# + + "^BUCK|BUCKBOOST$": + type: object + allOf: + - $ref: regulator.yaml# + + additionalProperties: false + +required: + - compatible + - reg + - regulators + +additionalProperties: false + +examples: + - | + i2c { + #address-cells = <1>; + #size-cells = <0>; + + pmic@69 { + compatible = "maxim,max77826"; + reg = <0x69>; + + regulators { + LDO2 { + regulator-min-microvolt = <650000>; + regulator-max-microvolt = <3587500>; + }; + }; + }; + }; +... diff --git a/Documentation/devicetree/bindings/reserved-memory/ramoops.txt b/Documentation/devicetree/bindings/reserved-memory/ramoops.txt index 0eba562fe5c6..b7886fea368c 100644 --- a/Documentation/devicetree/bindings/reserved-memory/ramoops.txt +++ b/Documentation/devicetree/bindings/reserved-memory/ramoops.txt @@ -30,7 +30,7 @@ Optional properties: - ecc-size: enables ECC support and specifies ECC buffer size in bytes (defaults to 0: no ECC) -- record-size: maximum size in bytes of each dump done on oops/panic +- record-size: maximum size in bytes of each kmsg dump. (defaults to 0: disabled) - console-size: size in bytes of log buffer reserved for kernel messages @@ -45,7 +45,16 @@ Optional properties: - unbuffered: if present, use unbuffered mappings to map the reserved region (defaults to buffered mappings) -- no-dump-oops: if present, only dump panics (defaults to panics and oops) +- max-reason: if present, sets maximum type of kmsg dump reasons to store + (defaults to 2: log Oopses and Panics). This can be set to INT_MAX to + store all kmsg dumps. See include/linux/kmsg_dump.h KMSG_DUMP_* for other + kmsg dump reason values. Setting this to 0 (KMSG_DUMP_UNDEF), means the + reason filtering will be controlled by the printk.always_kmsg_dump boot + param: if unset, it will be KMSG_DUMP_OOPS, otherwise KMSG_DUMP_MAX. + +- no-dump-oops: deprecated, use max_reason instead. If present, and + max_reason is not specified, it is equivalent to max_reason = 1 + (KMSG_DUMP_PANIC). - flags: if present, pass ramoops behavioral flags (defaults to 0, see include/linux/pstore_ram.h RAMOOPS_FLAG_* for flag values). diff --git a/Documentation/devicetree/bindings/rng/arm-cctrng.yaml b/Documentation/devicetree/bindings/rng/arm-cctrng.yaml new file mode 100644 index 000000000000..ca6aad19b6ba --- /dev/null +++ b/Documentation/devicetree/bindings/rng/arm-cctrng.yaml @@ -0,0 +1,54 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/rng/arm-cctrng.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Arm TrustZone CryptoCell TRNG engine + +maintainers: + - Hadar Gat <hadar.gat@arm.com> + +description: |+ + Arm TrustZone CryptoCell TRNG (True Random Number Generator) engine. + +properties: + compatible: + enum: + - arm,cryptocell-713-trng + - arm,cryptocell-703-trng + + interrupts: + maxItems: 1 + + reg: + maxItems: 1 + + arm,rosc-ratio: + description: + Arm TrustZone CryptoCell TRNG engine has 4 ring oscillators. + Sampling ratio values for these 4 ring oscillators. (from calibration) + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32-array + - items: + maxItems: 4 + + clocks: + maxItems: 1 + +required: + - compatible + - interrupts + - reg + - arm,rosc-ratio + +additionalProperties: false + +examples: + - | + arm_cctrng: rng@60000000 { + compatible = "arm,cryptocell-713-trng"; + interrupts = <0 29 4>; + reg = <0x60000000 0x10000>; + arm,rosc-ratio = <5000 1000 500 0>; + }; diff --git a/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt b/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt index ad7ac80a3841..f5e518d099f2 100644 --- a/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt +++ b/Documentation/devicetree/bindings/spi/brcm,spi-bcm-qspi.txt @@ -26,6 +26,16 @@ Required properties: "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-qspi" : MSPI+BSPI on BRCMSTB SoCs "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI BRCMSTB SoCs + "brcm,spi-bcm7425-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI + BRCMSTB SoCs + "brcm,spi-bcm7429-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI + BRCMSTB SoCs + "brcm,spi-bcm7435-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI + BRCMSTB SoCs + "brcm,spi-bcm7216-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI + BRCMSTB SoCs + "brcm,spi-bcm7278-qspi", "brcm,spi-bcm-qspi", "brcm,spi-brcmstb-mspi" : Second Instance of MSPI + BRCMSTB SoCs "brcm,spi-bcm-qspi", "brcm,spi-nsp-qspi" : MSPI+BSPI on Cygnus, NSP "brcm,spi-bcm-qspi", "brcm,spi-ns2-qspi" : NS2 SoCs diff --git a/Documentation/devicetree/bindings/spi/mikrotik,rb4xx-spi.yaml b/Documentation/devicetree/bindings/spi/mikrotik,rb4xx-spi.yaml new file mode 100644 index 000000000000..4ddb42a4ae05 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/mikrotik,rb4xx-spi.yaml @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/mikrotik,rb4xx-spi.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MikroTik RB4xx series SPI master + +maintainers: + - Gabor Juhos <juhosg@openwrt.org> + - Bert Vermeulen <bert@biot.com> + +allOf: + - $ref: "spi-controller.yaml#" + +properties: + compatible: + const: mikrotik,rb4xx-spi + + reg: + maxItems: 1 + +required: + - compatible + - reg + +examples: + - | + spi: spi@1f000000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "mikrotik,rb4xx-spi"; + reg = <0x1f000000 0x10>; + }; + +...
\ No newline at end of file diff --git a/Documentation/devicetree/bindings/spi/renesas,rspi.yaml b/Documentation/devicetree/bindings/spi/renesas,rspi.yaml new file mode 100644 index 000000000000..c54ac059043f --- /dev/null +++ b/Documentation/devicetree/bindings/spi/renesas,rspi.yaml @@ -0,0 +1,144 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/renesas,rspi.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Renesas (Quad) Serial Peripheral Interface (RSPI/QSPI) + +maintainers: + - Geert Uytterhoeven <geert+renesas@glider.be> + +properties: + compatible: + oneOf: + - items: + - enum: + - renesas,rspi-sh7757 # SH7757 + - const: renesas,rspi # Legacy SH + + - items: + - enum: + - renesas,rspi-r7s72100 # RZ/A1H + - renesas,rspi-r7s9210 # RZ/A2 + - const: renesas,rspi-rz # RZ/A + + - items: + - enum: + - renesas,qspi-r8a7743 # RZ/G1M + - renesas,qspi-r8a7744 # RZ/G1N + - renesas,qspi-r8a7745 # RZ/G1E + - renesas,qspi-r8a77470 # RZ/G1C + - renesas,qspi-r8a7790 # R-Car H2 + - renesas,qspi-r8a7791 # R-Car M2-W + - renesas,qspi-r8a7792 # R-Car V2H + - renesas,qspi-r8a7793 # R-Car M2-N + - renesas,qspi-r8a7794 # R-Car E2 + - const: renesas,qspi # R-Car Gen2 and RZ/G1 + + reg: + maxItems: 1 + + interrupts: + oneOf: + - items: + - description: A combined interrupt + - items: + - description: Error interrupt (SPEI) + - description: Receive Interrupt (SPRI) + - description: Transmit Interrupt (SPTI) + + interrupt-names: + oneOf: + - items: + - const: mux + - items: + - const: error + - const: rx + - const: tx + + clocks: + maxItems: 1 + + power-domains: + maxItems: 1 + + resets: + maxItems: 1 + + dmas: + description: + Must contain a list of pairs of references to DMA specifiers, one for + transmission, and one for reception. + + dma-names: + minItems: 2 + maxItems: 4 + items: + enum: + - tx + - rx + + num-cs: + description: | + Total number of native chip selects. + Hardware limitations related to chip selects: + - When using GPIO chip selects, at least one native chip select must + be left unused, as it will be driven anyway. + minimum: 1 + maximum: 2 + default: 1 + +required: + - compatible + - reg + - interrupts + - clocks + - power-domains + - '#address-cells' + - '#size-cells' + +allOf: + - $ref: spi-controller.yaml# + - if: + properties: + compatible: + contains: + enum: + - renesas,rspi-rz + then: + properties: + interrupts: + minItems: 3 + required: + - interrupt-names + + - if: + properties: + compatible: + contains: + enum: + - renesas,qspi + then: + required: + - resets + +examples: + - | + #include <dt-bindings/clock/r8a7791-cpg-mssr.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/power/r8a7791-sysc.h> + + qspi: spi@e6b10000 { + compatible = "renesas,qspi-r8a7791", "renesas,qspi"; + reg = <0xe6b10000 0x2c>; + interrupts = <GIC_SPI 184 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&cpg CPG_MOD 917>; + dmas = <&dmac0 0x17>, <&dmac0 0x18>, <&dmac1 0x17>, <&dmac1 0x18>; + dma-names = "tx", "rx", "tx", "rx"; + power-domains = <&sysc R8A7791_PD_ALWAYS_ON>; + resets = <&cpg 917>; + num-cs = <1>; + #address-cells = <1>; + #size-cells = <0>; + }; diff --git a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt deleted file mode 100644 index 3ed08ee9feba..000000000000 --- a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt +++ /dev/null @@ -1,41 +0,0 @@ -Synopsys DesignWare AMBA 2.0 Synchronous Serial Interface. - -Required properties: -- compatible : "snps,dw-apb-ssi" or "mscc,<soc>-spi", where soc is "ocelot" or - "jaguar2", or "amazon,alpine-dw-apb-ssi" -- reg : The register base for the controller. For "mscc,<soc>-spi", a second - register set is required (named ICPU_CFG:SPI_MST) -- interrupts : One interrupt, used by the controller. -- #address-cells : <1>, as required by generic SPI binding. -- #size-cells : <0>, also as required by generic SPI binding. -- clocks : phandles for the clocks, see the description of clock-names below. - The phandle for the "ssi_clk" is required. The phandle for the "pclk" clock - is optional. If a single clock is specified but no clock-name, it is the - "ssi_clk" clock. If both clocks are listed, the "ssi_clk" must be first. - -Optional properties: -- clock-names : Contains the names of the clocks: - "ssi_clk", for the core clock used to generate the external SPI clock. - "pclk", the interface clock, required for register access. If a clock domain - used to enable this clock then it should be named "pclk_clkdomain". -- cs-gpios : Specifies the gpio pins to be used for chipselects. -- num-cs : The number of chipselects. If omitted, this will default to 4. -- reg-io-width : The I/O register width (in bytes) implemented by this - device. Supported values are 2 or 4 (the default). - -Child nodes as per the generic SPI binding. - -Example: - - spi@fff00000 { - compatible = "snps,dw-apb-ssi"; - reg = <0xfff00000 0x1000>; - interrupts = <0 154 4>; - #address-cells = <1>; - #size-cells = <0>; - clocks = <&spi_m_clk>; - num-cs = <2>; - cs-gpios = <&gpio0 13 0>, - <&gpio0 14 0>; - }; - diff --git a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml new file mode 100644 index 000000000000..c62cbe79f00d --- /dev/null +++ b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml @@ -0,0 +1,133 @@ +# SPDX-License-Identifier: GPL-2.0-only +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/snps,dw-apb-ssi.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Synopsys DesignWare AMBA 2.0 Synchronous Serial Interface + +maintainers: + - Mark Brown <broonie@kernel.org> + +allOf: + - $ref: "spi-controller.yaml#" + - if: + properties: + compatible: + contains: + enum: + - mscc,ocelot-spi + - mscc,jaguar2-spi + then: + properties: + reg: + minItems: 2 + +properties: + compatible: + oneOf: + - description: Generic DW SPI Controller + enum: + - snps,dw-apb-ssi + - snps,dwc-ssi-1.01a + - description: Microsemi Ocelot/Jaguar2 SoC SPI Controller + items: + - enum: + - mscc,ocelot-spi + - mscc,jaguar2-spi + - const: snps,dw-apb-ssi + - description: Amazon Alpine SPI Controller + const: amazon,alpine-dw-apb-ssi + - description: Renesas RZ/N1 SPI Controller + items: + - const: renesas,rzn1-spi + - const: snps,dw-apb-ssi + - description: Intel Keem Bay SPI Controller + const: intel,keembay-ssi + + reg: + minItems: 1 + items: + - description: DW APB SSI controller memory mapped registers + - description: SPI MST region map + + interrupts: + maxItems: 1 + + clocks: + minItems: 1 + items: + - description: SPI Controller reference clock source + - description: APB interface clock source + + clock-names: + minItems: 1 + items: + - const: ssi_clk + - const: pclk + + resets: + maxItems: 1 + + reset-names: + const: spi + + reg-io-width: + $ref: /schemas/types.yaml#/definitions/uint32 + description: I/O register width (in bytes) implemented by this device + default: 4 + enum: [ 2, 4 ] + + num-cs: + default: 4 + minimum: 1 + maximum: 4 + + dmas: + items: + - description: TX DMA Channel + - description: RX DMA Channel + + dma-names: + items: + - const: tx + - const: rx + +patternProperties: + "^.*@[0-9a-f]+$": + type: object + properties: + reg: + minimum: 0 + maximum: 3 + + spi-rx-bus-width: + const: 1 + + spi-tx-bus-width: + const: 1 + +unevaluatedProperties: false + +required: + - compatible + - reg + - "#address-cells" + - "#size-cells" + - interrupts + - clocks + +examples: + - | + spi@fff00000 { + compatible = "snps,dw-apb-ssi"; + reg = <0xfff00000 0x1000>; + #address-cells = <1>; + #size-cells = <0>; + interrupts = <0 154 4>; + clocks = <&spi_m_clk>; + num-cs = <2>; + cs-gpios = <&gpio0 13 0>, + <&gpio0 14 0>; + }; +... diff --git a/Documentation/devicetree/bindings/spi/socionext,uniphier-spi.yaml b/Documentation/devicetree/bindings/spi/socionext,uniphier-spi.yaml new file mode 100644 index 000000000000..c25409298bdf --- /dev/null +++ b/Documentation/devicetree/bindings/spi/socionext,uniphier-spi.yaml @@ -0,0 +1,57 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/socionext,uniphier-spi.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Socionext UniPhier SPI controller + +description: | + UniPhier SoCs have SCSSI which supports SPI single channel. + +maintainers: + - Kunihiko Hayashi <hayashi.kunihiko@socionext.com> + - Keiji Hayashibara <hayashibara.keiji@socionext.com> + +allOf: + - $ref: spi-controller.yaml# + +properties: + "#address-cells": true + "#size-cells": true + + compatible: + const: socionext,uniphier-scssi + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + maxItems: 1 + + resets: + maxItems: 1 + +required: + - compatible + - reg + - interrupts + - clocks + - resets + - "#address-cells" + - "#size-cells" + +examples: + - | + spi0: spi@54006000 { + compatible = "socionext,uniphier-scssi"; + reg = <0x54006000 0x100>; + #address-cells = <1>; + #size-cells = <0>; + interrupts = <0 39 4>; + clocks = <&peri_clk 11>; + resets = <&peri_rst 11>; + }; diff --git a/Documentation/devicetree/bindings/spi/spi-dw.txt b/Documentation/devicetree/bindings/spi/spi-dw.txt deleted file mode 100644 index 7b63ed601990..000000000000 --- a/Documentation/devicetree/bindings/spi/spi-dw.txt +++ /dev/null @@ -1,24 +0,0 @@ -Synopsys DesignWare SPI master - -Required properties: -- compatible: should be "snps,designware-spi" -- #address-cells: see spi-bus.txt -- #size-cells: see spi-bus.txt -- reg: address and length of the spi master registers -- interrupts: should contain one interrupt -- clocks: spi clock phandle -- num-cs: see spi-bus.txt - -Optional properties: -- cs-gpios: see spi-bus.txt - -Example: - -spi: spi@4020a000 { - compatible = "snps,designware-spi"; - interrupts = <11 1>; - reg = <0x4020a000 0x1000>; - clocks = <&pclk>; - num-cs = <2>; - cs-gpios = <&banka 0 0>; -}; diff --git a/Documentation/devicetree/bindings/spi/spi-rspi.txt b/Documentation/devicetree/bindings/spi/spi-rspi.txt deleted file mode 100644 index 421722b93992..000000000000 --- a/Documentation/devicetree/bindings/spi/spi-rspi.txt +++ /dev/null @@ -1,73 +0,0 @@ -Device tree configuration for Renesas RSPI/QSPI driver - -Required properties: -- compatible : For Renesas Serial Peripheral Interface on legacy SH: - "renesas,rspi-<soctype>", "renesas,rspi" as fallback. - For Renesas Serial Peripheral Interface on RZ/A: - "renesas,rspi-<soctype>", "renesas,rspi-rz" as fallback. - For Quad Serial Peripheral Interface on R-Car Gen2 and - RZ/G1 devices: - "renesas,qspi-<soctype>", "renesas,qspi" as fallback. - Examples with soctypes are: - - "renesas,rspi-sh7757" (SH) - - "renesas,rspi-r7s72100" (RZ/A1H) - - "renesas,rspi-r7s9210" (RZ/A2) - - "renesas,qspi-r8a7743" (RZ/G1M) - - "renesas,qspi-r8a7744" (RZ/G1N) - - "renesas,qspi-r8a7745" (RZ/G1E) - - "renesas,qspi-r8a77470" (RZ/G1C) - - "renesas,qspi-r8a7790" (R-Car H2) - - "renesas,qspi-r8a7791" (R-Car M2-W) - - "renesas,qspi-r8a7792" (R-Car V2H) - - "renesas,qspi-r8a7793" (R-Car M2-N) - - "renesas,qspi-r8a7794" (R-Car E2) -- reg : Address start and address range size of the device -- interrupts : A list of interrupt-specifiers, one for each entry in - interrupt-names. - If interrupt-names is not present, an interrupt specifier - for a single muxed interrupt. -- interrupt-names : A list of interrupt names. Should contain (if present): - - "error" for SPEI, - - "rx" for SPRI, - - "tx" to SPTI, - - "mux" for a single muxed interrupt. -- num-cs : Number of chip selects. Some RSPI cores have more than 1. -- #address-cells : Must be <1> -- #size-cells : Must be <0> - -Optional properties: -- clocks : Must contain a reference to the functional clock. -- dmas : Must contain a list of two references to DMA specifiers, - one for transmission, and one for reception. -- dma-names : Must contain a list of two DMA names, "tx" and "rx". - -Pinctrl properties might be needed, too. See -Documentation/devicetree/bindings/pinctrl/renesas,*. - -Examples: - - spi0: spi@e800c800 { - compatible = "renesas,rspi-r7s72100", "renesas,rspi-rz"; - reg = <0xe800c800 0x24>; - interrupts = <0 238 IRQ_TYPE_LEVEL_HIGH>, - <0 239 IRQ_TYPE_LEVEL_HIGH>, - <0 240 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "error", "rx", "tx"; - interrupt-parent = <&gic>; - num-cs = <1>; - #address-cells = <1>; - #size-cells = <0>; - }; - - spi: spi@e6b10000 { - compatible = "renesas,qspi-r8a7791", "renesas,qspi"; - reg = <0 0xe6b10000 0 0x2c>; - interrupt-parent = <&gic>; - interrupts = <0 184 IRQ_TYPE_LEVEL_HIGH>; - clocks = <&mstp9_clks R8A7791_CLK_QSPI_MOD>; - num-cs = <1>; - #address-cells = <1>; - #size-cells = <0>; - dmas = <&dmac0 0x17>, <&dmac0 0x18>; - dma-names = "tx", "rx"; - }; diff --git a/Documentation/devicetree/bindings/spi/spi-uniphier.txt b/Documentation/devicetree/bindings/spi/spi-uniphier.txt deleted file mode 100644 index e1201573a29a..000000000000 --- a/Documentation/devicetree/bindings/spi/spi-uniphier.txt +++ /dev/null @@ -1,28 +0,0 @@ -Socionext UniPhier SPI controller driver - -UniPhier SoCs have SCSSI which supports SPI single channel. - -Required properties: - - compatible: should be "socionext,uniphier-scssi" - - reg: address and length of the spi master registers - - #address-cells: must be <1>, see spi-bus.txt - - #size-cells: must be <0>, see spi-bus.txt - - interrupts: a single interrupt specifier - - pinctrl-names: should be "default" - - pinctrl-0: pin control state for the default mode - - clocks: a phandle to the clock for the device - - resets: a phandle to the reset control for the device - -Example: - -spi0: spi@54006000 { - compatible = "socionext,uniphier-scssi"; - reg = <0x54006000 0x100>; - #address-cells = <1>; - #size-cells = <0>; - interrupts = <0 39 4>; - pinctrl-names = "default"; - pinctrl-0 = <&pinctrl_spi0>; - clocks = <&peri_clk 11>; - resets = <&peri_rst 11>; -}; diff --git a/Documentation/devicetree/bindings/spi/ti_qspi.txt b/Documentation/devicetree/bindings/spi/ti_qspi.txt index e65fde4a7388..47b184bce414 100644 --- a/Documentation/devicetree/bindings/spi/ti_qspi.txt +++ b/Documentation/devicetree/bindings/spi/ti_qspi.txt @@ -29,7 +29,7 @@ modification to bootloader. Example: For am4372: -qspi: qspi@4b300000 { +qspi: qspi@47900000 { compatible = "ti,am4372-qspi"; reg = <0x47900000 0x100>, <0x30000000 0x4000000>; reg-names = "qspi_base", "qspi_mmap"; diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml b/Documentation/devicetree/bindings/vendor-prefixes.yaml index d3891386d671..d3277fe6640b 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.yaml +++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml @@ -633,6 +633,8 @@ patternProperties: description: Microsoft Corporation "^mikroe,.*": description: MikroElektronika d.o.o. + "^mikrotik,.*": + description: MikroTik "^miniand,.*": description: Miniand Tech "^minix,.*": diff --git a/Documentation/fb/efifb.rst b/Documentation/fb/efifb.rst index 04840331a00e..6badff64756f 100644 --- a/Documentation/fb/efifb.rst +++ b/Documentation/fb/efifb.rst @@ -2,8 +2,10 @@ What is efifb? ============== -This is a generic EFI platform driver for Intel based Apple computers. -efifb is only for EFI booted Intel Macs. +This is a generic EFI platform driver for systems with UEFI firmware. The +system must be booted via the EFI stub for this to be usable. efifb supports +both firmware with Graphics Output Protocol (GOP) displays as well as older +systems with only Universal Graphics Adapter (UGA) displays. Supported Hardware ================== @@ -12,11 +14,14 @@ Supported Hardware - Macbook - Macbook Pro 15"/17" - MacMini +- ARM/ARM64/X86 systems with UEFI firmware How to use it? ============== -efifb does not have any kind of autodetection of your machine. +For UGA displays, efifb does not have any kind of autodetection of your +machine. + You have to add the following kernel parameters in your elilo.conf:: Macbook : @@ -28,6 +33,9 @@ You have to add the following kernel parameters in your elilo.conf:: Macbook Pro 17", iMac 20" : video=efifb:i20 +For GOP displays, efifb can autodetect the display's resolution and framebuffer +address, so these should work out of the box without any special parameters. + Accepted options: ======= =========================================================== @@ -36,4 +44,28 @@ nowc Don't map the framebuffer write combined. This can be used when large amounts of console data are written. ======= =========================================================== +Options for GOP displays: + +mode=n + The EFI stub will set the mode of the display to mode number n if + possible. + +<xres>x<yres>[-(rgb|bgr|<bpp>)] + The EFI stub will search for a display mode that matches the specified + horizontal and vertical resolution, and optionally bit depth, and set + the mode of the display to it if one is found. The bit depth can either + "rgb" or "bgr" to match specifically those pixel formats, or a number + for a mode with matching bits per pixel. + +auto + The EFI stub will choose the mode with the highest resolution (product + of horizontal and vertical resolution). If there are multiple modes + with the highest resolution, it will choose one with the highest color + depth. + +list + The EFI stub will list out all the display modes that are available. A + specific mode can then be chosen using one of the above options for the + next boot. + Edgar Hucek <gimli@dark-green.com> diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 87d794bc75a4..4218ac658629 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -225,8 +225,12 @@ fsync_mode=%s Control the policy of fsync. Currently supports "posix", pass, but the performance will regress. "nobarrier" is based on "posix", but doesn't issue flush command for non-atomic files likewise "nobarrier" mount option. -test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt +test_dummy_encryption +test_dummy_encryption=%s + Enable dummy encryption, which provides a fake fscrypt context. The fake fscrypt context is used by xfstests. + The argument may be either "v1" or "v2", in order to + select the corresponding fscrypt policy version. checkpoint=%s[:%u[%]] Set to "disable" to turn off checkpointing. Set to "enable" to reenable checkpointing. Is enabled by default. While disabled, any unmounting or unexpected shutdowns will cause diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst index aa072112cfff..f517af8ec11c 100644 --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -292,8 +292,22 @@ files' data differently, inode numbers are included in the IVs. Consequently, shrinking the filesystem may not be allowed. This format is optimized for use with inline encryption hardware -compliant with the UFS or eMMC standards, which support only 64 IV -bits per I/O request and may have only a small number of keyslots. +compliant with the UFS standard, which supports only 64 IV bits per +I/O request and may have only a small number of keyslots. + +IV_INO_LBLK_32 policies +----------------------- + +IV_INO_LBLK_32 policies work like IV_INO_LBLK_64, except that for +IV_INO_LBLK_32, the inode number is hashed with SipHash-2-4 (where the +SipHash key is derived from the master key) and added to the file +logical block number mod 2^32 to produce a 32-bit IV. + +This format is optimized for use with inline encryption hardware +compliant with the eMMC v5.2 standard, which supports only 32 IV bits +per I/O request and may have only a small number of keyslots. This +format results in some level of IV reuse, so it should only be used +when necessary due to hardware limitations. Key identifiers --------------- @@ -369,6 +383,10 @@ a little endian number, except that: to 32 bits and is placed in bits 0-31 of the IV. The inode number (which is also limited to 32 bits) is placed in bits 32-63. +- With `IV_INO_LBLK_32 policies`_, the logical block number is limited + to 32 bits and is placed in bits 0-31 of the IV. The inode number + is then hashed and added mod 2^32. + Note that because file logical block numbers are included in the IVs, filesystems must enforce that blocks are never shifted around within encrypted files, e.g. via "collapse range" or "insert range". @@ -465,8 +483,15 @@ This structure must be initialized as follows: (0x3). - FSCRYPT_POLICY_FLAG_DIRECT_KEY: See `DIRECT_KEY policies`_. - FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64: See `IV_INO_LBLK_64 - policies`_. This is mutually exclusive with DIRECT_KEY and is not - supported on v1 policies. + policies`_. + - FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32: See `IV_INO_LBLK_32 + policies`_. + + v1 encryption policies only support the PAD_* and DIRECT_KEY flags. + The other flags are only supported by v2 encryption policies. + + The DIRECT_KEY, IV_INO_LBLK_64, and IV_INO_LBLK_32 flags are + mutually exclusive. - For v2 encryption policies, ``__reserved`` must be zeroed. diff --git a/Documentation/hwmon/amd_energy.rst b/Documentation/hwmon/amd_energy.rst new file mode 100644 index 000000000000..f8288edff664 --- /dev/null +++ b/Documentation/hwmon/amd_energy.rst @@ -0,0 +1,109 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Kernel driver amd_energy +========================== + +Supported chips: + +* AMD Family 17h Processors + + Prefix: 'amd_energy' + + Addresses used: RAPL MSRs + + Datasheets: + + - Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors + + https://developer.amd.com/wp-content/resources/55570-B1_PUB.zip + + - Preliminary Processor Programming Reference (PPR) for AMD Family 17h Model 31h, Revision B0 Processors + + https://developer.amd.com/wp-content/resources/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip + +Author: Naveen Krishna Chatradhi <nchatrad@amd.com> + +Description +----------- + +The Energy driver exposes the energy counters that are +reported via the Running Average Power Limit (RAPL) +Model-specific Registers (MSRs) via the hardware monitor +(HWMON) sysfs interface. + +1. Power, Energy and Time Units + MSR_RAPL_POWER_UNIT/ C001_0299: + shared with all cores in the socket + +2. Energy consumed by each Core + MSR_CORE_ENERGY_STATUS/ C001_029A: + 32-bitRO, Accumulator, core-level power reporting + +3. Energy consumed by Socket + MSR_PACKAGE_ENERGY_STATUS/ C001_029B: + 32-bitRO, Accumulator, socket-level power reporting, + shared with all cores in socket + +These registers are updated every 1ms and cleared on +reset of the system. + +Note: If SMT is enabled, Linux enumerates all threads as cpus. +Since, the energy status registers are accessed at core level, +reading those registers from the sibling threads would result +in duplicate values. Hence, energy counter entries are not +populated for the siblings. + +Energy Caluclation +------------------ + +Energy information (in Joules) is based on the multiplier, +1/2^ESU; where ESU is an unsigned integer read from +MSR_RAPL_POWER_UNIT register. Default value is 10000b, +indicating energy status unit is 15.3 micro-Joules increment. + +Reported values are scaled as per the formula + +scaled value = ((1/2^ESU) * (Raw value) * 1000000UL) in uJoules + +Users calculate power for a given domain by calculating + dEnergy/dTime for that domain. + +Energy accumulation +-------------------------- + +Current, Socket energy status register is 32bit, assuming a 240W +2P system, the register would wrap around in + + 2^32*15.3 e-6/240 * 2 = 547.60833024 secs to wrap(~9 mins) + +The Core energy register may wrap around after several days. + +To improve the wrap around time, a kernel thread is implemented +to accumulate the socket energy counters and one core energy counter +per run to a respective 64-bit counter. The kernel thread starts +running during probe, wakes up every 100secs and stops running +when driver is removed. + +A socket and core energy read would return the current register +value added to the respective energy accumulator. + +Sysfs attributes +---------------- + +=============== ======== ===================================== +Attribute Label Description +=============== ======== ===================================== + +* For index N between [1] and [nr_cpus] + +=============== ======== ====================================== +energy[N]_input EcoreX Core Energy X = [0] to [nr_cpus - 1] + Measured input core energy +=============== ======== ====================================== + +* For N between [nr_cpus] and [nr_cpus + nr_socks] + +=============== ======== ====================================== +energy[N]_input EsocketX Socket Energy X = [0] to [nr_socks -1] + Measured input socket energy +=============== ======== ====================================== diff --git a/Documentation/hwmon/bt1-pvt.rst b/Documentation/hwmon/bt1-pvt.rst new file mode 100644 index 000000000000..cbb0c0613132 --- /dev/null +++ b/Documentation/hwmon/bt1-pvt.rst @@ -0,0 +1,117 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +Kernel driver bt1-pvt +===================== + +Supported chips: + + * Baikal-T1 PVT sensor (in SoC) + + Prefix: 'bt1-pvt' + + Addresses scanned: - + + Datasheet: Provided by BAIKAL ELECTRONICS upon request and under NDA + +Authors: + Maxim Kaurkin <maxim.kaurkin@baikalelectronics.ru> + Serge Semin <Sergey.Semin@baikalelectronics.ru> + +Description +----------- + +This driver implements support for the hardware monitoring capabilities of the +embedded into Baikal-T1 process, voltage and temperature sensors. PVT IP-core +consists of one temperature and four voltage sensors, which can be used to +monitor the chip internal environment like heating, supply voltage and +transistors performance. The driver can optionally provide the hwmon alarms +for each sensor the PVT controller supports. The alarms functionality is made +compile-time configurable due to the hardware interface implementation +peculiarity, which is connected with an ability to convert data from only one +sensor at a time. Additional limitation is that the controller performs the +thresholds checking synchronously with the data conversion procedure. Due to +these in order to have the hwmon alarms automatically detected the driver code +must switch from one sensor to another, read converted data and manually check +the threshold status bits. Depending on the measurements timeout settings +(update_interval sysfs node value) this design may cause additional burden on +the system performance. So in case if alarms are unnecessary in your system +design it's recommended to have them disabled to prevent the PVT IRQs being +periodically raised to get the data cache/alarms status up to date. By default +in alarm-less configuration the data conversion is performed by the driver +on demand when read operation is requested via corresponding _input-file. + +Temperature Monitoring +---------------------- + +Temperature is measured with 10-bit resolution and reported in millidegree +Celsius. The driver performs all the scaling by itself therefore reports true +temperatures that don't need any user-space adjustments. While the data +translation formulae isn't linear, which gives us non-linear discreteness, +it's close to one, but giving a bit better accuracy for higher temperatures. +The temperature input is mapped as follows (the last column indicates the input +ranges):: + + temp1: CPU embedded diode -48.38C - +147.438C + +In case if the alarms kernel config is enabled in the driver the temperature input +has associated min and max limits which trigger an alarm when crossed. + +Voltage Monitoring +------------------ + +The voltage inputs are also sampled with 10-bit resolution and reported in +millivolts. But in this case the data translation formulae is linear, which +provides a constant measurements discreteness. The data scaling is also +performed by the driver, so returning true millivolts. The voltage inputs are +mapped as follows (the last column indicates the input ranges):: + + in0: VDD (processor core) 0.62V - 1.168V + in1: Low-Vt (low voltage threshold) 0.62V - 1.168V + in2: High-Vt (high voltage threshold) 0.62V - 1.168V + in3: Standard-Vt (standard voltage threshold) 0.62V - 1.168V + +In case if the alarms config is enabled in the driver the voltage inputs +have associated min and max limits which trigger an alarm when crossed. + +Sysfs Attributes +---------------- + +Following is a list of all sysfs attributes that the driver provides, their +permissions and a short description: + +=============================== ======= ======================================= +Name Perm Description +=============================== ======= ======================================= +update_interval RW Measurements update interval per + sensor. +temp1_type RO Sensor type (always 1 as CPU embedded + diode). +temp1_label RO CPU Core Temperature sensor. +temp1_input RO Measured temperature in millidegree + Celsius. +temp1_min RW Low limit for temp input. +temp1_max RW High limit for temp input. +temp1_min_alarm RO Temperature input alarm. Returns 1 if + temperature input went below min limit, + 0 otherwise. +temp1_max_alarm RO Temperature input alarm. Returns 1 if + temperature input went above max limit, + 0 otherwise. +temp1_offset RW Temperature offset in millidegree + Celsius which is added to the + temperature reading by the chip. It can + be used to manually adjust the + temperature measurements within 7.130 + degrees Celsius. +in[0-3]_label RO CPU Voltage sensor (either core or + low/high/standard thresholds). +in[0-3]_input RO Measured voltage in millivolts. +in[0-3]_min RW Low limit for voltage input. +in[0-3]_max RW High limit for voltage input. +in[0-3]_min_alarm RO Voltage input alarm. Returns 1 if + voltage input went below min limit, + 0 otherwise. +in[0-3]_max_alarm RO Voltage input alarm. Returns 1 if + voltage input went above max limit, + 0 otherwise. +=============================== ======= ======================================= diff --git a/Documentation/hwmon/gsc-hwmon.rst b/Documentation/hwmon/gsc-hwmon.rst new file mode 100644 index 000000000000..ffac392a7129 --- /dev/null +++ b/Documentation/hwmon/gsc-hwmon.rst @@ -0,0 +1,53 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Kernel driver gsc-hwmon +======================= + +Supported chips: Gateworks GSC +Datasheet: http://trac.gateworks.com/wiki/gsc +Author: Tim Harvey <tharvey@gateworks.com> + +Description: +------------ + +This driver supports hardware monitoring for the temperature sensor, +various ADC's connected to the GSC, and optional FAN controller available +on some boards. + + +Voltage Monitoring +------------------ + +The voltage inputs are scaled either internally or by the driver depending +on the GSC version and firmware. The values returned by the driver do not need +further scaling. The voltage input labels provide the voltage rail name: + +inX_input Measured voltage (mV). +inX_label Name of voltage rail. + + +Temperature Monitoring +---------------------- + +Temperatures are measured with 12-bit or 10-bit resolution and are scaled +either internally or by the driver depending on the GSC version and firmware. +The values returned by the driver reflect millidegree Celcius: + +tempX_input Measured temperature. +tempX_label Name of temperature input. + + +PWM Output Control +------------------ + +The GSC features 1 PWM output that operates in automatic mode where the +PWM value will be scalled depending on 6 temperature boundaries. +The tempeature boundaries are read-write and in millidegree Celcius and the +read-only PWM values range from 0 (off) to 255 (full speed). +Fan speed will be set to minimum (off) when the temperature sensor reads +less than pwm1_auto_point1_temp and maximum when the temperature sensor +equals or exceeds pwm1_auto_point6_temp. + +pwm1_auto_point[1-6]_pwm PWM value. +pwm1_auto_point[1-6]_temp Temperature boundary. + diff --git a/Documentation/hwmon/ina2xx.rst b/Documentation/hwmon/ina2xx.rst index 94b9a260c518..ed81f5416331 100644 --- a/Documentation/hwmon/ina2xx.rst +++ b/Documentation/hwmon/ina2xx.rst @@ -99,6 +99,25 @@ Sysfs entries for ina226, ina230 and ina231 only ------------------------------------------------ ======================= ==================================================== +in0_lcrit Critical low shunt voltage +in0_crit Critical high shunt voltage +in0_lcrit_alarm Shunt voltage critical low alarm +in0_crit_alarm Shunt voltage critical high alarm +in1_lcrit Critical low bus voltage +in1_crit Critical high bus voltage +in1_lcrit_alarm Bus voltage critical low alarm +in1_crit_alarm Bus voltage critical high alarm +power1_crit Critical high power +power1_crit_alarm Power critical high alarm update_interval data conversion time; affects number of samples used to average results for shunt and bus voltages. ======================= ==================================================== + +.. note:: + + - Configure `shunt_resistor` before configure `power1_crit`, because power + value is calculated based on `shunt_resistor` set. + - Because of the underlying register implementation, only one `*crit` setting + and its `alarm` can be active. Writing to one `*crit` setting clears other + `*crit` settings and alarms. Writing 0 to any `*crit` setting clears all + `*crit` settings and alarms. diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst index 8ef62fd39787..005bf9e124bb 100644 --- a/Documentation/hwmon/index.rst +++ b/Documentation/hwmon/index.rst @@ -39,10 +39,12 @@ Hardware Monitoring Kernel Drivers adt7470 adt7475 amc6821 + amd_energy asb100 asc7621 aspeed-pwm-tacho bel-pfe + bt1-pvt coretemp da9052 da9055 @@ -60,6 +62,7 @@ Hardware Monitoring Kernel Drivers ftsteutates g760a g762 + gsc-hwmon gl518sm hih6130 ibmaem @@ -106,6 +109,7 @@ Hardware Monitoring Kernel Drivers max16064 max16065 max1619 + max16601 max1668 max197 max20730 diff --git a/Documentation/hwmon/lm90.rst b/Documentation/hwmon/lm90.rst index 953315987c06..78dfc01b47a2 100644 --- a/Documentation/hwmon/lm90.rst +++ b/Documentation/hwmon/lm90.rst @@ -123,6 +123,18 @@ Supported chips: http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3497 + * Maxim MAX6654 + + Prefix: 'max6654' + + Addresses scanned: I2C 0x18, 0x19, 0x1a, 0x29, 0x2a, 0x2b, + + 0x4c, 0x4d and 0x4e + + Datasheet: Publicly available at the Maxim website + + https://www.maximintegrated.com/en/products/sensors/MAX6654.html + * Maxim MAX6657 Prefix: 'max6657' @@ -301,6 +313,13 @@ ADT7461, ADT7461A, NCT1008: * Extended temperature range (breaks compatibility) * Lower resolution for remote temperature +MAX6654: + * Better local resolution + * Selectable address + * Remote sensor type selection + * Extended temperature range + * Extended resolution only available when conversion rate <= 1 Hz + MAX6657 and MAX6658: * Better local resolution * Remote sensor type selection @@ -336,8 +355,8 @@ SA56004X: All temperature values are given in degrees Celsius. Resolution is 1.0 degree for the local temperature, 0.125 degree for the remote -temperature, except for the MAX6657, MAX6658 and MAX6659 which have a -resolution of 0.125 degree for both temperatures. +temperature, except for the MAX6654, MAX6657, MAX6658 and MAX6659 which have +a resolution of 0.125 degree for both temperatures. Each sensor has its own high and low limits, plus a critical limit. Additionally, there is a relative hysteresis value common to both critical diff --git a/Documentation/hwmon/max16601.rst b/Documentation/hwmon/max16601.rst new file mode 100644 index 000000000000..346e74674c51 --- /dev/null +++ b/Documentation/hwmon/max16601.rst @@ -0,0 +1,159 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Kernel driver max16601 +====================== + +Supported chips: + + * Maxim MAX16601 + + Prefix: 'max16601' + + Addresses scanned: - + + Datasheet: Not published + +Author: Guenter Roeck <linux@roeck-us.net> + + +Description +----------- + +This driver supports the MAX16601 VR13.HC Dual-Output Voltage Regulator +Chipset. + +The driver is a client driver to the core PMBus driver. +Please see Documentation/hwmon/pmbus.rst for details on PMBus client drivers. + + +Usage Notes +----------- + +This driver does not auto-detect devices. You will have to instantiate the +devices explicitly. Please see Documentation/i2c/instantiating-devices.rst for +details. + + +Platform data support +--------------------- + +The driver supports standard PMBus driver platform data. + + +Sysfs entries +------------- + +The following attributes are supported. + +======================= ======================================================= +in1_label "vin1" +in1_input VCORE input voltage. +in1_alarm Input voltage alarm. + +in2_label "vout1" +in2_input VCORE output voltage. +in2_alarm Output voltage alarm. + +curr1_label "iin1" +curr1_input VCORE input current, derived from duty cycle and output + current. +curr1_max Maximum input current. +curr1_max_alarm Current high alarm. + +curr2_label "iin1.0" +curr2_input VCORE phase 0 input current. + +curr3_label "iin1.1" +curr3_input VCORE phase 1 input current. + +curr4_label "iin1.2" +curr4_input VCORE phase 2 input current. + +curr5_label "iin1.3" +curr5_input VCORE phase 3 input current. + +curr6_label "iin1.4" +curr6_input VCORE phase 4 input current. + +curr7_label "iin1.5" +curr7_input VCORE phase 5 input current. + +curr8_label "iin1.6" +curr8_input VCORE phase 6 input current. + +curr9_label "iin1.7" +curr9_input VCORE phase 7 input current. + +curr10_label "iin2" +curr10_input VCORE input current, derived from sensor element. + +curr11_label "iin3" +curr11_input VSA input current. + +curr12_label "iout1" +curr12_input VCORE output current. +curr12_crit Critical output current. +curr12_crit_alarm Output current critical alarm. +curr12_max Maximum output current. +curr12_max_alarm Output current high alarm. + +curr13_label "iout1.0" +curr13_input VCORE phase 0 output current. + +curr14_label "iout1.1" +curr14_input VCORE phase 1 output current. + +curr15_label "iout1.2" +curr15_input VCORE phase 2 output current. + +curr16_label "iout1.3" +curr16_input VCORE phase 3 output current. + +curr17_label "iout1.4" +curr17_input VCORE phase 4 output current. + +curr18_label "iout1.5" +curr18_input VCORE phase 5 output current. + +curr19_label "iout1.6" +curr19_input VCORE phase 6 output current. + +curr20_label "iout1.7" +curr20_input VCORE phase 7 output current. + +curr21_label "iout3" +curr21_input VSA output current. +curr21_highest Historical maximum VSA output current. +curr21_reset_history Write any value to reset curr21_highest. +curr21_crit Critical output current. +curr21_crit_alarm Output current critical alarm. +curr21_max Maximum output current. +curr21_max_alarm Output current high alarm. + +power1_label "pin1" +power1_input Input power, derived from duty cycle and output current. +power1_alarm Input power alarm. + +power2_label "pin2" +power2_input Input power, derived from input current sensor. + +power3_label "pout" +power3_input Output power. + +temp1_input VCORE temperature. +temp1_crit Critical high temperature. +temp1_crit_alarm Chip temperature critical high alarm. +temp1_max Maximum temperature. +temp1_max_alarm Chip temperature high alarm. + +temp2_input TSENSE_0 temperature +temp3_input TSENSE_1 temperature +temp4_input TSENSE_2 temperature +temp5_input TSENSE_3 temperature + +temp6_input VSA temperature. +temp6_crit Critical high temperature. +temp6_crit_alarm Chip temperature critical high alarm. +temp6_max Maximum temperature. +temp6_max_alarm Chip temperature high alarm. +======================= ======================================================= diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst index 09f45ce38d26..1b577a8bf982 100644 --- a/Documentation/locking/locktypes.rst +++ b/Documentation/locking/locktypes.rst @@ -13,6 +13,7 @@ The kernel provides a variety of locking primitives which can be divided into two categories: - Sleeping locks + - CPU local locks - Spinning locks This document conceptually describes these lock types and provides rules @@ -44,9 +45,23 @@ Sleeping lock types: On PREEMPT_RT kernels, these lock types are converted to sleeping locks: + - local_lock - spinlock_t - rwlock_t + +CPU local locks +--------------- + + - local_lock + +On non-PREEMPT_RT kernels, local_lock functions are wrappers around +preemption and interrupt disabling primitives. Contrary to other locking +mechanisms, disabling preemption or interrupts are pure CPU local +concurrency control mechanisms and not suited for inter-CPU concurrency +control. + + Spinning locks -------------- @@ -67,6 +82,7 @@ can have suffixes which apply further protections: _irqsave/restore() Save and disable / restore interrupt disabled state =================== ==================================================== + Owner semantics =============== @@ -139,6 +155,56 @@ implementation, thus changing the fairness: writer from starving readers. +local_lock +========== + +local_lock provides a named scope to critical sections which are protected +by disabling preemption or interrupts. + +On non-PREEMPT_RT kernels local_lock operations map to the preemption and +interrupt disabling and enabling primitives: + + =========================== ====================== + local_lock(&llock) preempt_disable() + local_unlock(&llock) preempt_enable() + local_lock_irq(&llock) local_irq_disable() + local_unlock_irq(&llock) local_irq_enable() + local_lock_save(&llock) local_irq_save() + local_lock_restore(&llock) local_irq_save() + =========================== ====================== + +The named scope of local_lock has two advantages over the regular +primitives: + + - The lock name allows static analysis and is also a clear documentation + of the protection scope while the regular primitives are scopeless and + opaque. + + - If lockdep is enabled the local_lock gains a lockmap which allows to + validate the correctness of the protection. This can detect cases where + e.g. a function using preempt_disable() as protection mechanism is + invoked from interrupt or soft-interrupt context. Aside of that + lockdep_assert_held(&llock) works as with any other locking primitive. + +local_lock and PREEMPT_RT +------------------------- + +PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing +semantics: + + - All spinlock_t changes also apply to local_lock. + +local_lock usage +---------------- + +local_lock should be used in situations where disabling preemption or +interrupts is the appropriate form of concurrency control to protect +per-CPU data structures on a non PREEMPT_RT kernel. + +local_lock is not suitable to protect against preemption or interrupts on a +PREEMPT_RT kernel due to the PREEMPT_RT specific spinlock_t semantics. + + raw_spinlock_t and spinlock_t ============================= @@ -258,10 +324,82 @@ implementation, thus changing semantics: PREEMPT_RT caveats ================== +local_lock on RT +---------------- + +The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few +implications. For example, on a non-PREEMPT_RT kernel the following code +sequence works as expected:: + + local_lock_irq(&local_lock); + raw_spin_lock(&lock); + +and is fully equivalent to:: + + raw_spin_lock_irq(&lock); + +On a PREEMPT_RT kernel this code sequence breaks because local_lock_irq() +is mapped to a per-CPU spinlock_t which neither disables interrupts nor +preemption. The following code sequence works perfectly correct on both +PREEMPT_RT and non-PREEMPT_RT kernels:: + + local_lock_irq(&local_lock); + spin_lock(&lock); + +Another caveat with local locks is that each local_lock has a specific +protection scope. So the following substitution is wrong:: + + func1() + { + local_irq_save(flags); -> local_lock_irqsave(&local_lock_1, flags); + func3(); + local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_1, flags); + } + + func2() + { + local_irq_save(flags); -> local_lock_irqsave(&local_lock_2, flags); + func3(); + local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_2, flags); + } + + func3() + { + lockdep_assert_irqs_disabled(); + access_protected_data(); + } + +On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel +local_lock_1 and local_lock_2 are distinct and cannot serialize the callers +of func3(). Also the lockdep assert will trigger on a PREEMPT_RT kernel +because local_lock_irqsave() does not disable interrupts due to the +PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is:: + + func1() + { + local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags); + func3(); + local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags); + } + + func2() + { + local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags); + func3(); + local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags); + } + + func3() + { + lockdep_assert_held(&local_lock); + access_protected_data(); + } + + spinlock_t and rwlock_t ----------------------- -These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels +The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels have a few implications. For example, on a non-PREEMPT_RT kernel the following code sequence works as expected:: @@ -282,9 +420,61 @@ local_lock mechanism. Acquiring the local_lock pins the task to a CPU, allowing things like per-CPU interrupt disabled locks to be acquired. However, this approach should be used only where absolutely necessary. +A typical scenario is protection of per-CPU variables in thread context:: -raw_spinlock_t --------------- + struct foo *p = get_cpu_ptr(&var1); + + spin_lock(&p->lock); + p->count += this_cpu_read(var2); + +This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel +this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does +not allow to acquire p->lock because get_cpu_ptr() implicitly disables +preemption. The following substitution works on both kernels:: + + struct foo *p; + + migrate_disable(); + p = this_cpu_ptr(&var1); + spin_lock(&p->lock); + p->count += this_cpu_read(var2); + +On a non-PREEMPT_RT kernel migrate_disable() maps to preempt_disable() +which makes the above code fully equivalent. On a PREEMPT_RT kernel +migrate_disable() ensures that the task is pinned on the current CPU which +in turn guarantees that the per-CPU access to var1 and var2 are staying on +the same CPU. + +The migrate_disable() substitution is not valid for the following +scenario:: + + func() + { + struct foo *p; + + migrate_disable(); + p = this_cpu_ptr(&var1); + p->val = func2(); + +While correct on a non-PREEMPT_RT kernel, this breaks on PREEMPT_RT because +here migrate_disable() does not protect against reentrancy from a +preempting task. A correct substitution for this case is:: + + func() + { + struct foo *p; + + local_lock(&foo_lock); + p = this_cpu_ptr(&var1); + p->val = func2(); + +On a non-PREEMPT_RT kernel this protects against reentrancy by disabling +preemption. On a PREEMPT_RT kernel this is achieved by acquiring the +underlying per-CPU spinlock. + + +raw_spinlock_t on RT +-------------------- Acquiring a raw_spinlock_t disables preemption and possibly also interrupts, so the critical section must avoid acquiring a regular @@ -325,22 +515,25 @@ Lock type nesting rules The most basic rules are: - - Lock types of the same lock category (sleeping, spinning) can nest - arbitrarily as long as they respect the general lock ordering rules to - prevent deadlocks. + - Lock types of the same lock category (sleeping, CPU local, spinning) + can nest arbitrarily as long as they respect the general lock ordering + rules to prevent deadlocks. + + - Sleeping lock types cannot nest inside CPU local and spinning lock types. - - Sleeping lock types cannot nest inside spinning lock types. + - CPU local and spinning lock types can nest inside sleeping lock types. - - Spinning lock types can nest inside sleeping lock types. + - Spinning lock types can nest inside all lock types These constraints apply both in PREEMPT_RT and otherwise. The fact that PREEMPT_RT changes the lock category of spinlock_t and -rwlock_t from spinning to sleeping means that they cannot be acquired while -holding a raw spinlock. This results in the following nesting ordering: +rwlock_t from spinning to sleeping and substitutes local_lock with a +per-CPU spinlock_t means that they cannot be acquired while holding a raw +spinlock. This results in the following nesting ordering: 1) Sleeping locks - 2) spinlock_t and rwlock_t + 2) spinlock_t, rwlock_t, local_lock 3) raw_spinlock_t and bit spinlocks Lockdep will complain if these constraints are violated, both in diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst index 5b58fc4e1268..4574352d6ff4 100644 --- a/Documentation/networking/devlink/ice.rst +++ b/Documentation/networking/devlink/ice.rst @@ -61,8 +61,8 @@ The ``ice`` driver reports the following versions - running - ICE OS Default Package - The name of the DDP package that is active in the device. The DDP - package is loaded by the driver during initialization. Each varation - of DDP package shall have a unique name. + package is loaded by the driver during initialization. Each + variation of the DDP package has a unique name. * - ``fw.app`` - running - 1.3.1.0 diff --git a/Documentation/power/suspend-and-cpuhotplug.rst b/Documentation/power/suspend-and-cpuhotplug.rst index 572d968c5375..ebedb6c75db9 100644 --- a/Documentation/power/suspend-and-cpuhotplug.rst +++ b/Documentation/power/suspend-and-cpuhotplug.rst @@ -48,7 +48,7 @@ More details follow:: | | v - disable_nonboot_cpus() + freeze_secondary_cpus() /* start */ | v @@ -83,7 +83,7 @@ More details follow:: Release cpu_add_remove_lock | v - /* disable_nonboot_cpus() complete */ + /* freeze_secondary_cpus() complete */ | v Do suspend @@ -93,7 +93,7 @@ More details follow:: Resuming back is likewise, with the counterparts being (in the order of execution during resume): -* enable_nonboot_cpus() which involves:: +* thaw_secondary_cpus() which involves:: | Acquire cpu_add_remove_lock | Decrease cpu_hotplug_disabled, thereby enabling regular cpu hotplug diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst index acb2f1b36350..17a8e584f15f 100644 --- a/Documentation/process/coding-style.rst +++ b/Documentation/process/coding-style.rst @@ -84,15 +84,20 @@ Get a decent editor and don't leave whitespace at the end of lines. Coding style is all about readability and maintainability using commonly available tools. -The limit on the length of lines is 80 columns and this is a strongly -preferred limit. - -Statements longer than 80 columns will be broken into sensible chunks, unless -exceeding 80 columns significantly increases readability and does not hide -information. Descendants are always substantially shorter than the parent and -are placed substantially to the right. The same applies to function headers -with a long argument list. However, never break user-visible strings such as -printk messages, because that breaks the ability to grep for them. +The preferred limit on the length of a single line is 80 columns. + +Statements longer than 80 columns should be broken into sensible chunks, +unless exceeding 80 columns significantly increases readability and does +not hide information. + +Descendants are always substantially shorter than the parent and are +are placed substantially to the right. A very commonly used style +is to align descendants to a function open parenthesis. + +These same rules are applied to function headers with a long argument list. + +However, never break user-visible strings such as printk messages because +that breaks the ability to grep for them. 3) Placing Braces and Spaces diff --git a/Documentation/security/siphash.rst b/Documentation/security/siphash.rst index 4eba68cdf0a1..bd9363025fcb 100644 --- a/Documentation/security/siphash.rst +++ b/Documentation/security/siphash.rst @@ -7,7 +7,7 @@ SipHash - a short input PRF SipHash is a cryptographically secure PRF -- a keyed hash function -- that performs very well for short inputs, hence the name. It was designed by cryptographers Daniel J. Bernstein and Jean-Philippe Aumasson. It is intended -as a replacement for some uses of: `jhash`, `md5_transform`, `sha_transform`, +as a replacement for some uses of: `jhash`, `md5_transform`, `sha1_transform`, and so forth. SipHash takes a secret key filled with randomly generated numbers and either diff --git a/Documentation/trace/ftrace-design.rst b/Documentation/trace/ftrace-design.rst index a8e22e0db63c..6893399157f0 100644 --- a/Documentation/trace/ftrace-design.rst +++ b/Documentation/trace/ftrace-design.rst @@ -229,14 +229,6 @@ Adding support for it is easy: just define the macro in asm/ftrace.h and pass the return address pointer as the 'retp' argument to ftrace_push_return_trace(). -HAVE_FTRACE_NMI_ENTER ---------------------- - -If you can't trace NMI functions, then skip this option. - -<details to be filled> - - HAVE_SYSCALL_TRACEPOINTS ------------------------ diff --git a/Documentation/usb/raw-gadget.rst b/Documentation/usb/raw-gadget.rst index 9e78cb858f86..68d879a8009e 100644 --- a/Documentation/usb/raw-gadget.rst +++ b/Documentation/usb/raw-gadget.rst @@ -27,9 +27,8 @@ differences are: 3. Raw Gadget provides a way to select a UDC device/driver to bind to, while GadgetFS currently binds to the first available UDC. -4. Raw Gadget uses predictable endpoint names (handles) across different - UDCs (as long as UDCs have enough endpoints of each required transfer - type). +4. Raw Gadget explicitly exposes information about endpoints addresses and + capabilities allowing a user to write UDC-agnostic gadgets. 5. Raw Gadget has ioctl-based interface instead of a filesystem-based one. @@ -50,12 +49,36 @@ The typical usage of Raw Gadget looks like: Raw Gadget and react to those depending on what kind of USB device needs to be emulated. +Note, that some UDC drivers have fixed addresses assigned to endpoints, and +therefore arbitrary endpoint addresses can't be used in the descriptors. +Nevertheles, Raw Gadget provides a UDC-agnostic way to write USB gadgets. +Once a USB_RAW_EVENT_CONNECT event is received via USB_RAW_IOCTL_EVENT_FETCH, +the USB_RAW_IOCTL_EPS_INFO ioctl can be used to find out information about +endpoints that the UDC driver has. Based on that information, the user must +chose UDC endpoints that will be used for the gadget being emulated, and +properly assign addresses in endpoint descriptors. + +You can find usage examples (along with a test suite) here: + +https://github.com/xairy/raw-gadget + +Internal details +~~~~~~~~~~~~~~~~ + +Currently every endpoint read/write ioctl submits a USB request and waits until +its completion. This is the desired mode for coverage-guided fuzzing (as we'd +like all USB request processing happen during the lifetime of a syscall), +and must be kept in the implementation. (This might be slow for real world +applications, thus the O_NONBLOCK improvement suggestion below.) + Potential future improvements ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- Implement ioctl's for setting/clearing halt status on endpoints. - -- Reporting more events (suspend, resume, etc.) through - USB_RAW_IOCTL_EVENT_FETCH. +- Report more events (suspend, resume, etc.) through USB_RAW_IOCTL_EVENT_FETCH. - Support O_NONBLOCK I/O. + +- Support USB 3 features (accept SS endpoint companion descriptor when + enabling endpoints; allow providing stream_id for bulk transfers). + +- Support ISO transfer features (expose frame_number for completed requests). diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst index dcc252634cf9..b6833c7bb474 100644 --- a/Documentation/virt/kvm/index.rst +++ b/Documentation/virt/kvm/index.rst @@ -28,3 +28,5 @@ KVM arm/index devices/index + + running-nested-guests diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst new file mode 100644 index 000000000000..d0a1fc754c84 --- /dev/null +++ b/Documentation/virt/kvm/running-nested-guests.rst @@ -0,0 +1,276 @@ +============================== +Running nested guests with KVM +============================== + +A nested guest is the ability to run a guest inside another guest (it +can be KVM-based or a different hypervisor). The straightforward +example is a KVM guest that in turn runs on a KVM guest (the rest of +this document is built on this example):: + + .----------------. .----------------. + | | | | + | L2 | | L2 | + | (Nested Guest) | | (Nested Guest) | + | | | | + |----------------'--'----------------| + | | + | L1 (Guest Hypervisor) | + | KVM (/dev/kvm) | + | | + .------------------------------------------------------. + | L0 (Host Hypervisor) | + | KVM (/dev/kvm) | + |------------------------------------------------------| + | Hardware (with virtualization extensions) | + '------------------------------------------------------' + +Terminology: + +- L0 – level-0; the bare metal host, running KVM + +- L1 – level-1 guest; a VM running on L0; also called the "guest + hypervisor", as it itself is capable of running KVM. + +- L2 – level-2 guest; a VM running on L1, this is the "nested guest" + +.. note:: The above diagram is modelled after the x86 architecture; + s390x, ppc64 and other architectures are likely to have + a different design for nesting. + + For example, s390x always has an LPAR (LogicalPARtition) + hypervisor running on bare metal, adding another layer and + resulting in at least four levels in a nested setup — L0 (bare + metal, running the LPAR hypervisor), L1 (host hypervisor), L2 + (guest hypervisor), L3 (nested guest). + + This document will stick with the three-level terminology (L0, + L1, and L2) for all architectures; and will largely focus on + x86. + + +Use Cases +--------- + +There are several scenarios where nested KVM can be useful, to name a +few: + +- As a developer, you want to test your software on different operating + systems (OSes). Instead of renting multiple VMs from a Cloud + Provider, using nested KVM lets you rent a large enough "guest + hypervisor" (level-1 guest). This in turn allows you to create + multiple nested guests (level-2 guests), running different OSes, on + which you can develop and test your software. + +- Live migration of "guest hypervisors" and their nested guests, for + load balancing, disaster recovery, etc. + +- VM image creation tools (e.g. ``virt-install``, etc) often run + their own VM, and users expect these to work inside a VM. + +- Some OSes use virtualization internally for security (e.g. to let + applications run safely in isolation). + + +Enabling "nested" (x86) +----------------------- + +From Linux kernel v4.19 onwards, the ``nested`` KVM parameter is enabled +by default for Intel and AMD. (Though your Linux distribution might +override this default.) + +In case you are running a Linux kernel older than v4.19, to enable +nesting, set the ``nested`` KVM module parameter to ``Y`` or ``1``. To +persist this setting across reboots, you can add it in a config file, as +shown below: + +1. On the bare metal host (L0), list the kernel modules and ensure that + the KVM modules:: + + $ lsmod | grep -i kvm + kvm_intel 133627 0 + kvm 435079 1 kvm_intel + +2. Show information for ``kvm_intel`` module:: + + $ modinfo kvm_intel | grep -i nested + parm: nested:bool + +3. For the nested KVM configuration to persist across reboots, place the + below in ``/etc/modprobed/kvm_intel.conf`` (create the file if it + doesn't exist):: + + $ cat /etc/modprobe.d/kvm_intel.conf + options kvm-intel nested=y + +4. Unload and re-load the KVM Intel module:: + + $ sudo rmmod kvm-intel + $ sudo modprobe kvm-intel + +5. Verify if the ``nested`` parameter for KVM is enabled:: + + $ cat /sys/module/kvm_intel/parameters/nested + Y + +For AMD hosts, the process is the same as above, except that the module +name is ``kvm-amd``. + + +Additional nested-related kernel parameters (x86) +------------------------------------------------- + +If your hardware is sufficiently advanced (Intel Haswell processor or +higher, which has newer hardware virt extensions), the following +additional features will also be enabled by default: "Shadow VMCS +(Virtual Machine Control Structure)", APIC Virtualization on your bare +metal host (L0). Parameters for Intel hosts:: + + $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs + Y + + $ cat /sys/module/kvm_intel/parameters/enable_apicv + Y + + $ cat /sys/module/kvm_intel/parameters/ept + Y + +.. note:: If you suspect your L2 (i.e. nested guest) is running slower, + ensure the above are enabled (particularly + ``enable_shadow_vmcs`` and ``ept``). + + +Starting a nested guest (x86) +----------------------------- + +Once your bare metal host (L0) is configured for nesting, you should be +able to start an L1 guest with:: + + $ qemu-kvm -cpu host [...] + +The above will pass through the host CPU's capabilities as-is to the +gues); or for better live migration compatibility, use a named CPU +model supported by QEMU. e.g.:: + + $ qemu-kvm -cpu Haswell-noTSX-IBRS,vmx=on + +then the guest hypervisor will subsequently be capable of running a +nested guest with accelerated KVM. + + +Enabling "nested" (s390x) +------------------------- + +1. On the host hypervisor (L0), enable the ``nested`` parameter on + s390x:: + + $ rmmod kvm + $ modprobe kvm nested=1 + +.. note:: On s390x, the kernel parameter ``hpage`` is mutually exclusive + with the ``nested`` paramter — i.e. to be able to enable + ``nested``, the ``hpage`` parameter *must* be disabled. + +2. The guest hypervisor (L1) must be provided with the ``sie`` CPU + feature — with QEMU, this can be done by using "host passthrough" + (via the command-line ``-cpu host``). + +3. Now the KVM module can be loaded in the L1 (guest hypervisor):: + + $ modprobe kvm + + +Live migration with nested KVM +------------------------------ + +Migrating an L1 guest, with a *live* nested guest in it, to another +bare metal host, works as of Linux kernel 5.3 and QEMU 4.2.0 for +Intel x86 systems, and even on older versions for s390x. + +On AMD systems, once an L1 guest has started an L2 guest, the L1 guest +should no longer be migrated or saved (refer to QEMU documentation on +"savevm"/"loadvm") until the L2 guest shuts down. Attempting to migrate +or save-and-load an L1 guest while an L2 guest is running will result in +undefined behavior. You might see a ``kernel BUG!`` entry in ``dmesg``, a +kernel 'oops', or an outright kernel panic. Such a migrated or loaded L1 +guest can no longer be considered stable or secure, and must be restarted. +Migrating an L1 guest merely configured to support nesting, while not +actually running L2 guests, is expected to function normally even on AMD +systems but may fail once guests are started. + +Migrating an L2 guest is always expected to succeed, so all the following +scenarios should work even on AMD systems: + +- Migrating a nested guest (L2) to another L1 guest on the *same* bare + metal host. + +- Migrating a nested guest (L2) to another L1 guest on a *different* + bare metal host. + +- Migrating a nested guest (L2) to a bare metal host. + +Reporting bugs from nested setups +----------------------------------- + +Debugging "nested" problems can involve sifting through log files across +L0, L1 and L2; this can result in tedious back-n-forth between the bug +reporter and the bug fixer. + +- Mention that you are in a "nested" setup. If you are running any kind + of "nesting" at all, say so. Unfortunately, this needs to be called + out because when reporting bugs, people tend to forget to even + *mention* that they're using nested virtualization. + +- Ensure you are actually running KVM on KVM. Sometimes people do not + have KVM enabled for their guest hypervisor (L1), which results in + them running with pure emulation or what QEMU calls it as "TCG", but + they think they're running nested KVM. Thus confusing "nested Virt" + (which could also mean, QEMU on KVM) with "nested KVM" (KVM on KVM). + +Information to collect (generic) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following is not an exhaustive list, but a very good starting point: + + - Kernel, libvirt, and QEMU version from L0 + + - Kernel, libvirt and QEMU version from L1 + + - QEMU command-line of L1 -- when using libvirt, you'll find it here: + ``/var/log/libvirt/qemu/instance.log`` + + - QEMU command-line of L2 -- as above, when using libvirt, get the + complete libvirt-generated QEMU command-line + + - ``cat /sys/cpuinfo`` from L0 + + - ``cat /sys/cpuinfo`` from L1 + + - ``lscpu`` from L0 + + - ``lscpu`` from L1 + + - Full ``dmesg`` output from L0 + + - Full ``dmesg`` output from L1 + +x86-specific info to collect +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Both the below commands, ``x86info`` and ``dmidecode``, should be +available on most Linux distributions with the same name: + + - Output of: ``x86info -a`` from L0 + + - Output of: ``x86info -a`` from L1 + + - Output of: ``dmidecode`` from L0 + + - Output of: ``dmidecode`` from L1 + +s390x-specific info to collect +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Along with the earlier mentioned generic details, the below is +also recommended: + + - ``/proc/sysinfo`` from L1; this will also include the info from L0 |