diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-10 19:49:37 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-10 19:49:37 +0300 |
commit | 9b9e211360044c12d7738973c944d6f300134881 (patch) | |
tree | f994a379135e70397df3c653e6578ad7e5d295fe /Documentation | |
parent | a7ac314061375c7805e0d3a26aad6eb0c41100df (diff) | |
parent | 945409a6ef442cfe5f2f14e5626d4306d53100f0 (diff) | |
download | linux-9b9e211360044c12d7738973c944d6f300134881.tar.xz |
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- KCSAN enabled for arm64.
- Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.
- Some more SVE clean-ups and refactoring in preparation for SME
support (scalable matrix extensions).
- BTI clean-ups (SYM_FUNC macros etc.)
- arm64 atomics clean-up and codegen improvements.
- HWCAPs for FEAT_AFP (alternate floating point behaviour) and
FEAT_RPRESS (increased precision of reciprocal estimate and
reciprocal square root estimate).
- Use SHA3 instructions to speed-up XOR.
- arm64 unwind code refactoring/unification.
- Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP ==
1 (potentially set by a hypervisor; user-space already does this).
- Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.
- Other fixes and clean-ups; highlights: fix the handling of erratum
1418040, correct the calculation of the nomap region boundaries,
introduce io_stop_wc() mapped to the new DGH instruction (data
gathering hint).
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits)
arm64: Use correct method to calculate nomap region boundaries
arm64: Drop outdated links in comments
arm64: perf: Don't register user access sysctl handler multiple times
drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
arm64: errata: Fix exec handling in erratum 1418040 workaround
arm64: Unhash early pointer print plus improve comment
asm-generic: introduce io_stop_wc() and add implementation for ARM64
arm64: Ensure that the 'bti' macro is defined where linkage.h is included
arm64: remove __dma_*_area() aliases
docs/arm64: delete a space from tagged-address-abi
arm64: Enable KCSAN
kselftest/arm64: Add pidbench for floating point syscall cases
arm64/fp: Add comments documenting the usage of state restore functions
kselftest/arm64: Add a test program to exercise the syscall ABI
kselftest/arm64: Allow signal tests to trigger from a function
kselftest/arm64: Parameterise ptrace vector length information
arm64/sve: Minor clarification of ABI documentation
arm64/sve: Generalise vector length configuration prctl() for SME
arm64/sve: Make sysctl interface for SVE reusable by SME
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/perf/hisi-pcie-pmu.rst | 106 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/kernel.rst | 11 | ||||
-rw-r--r-- | Documentation/arm64/cpu-feature-registers.rst | 17 | ||||
-rw-r--r-- | Documentation/arm64/elf_hwcaps.rst | 8 | ||||
-rw-r--r-- | Documentation/arm64/perf.rst | 78 | ||||
-rw-r--r-- | Documentation/arm64/sve.rst | 2 | ||||
-rw-r--r-- | Documentation/arm64/tagged-address-abi.rst | 2 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/perf/arm,cmn.yaml | 21 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml | 70 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml | 63 | ||||
-rw-r--r-- | Documentation/memory-barriers.txt | 8 |
11 files changed, 378 insertions, 8 deletions
diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst new file mode 100644 index 000000000000..294ebbdb22af --- /dev/null +++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst @@ -0,0 +1,106 @@ +================================================ +HiSilicon PCIe Performance Monitoring Unit (PMU) +================================================ + +On Hip09, HiSilicon PCIe Performance Monitoring Unit (PMU) could monitor +bandwidth, latency, bus utilization and buffer occupancy data of PCIe. + +Each PCIe Core has a PMU to monitor multi Root Ports of this PCIe Core and +all Endpoints downstream these Root Ports. + + +HiSilicon PCIe PMU driver +========================= + +The PCIe PMU driver registers a perf PMU with the name of its sicl-id and PCIe +Core id.:: + + /sys/bus/event_source/hisi_pcie<sicl>_<core> + +PMU driver provides description of available events and filter options in sysfs, +see /sys/bus/event_source/devices/hisi_pcie<sicl>_<core>. + +The "format" directory describes all formats of the config (events) and config1 +(filter options) fields of the perf_event_attr structure. The "events" directory +describes all documented events shown in perf list. + +The "identifier" sysfs file allows users to identify the version of the +PMU hardware device. + +The "bus" sysfs file allows users to get the bus number of Root Ports +monitored by PMU. + +Example usage of perf:: + + $# perf list + hisi_pcie0_0/rx_mwr_latency/ [kernel PMU event] + hisi_pcie0_0/rx_mwr_cnt/ [kernel PMU event] + ------------------------------------------ + + $# perf stat -e hisi_pcie0_0/rx_mwr_latency/ + $# perf stat -e hisi_pcie0_0/rx_mwr_cnt/ + $# perf stat -g -e hisi_pcie0_0/rx_mwr_latency/ -e hisi_pcie0_0/rx_mwr_cnt/ + +The current driver does not support sampling. So "perf record" is unsupported. +Also attach to a task is unsupported for PCIe PMU. + +Filter options +-------------- + +1. Target filter +PMU could only monitor the performance of traffic downstream target Root Ports +or downstream target Endpoint. PCIe PMU driver support "port" and "bdf" +interfaces for users, and these two interfaces aren't supported at the same +time. + +-port +"port" filter can be used in all PCIe PMU events, target Root Port can be +selected by configuring the 16-bits-bitmap "port". Multi ports can be selected +for AP-layer-events, and only one port can be selected for TL/DL-layer-events. + +For example, if target Root Port is 0000:00:00.0 (x8 lanes), bit0 of bitmap +should be set, port=0x1; if target Root Port is 0000:00:04.0 (x4 lanes), +bit8 is set, port=0x100; if these two Root Ports are both monitored, port=0x101. + +Example usage of perf:: + + $# perf stat -e hisi_pcie0_0/rx_mwr_latency,port=0x1/ sleep 5 + +-bdf + +"bdf" filter can only be used in bandwidth events, target Endpoint is selected +by configuring BDF to "bdf". Counter only counts the bandwidth of message +requested by target Endpoint. + +For example, "bdf=0x3900" means BDF of target Endpoint is 0000:39:00.0. + +Example usage of perf:: + + $# perf stat -e hisi_pcie0_0/rx_mrd_flux,bdf=0x3900/ sleep 5 + +2. Trigger filter +Event statistics start when the first time TLP length is greater/smaller +than trigger condition. You can set the trigger condition by writing "trig_len", +and set the trigger mode by writing "trig_mode". This filter can only be used +in bandwidth events. + +For example, "trig_len=4" means trigger condition is 2^4 DW, "trig_mode=0" +means statistics start when TLP length > trigger condition, "trig_mode=1" +means start when TLP length < condition. + +Example usage of perf:: + + $# perf stat -e hisi_pcie0_0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5 + +3. Threshold filter +Counter counts when TLP length within the specified range. You can set the +threshold by writing "thr_len", and set the threshold mode by writing +"thr_mode". This filter can only be used in bandwidth events. + +For example, "thr_len=4" means threshold is 2^4 DW, "thr_mode=0" means +counter counts when TLP length >= threshold, and "thr_mode=1" means counts +when TLP length < threshold. + +Example usage of perf:: + + $# perf stat -e hisi_pcie0_0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5 diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 0e486f41185e..d359bcfadd39 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -905,6 +905,17 @@ enabled, otherwise writing to this file will return ``-EBUSY``. The default value is 8. +perf_user_access (arm64 only) +================================= + +Controls user space access for reading perf event counters. When set to 1, +user space can read performance monitor counter registers directly. + +The default value is 0 (access disabled). + +See Documentation/arm64/perf.rst for more information. + + pid_max ======= diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst index 9f9b8fd06089..749ae970c319 100644 --- a/Documentation/arm64/cpu-feature-registers.rst +++ b/Documentation/arm64/cpu-feature-registers.rst @@ -275,6 +275,23 @@ infrastructure: | SVEVer | [3-0] | y | +------------------------------+---------+---------+ + 8) ID_AA64MMFR1_EL1 - Memory model feature register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | AFP | [47-44] | y | + +------------------------------+---------+---------+ + + 9) ID_AA64ISAR2_EL1 - Instruction set attribute register 2 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | RPRES | [7-4] | y | + +------------------------------+---------+---------+ + + Appendix I: Example ------------------- diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst index af106af8e1c0..b72ff17d600a 100644 --- a/Documentation/arm64/elf_hwcaps.rst +++ b/Documentation/arm64/elf_hwcaps.rst @@ -251,6 +251,14 @@ HWCAP2_ECV Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001. +HWCAP2_AFP + + Functionality implied by ID_AA64MFR1_EL1.AFP == 0b0001. + +HWCAP2_RPRES + + Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001. + 4. Unused AT_HWCAP bits ----------------------- diff --git a/Documentation/arm64/perf.rst b/Documentation/arm64/perf.rst index b567f177d385..1f87b57c2332 100644 --- a/Documentation/arm64/perf.rst +++ b/Documentation/arm64/perf.rst @@ -2,7 +2,10 @@ .. _perf_index: -===================== +==== +Perf +==== + Perf Event Attributes ===================== @@ -88,3 +91,76 @@ exclude_host. However when using !exclude_hv there is a small blackout window at the guest entry/exit where host events are not captured. On VHE systems there are no blackout windows. + +Perf Userspace PMU Hardware Counter Access +========================================== + +Overview +-------- +The perf userspace tool relies on the PMU to monitor events. It offers an +abstraction layer over the hardware counters since the underlying +implementation is cpu-dependent. +Arm64 allows userspace tools to have access to the registers storing the +hardware counters' values directly. + +This targets specifically self-monitoring tasks in order to reduce the overhead +by directly accessing the registers without having to go through the kernel. + +How-to +------ +The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu +registers is enabled and that the userspace has access to the relevant +information in order to use them. + +In order to have access to the hardware counters, the global sysctl +kernel/perf_user_access must first be enabled: + +.. code-block:: sh + + echo 1 > /proc/sys/kernel/perf_user_access + +It is necessary to open the event using the perf tool interface with config1:1 +attr bit set: the sys_perf_event_open syscall returns a fd which can +subsequently be used with the mmap syscall in order to retrieve a page of memory +containing information about the event. The PMU driver uses this page to expose +to the user the hardware counter's index and other necessary data. Using this +index enables the user to access the PMU registers using the `mrs` instruction. +Access to the PMU registers is only valid while the sequence lock is unchanged. +In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is +changed. + +The userspace access is supported in libperf using the perf_evsel__mmap() +and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for +an example. + +About heterogeneous systems +--------------------------- +On heterogeneous systems such as big.LITTLE, userspace PMU counter access can +only be enabled when the tasks are pinned to a homogeneous subset of cores and +the corresponding PMU instance is opened by specifying the 'type' attribute. +The use of generic event types is not supported in this case. + +Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It +can be run using the perf tool to check that the access to the registers works +correctly from userspace: + +.. code-block:: sh + + perf test -v user + +About chained events and counter sizes +-------------------------------------- +The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1) +counter along with userspace access. The sys_perf_event_open syscall will fail +if a 64-bit counter is requested and the hardware doesn't support 64-bit +counters. Chained events are not supported in conjunction with userspace counter +access. If a 32-bit counter is requested on hardware with 64-bit counters, then +userspace must treat the upper 32-bits read from the counter as UNKNOWN. The +'pmc_width' field in the user page will indicate the valid width of the counter +and should be used to mask the upper bits as needed. + +.. Links +.. _tools/perf/arch/arm64/tests/user-events.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c +.. _tools/lib/perf/tests/test-evsel.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst index 03137154299e..9d9a4de5bc34 100644 --- a/Documentation/arm64/sve.rst +++ b/Documentation/arm64/sve.rst @@ -255,7 +255,7 @@ prctl(PR_SVE_GET_VL) vector length change (which would only normally be the case between a fork() or vfork() and the corresponding execve() in typical use). - To extract the vector length from the result, and it with + To extract the vector length from the result, bitwise and it with PR_SVE_VL_LEN_MASK. Return value: a nonnegative value on success, or a negative value on error: diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst index 0c9120ec58ae..540a1d4fc6c9 100644 --- a/Documentation/arm64/tagged-address-abi.rst +++ b/Documentation/arm64/tagged-address-abi.rst @@ -49,7 +49,7 @@ how the user addresses are used by the kernel: - ``brk()``, ``mmap()`` and the ``new_address`` argument to ``mremap()`` as these have the potential to alias with existing - user addresses. + user addresses. NOTE: This behaviour changed in v5.6 and so some earlier kernels may incorrectly accept valid tagged pointers for the ``brk()``, diff --git a/Documentation/devicetree/bindings/perf/arm,cmn.yaml b/Documentation/devicetree/bindings/perf/arm,cmn.yaml index 42424ccbdd0c..2d4219ec7eda 100644 --- a/Documentation/devicetree/bindings/perf/arm,cmn.yaml +++ b/Documentation/devicetree/bindings/perf/arm,cmn.yaml @@ -12,12 +12,14 @@ maintainers: properties: compatible: - const: arm,cmn-600 + enum: + - arm,cmn-600 + - arm,ci-700 reg: items: - description: Physical address of the base (PERIPHBASE) and - size (up to 64MB) of the configuration address space. + size of the configuration address space. interrupts: minItems: 1 @@ -31,14 +33,23 @@ properties: arm,root-node: $ref: /schemas/types.yaml#/definitions/uint32 - description: Offset from PERIPHBASE of the configuration - discovery node (see TRM definition of ROOTNODEBASE). + description: Offset from PERIPHBASE of CMN-600's configuration + discovery node (see TRM definition of ROOTNODEBASE). Not + relevant for newer CMN/CI products. required: - compatible - reg - interrupts - - arm,root-node + +if: + properties: + compatible: + contains: + const: arm,cmn-600 +then: + required: + - arm,root-node additionalProperties: false diff --git a/Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml b/Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml new file mode 100644 index 000000000000..a4b53a6a1ebf --- /dev/null +++ b/Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml @@ -0,0 +1,70 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/perf/arm,smmu-v3-pmcg.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Arm SMMUv3 Performance Monitor Counter Group + +maintainers: + - Will Deacon <will@kernel.org> + - Robin Murphy <robin.murphy@arm.com> + +description: | + An SMMUv3 may have several Performance Monitor Counter Group (PMCG). + They are standalone performance monitoring units that support both + architected and IMPLEMENTATION DEFINED event counters. + +properties: + $nodename: + pattern: "^pmu@[0-9a-f]*" + compatible: + oneOf: + - items: + - const: arm,mmu-600-pmcg + - const: arm,smmu-v3-pmcg + - const: arm,smmu-v3-pmcg + + reg: + items: + - description: Register page 0 + - description: Register page 1, if SMMU_PMCG_CFGR.RELOC_CTRS = 1 + minItems: 1 + + interrupts: + maxItems: 1 + + msi-parent: true + +required: + - compatible + - reg + +anyOf: + - required: + - interrupts + - required: + - msi-parent + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/interrupt-controller/irq.h> + + pmu@2b420000 { + compatible = "arm,smmu-v3-pmcg"; + reg = <0x2b420000 0x1000>, + <0x2b430000 0x1000>; + interrupts = <GIC_SPI 80 IRQ_TYPE_EDGE_RISING>; + msi-parent = <&its 0xff0000>; + }; + + pmu@2b440000 { + compatible = "arm,smmu-v3-pmcg"; + reg = <0x2b440000 0x1000>, + <0x2b450000 0x1000>; + interrupts = <GIC_SPI 81 IRQ_TYPE_EDGE_RISING>; + msi-parent = <&its 0xff0000>; + }; diff --git a/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml b/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml new file mode 100644 index 000000000000..362142252667 --- /dev/null +++ b/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml @@ -0,0 +1,63 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/perf/marvell-cn10k-tad.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Marvell CN10K LLC-TAD performance monitor + +maintainers: + - Bhaskara Budiredla <bbudiredla@marvell.com> + +description: | + The Tag-and-Data units (TADs) maintain coherence and contain CN10K + shared on-chip last level cache (LLC). The tad pmu measures the + performance of last-level cache. Each tad pmu supports up to eight + counters. + + The DT setup comprises of number of tad blocks, the sizes of pmu + regions, tad blocks and overall base address of the HW. + +properties: + compatible: + const: marvell,cn10k-tad-pmu + + reg: + maxItems: 1 + + marvell,tad-cnt: + description: specifies the number of tads on the soc + $ref: /schemas/types.yaml#/definitions/uint32 + + marvell,tad-page-size: + description: specifies the size of each tad page + $ref: /schemas/types.yaml#/definitions/uint32 + + marvell,tad-pmu-page-size: + description: specifies the size of page that the pmu uses + $ref: /schemas/types.yaml#/definitions/uint32 + +required: + - compatible + - reg + - marvell,tad-cnt + - marvell,tad-page-size + - marvell,tad-pmu-page-size + +additionalProperties: false + +examples: + - | + + tad { + #address-cells = <2>; + #size-cells = <2>; + + tad_pmu@80000000 { + compatible = "marvell,cn10k-tad-pmu"; + reg = <0x87e2 0x80000000 0x0 0x1000>; + marvell,tad-cnt = <1>; + marvell,tad-page-size = <0x1000>; + marvell,tad-pmu-page-size = <0x1000>; + }; + }; diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 7367ada13208..b12df9137e1c 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1950,6 +1950,14 @@ There are some more advanced barrier functions: For load from persistent memory, existing read memory barriers are sufficient to ensure read ordering. + (*) io_stop_wc(); + + For memory accesses with write-combining attributes (e.g. those returned + by ioremap_wc(), the CPU may wait for prior accesses to be merged with + subsequent ones. io_stop_wc() can be used to prevent the merging of + write-combining memory accesses before this macro with those after it when + such wait has performance implications. + =============================== IMPLICIT KERNEL MEMORY BARRIERS =============================== |