summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2022-10-04 23:38:03 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2022-10-04 23:38:03 +0300
commit0326074ff4652329f2a1a9c8685104576bd8d131 (patch)
tree9a7574c7ccb05bf4c7cb34fc5a65457bb8f495cb /Documentation
parent522667b24f08009591c90e75bfe2ffb67f555498 (diff)
parent681bf011b9b5989c6e9db6beb64494918aab9a43 (diff)
downloadlinux-0326074ff4652329f2a1a9c8685104576bd8d131.tar.xz
Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "Core: - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF: - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols: - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API: - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers: - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers: - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support" * tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits) eth: pse: add missing static inlines once: rename _SLOW to _SLEEPABLE net: pse-pd: add regulator based PSE driver dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller ethtool: add interface to interact with Ethernet Power Equipment net: mdiobus: search for PSE nodes by parsing PHY nodes. net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: add framework to support Ethernet PSE and PDs devices dt-bindings: net: phy: add PoDL PSE property net: marvell: prestera: Propagate nh state from hw to kernel net: marvell: prestera: Add neighbour cache accounting net: marvell: prestera: add stub handler neighbour events net: marvell: prestera: Add heplers to interact with fib_notifier_info net: marvell: prestera: Add length macros for prestera_ip_addr net: marvell: prestera: add delayed wq and flush wq on deinit net: marvell: prestera: Add strict cleanup of fib arbiter net: marvell: prestera: Add cleanup of allocated fib_nodes net: marvell: prestera: Add router nexthops ABI eth: octeon: fix build after netif_napi_add() changes net/mlx5: E-Switch, Return EBUSY if can't get mode lock ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt4
-rw-r--r--Documentation/admin-guide/sysctl/net.rst22
-rw-r--r--Documentation/bpf/clang-notes.rst30
-rw-r--r--Documentation/bpf/index.rst2
-rw-r--r--Documentation/bpf/instruction-set.rst316
-rw-r--r--Documentation/bpf/kfuncs.rst39
-rw-r--r--Documentation/bpf/linux-notes.rst53
-rw-r--r--Documentation/devicetree/bindings/arm/mediatek/mediatek,mt7622-wed.yaml1
-rw-r--r--Documentation/devicetree/bindings/arm/mediatek/mediatek,mt7986-wed-pcie.yaml43
-rw-r--r--Documentation/devicetree/bindings/memory-controllers/mediatek,mt7621-memc.yaml6
-rw-r--r--Documentation/devicetree/bindings/mfd/mscc,ocelot.yaml160
-rw-r--r--Documentation/devicetree/bindings/net/adi,adin1110.yaml77
-rw-r--r--Documentation/devicetree/bindings/net/altera_tse.txt113
-rw-r--r--Documentation/devicetree/bindings/net/altr,tse.yaml168
-rw-r--r--Documentation/devicetree/bindings/net/can/nxp,sja1000.yaml6
-rw-r--r--Documentation/devicetree/bindings/net/cortina,gemini-ethernet.yaml1
-rw-r--r--Documentation/devicetree/bindings/net/dsa/ar9331.txt1
-rw-r--r--Documentation/devicetree/bindings/net/dsa/arrow,xrs700x.yaml3
-rw-r--r--Documentation/devicetree/bindings/net/dsa/brcm,b53.yaml4
-rw-r--r--Documentation/devicetree/bindings/net/dsa/dsa-port.yaml17
-rw-r--r--Documentation/devicetree/bindings/net/dsa/hirschmann,hellcreek.yaml7
-rw-r--r--Documentation/devicetree/bindings/net/dsa/lan9303.txt2
-rw-r--r--Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt1
-rw-r--r--Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml653
-rw-r--r--Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml6
-rw-r--r--Documentation/devicetree/bindings/net/dsa/mscc,ocelot.yaml260
-rw-r--r--Documentation/devicetree/bindings/net/dsa/ocelot.txt213
-rw-r--r--Documentation/devicetree/bindings/net/dsa/qca8k.yaml3
-rw-r--r--Documentation/devicetree/bindings/net/dsa/realtek.yaml2
-rw-r--r--Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml3
-rw-r--r--Documentation/devicetree/bindings/net/dsa/vitesse,vsc73xx.txt2
-rw-r--r--Documentation/devicetree/bindings/net/engleder,tsnep.yaml43
-rw-r--r--Documentation/devicetree/bindings/net/ethernet-controller.yaml1
-rw-r--r--Documentation/devicetree/bindings/net/ethernet-phy.yaml6
-rw-r--r--Documentation/devicetree/bindings/net/fsl,fec.yaml1
-rw-r--r--Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml145
-rw-r--r--Documentation/devicetree/bindings/net/fsl-fman.txt128
-rw-r--r--Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt24
-rw-r--r--Documentation/devicetree/bindings/net/mediatek,net.yaml27
-rw-r--r--Documentation/devicetree/bindings/net/mediatek-dwmac.yaml10
-rw-r--r--Documentation/devicetree/bindings/net/microchip,sparx5-switch.yaml36
-rw-r--r--Documentation/devicetree/bindings/net/nfc/marvell,nci.yaml6
-rw-r--r--Documentation/devicetree/bindings/net/nxp,tja11xx.yaml17
-rw-r--r--Documentation/devicetree/bindings/net/pse-pd/podl-pse-regulator.yaml40
-rw-r--r--Documentation/devicetree/bindings/net/pse-pd/pse-controller.yaml33
-rw-r--r--Documentation/devicetree/bindings/net/qca,ar803x.yaml8
-rw-r--r--Documentation/devicetree/bindings/net/ralink,rt2880-net.txt59
-rw-r--r--Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt30
-rw-r--r--Documentation/devicetree/bindings/net/renesas,etheravb.yaml9
-rw-r--r--Documentation/devicetree/bindings/net/rockchip-dwmac.yaml9
-rw-r--r--Documentation/devicetree/bindings/net/snps,dwmac.yaml60
-rw-r--r--Documentation/devicetree/bindings/net/sunplus,sp7021-emac.yaml2
-rw-r--r--Documentation/devicetree/bindings/net/ti,cpsw-switch.yaml4
-rw-r--r--Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml19
-rw-r--r--Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml1
-rw-r--r--Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml2
-rw-r--r--Documentation/devicetree/bindings/net/wireless/brcm,bcm4329-fmac.yaml39
-rw-r--r--Documentation/devicetree/bindings/net/wireless/microchip,wilc1000.yaml7
-rw-r--r--Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml14
-rw-r--r--Documentation/devicetree/bindings/net/wireless/silabs,wfx.yaml15
-rw-r--r--Documentation/devicetree/bindings/net/wireless/ti,wlcore.yaml32
-rw-r--r--Documentation/networking/bonding.rst2
-rw-r--r--Documentation/networking/decnet.rst243
-rw-r--r--Documentation/networking/device_drivers/can/freescale/flexcan.rst2
-rw-r--r--Documentation/networking/device_drivers/ethernet/index.rst1
-rw-r--r--Documentation/networking/device_drivers/ethernet/wangxun/ngbe.rst14
-rw-r--r--Documentation/networking/devlink/ice.rst36
-rw-r--r--Documentation/networking/devlink/index.rst6
-rw-r--r--Documentation/networking/dsa/configuration.rst96
-rw-r--r--Documentation/networking/dsa/dsa.rst38
-rw-r--r--Documentation/networking/ethtool-netlink.rst61
-rw-r--r--Documentation/networking/index.rst2
-rw-r--r--Documentation/networking/ip-sysctl.rst29
-rw-r--r--Documentation/networking/phy.rst15
-rw-r--r--Documentation/networking/representors.rst259
-rw-r--r--Documentation/networking/smc-sysctl.rst25
-rw-r--r--Documentation/networking/switchdev.rst1
-rw-r--r--Documentation/userspace-api/index.rst1
-rw-r--r--Documentation/userspace-api/ioctl/ioctl-number.rst1
-rw-r--r--Documentation/userspace-api/netlink/index.rst12
-rw-r--r--Documentation/userspace-api/netlink/intro.rst681
81 files changed, 3320 insertions, 1250 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2bc11a61c4d0..e92d63d3e878 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -966,10 +966,6 @@
debugpat [X86] Enable PAT debugging
- decnet.addr= [HW,NET]
- Format: <area>[,<node>]
- See also Documentation/networking/decnet.rst.
-
default_hugepagesz=
[HW] The size of the default HugeTLB page. This is
the size represented by the legacy /proc/ hugepages
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 60d44165fba7..6394f5dc2303 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -31,17 +31,18 @@ see only some of them, depending on your kernel's configuration.
Table : Subdirectories in /proc/sys/net
- ========= =================== = ========== ==================
+ ========= =================== = ========== ===================
Directory Content Directory Content
- ========= =================== = ========== ==================
- core General parameter appletalk Appletalk protocol
- unix Unix domain sockets netrom NET/ROM
- 802 E802 protocol ax25 AX25
- ethernet Ethernet protocol rose X.25 PLP layer
+ ========= =================== = ========== ===================
+ 802 E802 protocol mptcp Multipath TCP
+ appletalk Appletalk protocol netfilter Network Filter
+ ax25 AX25 netrom NET/ROM
+ bridge Bridging rose X.25 PLP layer
+ core General parameter tipc TIPC
+ ethernet Ethernet protocol unix Unix domain sockets
ipv4 IP version 4 x25 X.25 protocol
- bridge Bridging decnet DEC net
- ipv6 IP version 6 tipc TIPC
- ========= =================== = ========== ==================
+ ipv6 IP version 6
+ ========= =================== = ========== ===================
1. /proc/sys/net/core - Network core options
============================================
@@ -101,6 +102,9 @@ Values:
- 1 - enable JIT hardening for unprivileged users only
- 2 - enable JIT hardening for all users
+where "privileged user" in this context means a process having
+CAP_BPF or CAP_SYS_ADMIN in the root user name space.
+
bpf_jit_kallsyms
----------------
diff --git a/Documentation/bpf/clang-notes.rst b/Documentation/bpf/clang-notes.rst
new file mode 100644
index 000000000000..528feddf2db9
--- /dev/null
+++ b/Documentation/bpf/clang-notes.rst
@@ -0,0 +1,30 @@
+.. contents::
+.. sectnum::
+
+==========================
+Clang implementation notes
+==========================
+
+This document provides more details specific to the Clang/LLVM implementation of the eBPF instruction set.
+
+Versions
+========
+
+Clang defined "CPU" versions, where a CPU version of 3 corresponds to the current eBPF ISA.
+
+Clang can select the eBPF ISA version using ``-mcpu=v3`` for example to select version 3.
+
+Arithmetic instructions
+=======================
+
+For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
+``-Xclang -target-feature -Xclang +alu32``. In CPU version 3, support is automatically included.
+
+Atomic operations
+=================
+
+Clang can generate atomic instructions by default when ``-mcpu=v3`` is
+enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
+Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
+the atomics features, while keeping a lower ``-mcpu`` version, you can use
+``-Xclang -target-feature -Xclang +alu32``.
diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index 1bc2c5c58bdb..1b50de1983ee 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -26,6 +26,8 @@ that goes into great technical depth about the BPF Architecture.
classic_vs_extended.rst
bpf_licensing
test_debug
+ clang-notes
+ linux-notes
other
.. only:: subproject and html
diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
index 0ac7ae40be37..5d798437dad4 100644
--- a/Documentation/bpf/instruction-set.rst
+++ b/Documentation/bpf/instruction-set.rst
@@ -1,7 +1,12 @@
+.. contents::
+.. sectnum::
+
+========================================
+eBPF Instruction Set Specification, v1.0
+========================================
+
+This document specifies version 1.0 of the eBPF instruction set.
-====================
-eBPF Instruction Set
-====================
Registers and calling convention
================================
@@ -11,10 +16,10 @@ all of which are 64-bits wide.
The eBPF calling convention is defined as:
- * R0: return value from function calls, and exit value for eBPF programs
- * R1 - R5: arguments for function calls
- * R6 - R9: callee saved registers that function calls will preserve
- * R10: read-only frame pointer to access stack
+* R0: return value from function calls, and exit value for eBPF programs
+* R1 - R5: arguments for function calls
+* R6 - R9: callee saved registers that function calls will preserve
+* R10: read-only frame pointer to access stack
R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
necessary across calls.
@@ -24,17 +29,17 @@ Instruction encoding
eBPF has two instruction encodings:
- * the basic instruction encoding, which uses 64 bits to encode an instruction
- * the wide instruction encoding, which appends a second 64-bit immediate value
- (imm64) after the basic instruction for a total of 128 bits.
+* the basic instruction encoding, which uses 64 bits to encode an instruction
+* the wide instruction encoding, which appends a second 64-bit immediate value
+ (imm64) after the basic instruction for a total of 128 bits.
The basic instruction encoding looks as follows:
- ============= ======= =============== ==================== ============
- 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
- ============= ======= =============== ==================== ============
- immediate offset source register destination register opcode
- ============= ======= =============== ==================== ============
+============= ======= =============== ==================== ============
+32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
+============= ======= =============== ==================== ============
+immediate offset source register destination register opcode
+============= ======= =============== ==================== ============
Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
@@ -44,30 +49,30 @@ Instruction classes
The three LSB bits of the 'opcode' field store the instruction class:
- ========= ===== ===============================
- class value description
- ========= ===== ===============================
- BPF_LD 0x00 non-standard load operations
- BPF_LDX 0x01 load into register operations
- BPF_ST 0x02 store from immediate operations
- BPF_STX 0x03 store from register operations
- BPF_ALU 0x04 32-bit arithmetic operations
- BPF_JMP 0x05 64-bit jump operations
- BPF_JMP32 0x06 32-bit jump operations
- BPF_ALU64 0x07 64-bit arithmetic operations
- ========= ===== ===============================
+========= ===== =============================== ===================================
+class value description reference
+========= ===== =============================== ===================================
+BPF_LD 0x00 non-standard load operations `Load and store instructions`_
+BPF_LDX 0x01 load into register operations `Load and store instructions`_
+BPF_ST 0x02 store from immediate operations `Load and store instructions`_
+BPF_STX 0x03 store from register operations `Load and store instructions`_
+BPF_ALU 0x04 32-bit arithmetic operations `Arithmetic and jump instructions`_
+BPF_JMP 0x05 64-bit jump operations `Arithmetic and jump instructions`_
+BPF_JMP32 0x06 32-bit jump operations `Arithmetic and jump instructions`_
+BPF_ALU64 0x07 64-bit arithmetic operations `Arithmetic and jump instructions`_
+========= ===== =============================== ===================================
Arithmetic and jump instructions
================================
-For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
-BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:
+For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
+``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:
- ============== ====== =================
- 4 bits (MSB) 1 bit 3 bits (LSB)
- ============== ====== =================
- operation code source instruction class
- ============== ====== =================
+============== ====== =================
+4 bits (MSB) 1 bit 3 bits (LSB)
+============== ====== =================
+operation code source instruction class
+============== ====== =================
The 4th bit encodes the source operand:
@@ -84,51 +89,51 @@ The four MSB bits store the operation code.
Arithmetic instructions
-----------------------
-BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
+``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations.
-The code field encodes the operation as below:
-
- ======== ===== =================================================
- code value description
- ======== ===== =================================================
- BPF_ADD 0x00 dst += src
- BPF_SUB 0x10 dst -= src
- BPF_MUL 0x20 dst \*= src
- BPF_DIV 0x30 dst /= src
- BPF_OR 0x40 dst \|= src
- BPF_AND 0x50 dst &= src
- BPF_LSH 0x60 dst <<= src
- BPF_RSH 0x70 dst >>= src
- BPF_NEG 0x80 dst = ~src
- BPF_MOD 0x90 dst %= src
- BPF_XOR 0xa0 dst ^= src
- BPF_MOV 0xb0 dst = src
- BPF_ARSH 0xc0 sign extending shift right
- BPF_END 0xd0 byte swap operations (see separate section below)
- ======== ===== =================================================
-
-BPF_ADD | BPF_X | BPF_ALU means::
+The 'code' field encodes the operation as below:
+
+======== ===== ==========================================================
+code value description
+======== ===== ==========================================================
+BPF_ADD 0x00 dst += src
+BPF_SUB 0x10 dst -= src
+BPF_MUL 0x20 dst \*= src
+BPF_DIV 0x30 dst /= src
+BPF_OR 0x40 dst \|= src
+BPF_AND 0x50 dst &= src
+BPF_LSH 0x60 dst <<= src
+BPF_RSH 0x70 dst >>= src
+BPF_NEG 0x80 dst = ~src
+BPF_MOD 0x90 dst %= src
+BPF_XOR 0xa0 dst ^= src
+BPF_MOV 0xb0 dst = src
+BPF_ARSH 0xc0 sign extending shift right
+BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
+======== ===== ==========================================================
+
+``BPF_ADD | BPF_X | BPF_ALU`` means::
dst_reg = (u32) dst_reg + (u32) src_reg;
-BPF_ADD | BPF_X | BPF_ALU64 means::
+``BPF_ADD | BPF_X | BPF_ALU64`` means::
dst_reg = dst_reg + src_reg
-BPF_XOR | BPF_K | BPF_ALU means::
+``BPF_XOR | BPF_K | BPF_ALU`` means::
src_reg = (u32) src_reg ^ (u32) imm32
-BPF_XOR | BPF_K | BPF_ALU64 means::
+``BPF_XOR | BPF_K | BPF_ALU64`` means::
src_reg = src_reg ^ imm32
Byte swap instructions
-----------------------
+~~~~~~~~~~~~~~~~~~~~~~
The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
-code field of ``BPF_END``.
+'code' field of ``BPF_END``.
The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.
@@ -136,14 +141,14 @@ only and do not use a separate source register or immediate value.
The 1-bit source operand field in the opcode is used to select what byte
order the operation convert from or to:
- ========= ===== =================================================
- source value description
- ========= ===== =================================================
- BPF_TO_LE 0x00 convert between host byte order and little endian
- BPF_TO_BE 0x08 convert between host byte order and big endian
- ========= ===== =================================================
+========= ===== =================================================
+source value description
+========= ===== =================================================
+BPF_TO_LE 0x00 convert between host byte order and little endian
+BPF_TO_BE 0x08 convert between host byte order and big endian
+========= ===== =================================================
-The imm field encodes the width of the swap operations. The following widths
+The 'imm' field encodes the width of the swap operations. The following widths
are supported: 16, 32 and 64.
Examples:
@@ -156,35 +161,31 @@ Examples:
dst_reg = htobe64(dst_reg)
-``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and
-``BPF_TO_BE`` respectively.
-
-
Jump instructions
-----------------
-BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
+``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations.
-The code field encodes the operation as below:
-
- ======== ===== ========================= ============
- code value description notes
- ======== ===== ========================= ============
- BPF_JA 0x00 PC += off BPF_JMP only
- BPF_JEQ 0x10 PC += off if dst == src
- BPF_JGT 0x20 PC += off if dst > src unsigned
- BPF_JGE 0x30 PC += off if dst >= src unsigned
- BPF_JSET 0x40 PC += off if dst & src
- BPF_JNE 0x50 PC += off if dst != src
- BPF_JSGT 0x60 PC += off if dst > src signed
- BPF_JSGE 0x70 PC += off if dst >= src signed
- BPF_CALL 0x80 function call
- BPF_EXIT 0x90 function / program return BPF_JMP only
- BPF_JLT 0xa0 PC += off if dst < src unsigned
- BPF_JLE 0xb0 PC += off if dst <= src unsigned
- BPF_JSLT 0xc0 PC += off if dst < src signed
- BPF_JSLE 0xd0 PC += off if dst <= src signed
- ======== ===== ========================= ============
+The 'code' field encodes the operation as below:
+
+======== ===== ========================= ============
+code value description notes
+======== ===== ========================= ============
+BPF_JA 0x00 PC += off BPF_JMP only
+BPF_JEQ 0x10 PC += off if dst == src
+BPF_JGT 0x20 PC += off if dst > src unsigned
+BPF_JGE 0x30 PC += off if dst >= src unsigned
+BPF_JSET 0x40 PC += off if dst & src
+BPF_JNE 0x50 PC += off if dst != src
+BPF_JSGT 0x60 PC += off if dst > src signed
+BPF_JSGE 0x70 PC += off if dst >= src signed
+BPF_CALL 0x80 function call
+BPF_EXIT 0x90 function / program return BPF_JMP only
+BPF_JLT 0xa0 PC += off if dst < src unsigned
+BPF_JLE 0xb0 PC += off if dst <= src unsigned
+BPF_JSLT 0xc0 PC += off if dst < src signed
+BPF_JSLE 0xd0 PC += off if dst <= src signed
+======== ===== ========================= ============
The eBPF program needs to store the return value into register R0 before doing a
BPF_EXIT.
@@ -193,14 +194,26 @@ BPF_EXIT.
Load and store instructions
===========================
-For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
+For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:
- ============ ====== =================
- 3 bits (MSB) 2 bits 3 bits (LSB)
- ============ ====== =================
- mode size instruction class
- ============ ====== =================
+============ ====== =================
+3 bits (MSB) 2 bits 3 bits (LSB)
+============ ====== =================
+mode size instruction class
+============ ====== =================
+
+The mode modifier is one of:
+
+ ============= ===== ==================================== =============
+ mode modifier value description reference
+ ============= ===== ==================================== =============
+ BPF_IMM 0x00 64-bit immediate instructions `64-bit immediate instructions`_
+ BPF_ABS 0x20 legacy BPF packet access (absolute) `Legacy BPF Packet access instructions`_
+ BPF_IND 0x40 legacy BPF packet access (indirect) `Legacy BPF Packet access instructions`_
+ BPF_MEM 0x60 regular load and store operations `Regular load and store operations`_
+ BPF_ATOMIC 0xc0 atomic operations `Atomic operations`_
+ ============= ===== ==================================== =============
The size modifier is one of:
@@ -213,19 +226,6 @@ The size modifier is one of:
BPF_DW 0x18 double word (8 bytes)
============= ===== =====================
-The mode modifier is one of:
-
- ============= ===== ====================================
- mode modifier value description
- ============= ===== ====================================
- BPF_IMM 0x00 64-bit immediate instructions
- BPF_ABS 0x20 legacy BPF packet access (absolute)
- BPF_IND 0x40 legacy BPF packet access (indirect)
- BPF_MEM 0x60 regular load and store operations
- BPF_ATOMIC 0xc0 atomic operations
- ============= ===== ====================================
-
-
Regular load and store operations
---------------------------------
@@ -256,44 +256,42 @@ by other eBPF programs or means outside of this specification.
All atomic operations supported by eBPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:
- * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
- * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
- * 8-bit and 16-bit wide atomic operations are not supported.
+* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
+* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
+* 8-bit and 16-bit wide atomic operations are not supported.
-The imm field is used to encode the actual atomic operation.
+The 'imm' field is used to encode the actual atomic operation.
Simple atomic operation use a subset of the values defined to encode
-arithmetic operations in the imm field to encode the atomic operation:
+arithmetic operations in the 'imm' field to encode the atomic operation:
- ======== ===== ===========
- imm value description
- ======== ===== ===========
- BPF_ADD 0x00 atomic add
- BPF_OR 0x40 atomic or
- BPF_AND 0x50 atomic and
- BPF_XOR 0xa0 atomic xor
- ======== ===== ===========
+======== ===== ===========
+imm value description
+======== ===== ===========
+BPF_ADD 0x00 atomic add
+BPF_OR 0x40 atomic or
+BPF_AND 0x50 atomic and
+BPF_XOR 0xa0 atomic xor
+======== ===== ===========
-``BPF_ATOMIC | BPF_W | BPF_STX`` with imm = BPF_ADD means::
+``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::
*(u32 *)(dst_reg + off16) += src_reg
-``BPF_ATOMIC | BPF_DW | BPF_STX`` with imm = BPF ADD means::
+``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
*(u64 *)(dst_reg + off16) += src_reg
-``BPF_XADD`` is a deprecated name for ``BPF_ATOMIC | BPF_ADD``.
-
In addition to the simple atomic operations, there also is a modifier and
two complex atomic operations:
- =========== ================ ===========================
- imm value description
- =========== ================ ===========================
- BPF_FETCH 0x01 modifier: return old value
- BPF_XCHG 0xe0 | BPF_FETCH atomic exchange
- BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange
- =========== ================ ===========================
+=========== ================ ===========================
+imm value description
+=========== ================ ===========================
+BPF_FETCH 0x01 modifier: return old value
+BPF_XCHG 0xe0 | BPF_FETCH atomic exchange
+BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange
+=========== ================ ===========================
The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
@@ -309,16 +307,10 @@ The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
value that was at ``dst_reg + off`` before the operation is zero-extended
and loaded back to ``R0``.
-Clang can generate atomic instructions by default when ``-mcpu=v3`` is
-enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
-Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
-the atomics features, while keeping a lower ``-mcpu`` version, you can use
-``-Xclang -target-feature -Xclang +alu32``.
-
64-bit immediate instructions
-----------------------------
-Instructions with the ``BPF_IMM`` mode modifier use the wide instruction
+Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding for an extra imm64 value.
There is currently only one such instruction.
@@ -331,36 +323,6 @@ There is currently only one such instruction.
Legacy BPF Packet access instructions
-------------------------------------
-eBPF has special instructions for access to packet data that have been
-carried over from classic BPF to retain the performance of legacy socket
-filters running in the eBPF interpreter.
-
-The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
-``BPF_IND | <size> | BPF_LD``.
-
-These instructions are used to access packet data and can only be used when
-the program context is a pointer to networking packet. ``BPF_ABS``
-accesses packet data at an absolute offset specified by the immediate data
-and ``BPF_IND`` access packet data at an offset that includes the value of
-a register in addition to the immediate data.
-
-These instructions have seven implicit operands:
-
- * Register R6 is an implicit input that must contain pointer to a
- struct sk_buff.
- * Register R0 is an implicit output which contains the data fetched from
- the packet.
- * Registers R1-R5 are scratch registers that are clobbered after a call to
- ``BPF_ABS | BPF_LD`` or ``BPF_IND | BPF_LD`` instructions.
-
-These instructions have an implicit program exit condition as well. When an
-eBPF program is trying to access the data beyond the packet boundary, the
-program execution will be aborted.
-
-``BPF_ABS | BPF_W | BPF_LD`` means::
-
- R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + imm32))
-
-``BPF_IND | BPF_W | BPF_LD`` means::
-
- R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
+eBPF previously introduced special instructions for access to packet data that were
+carried over from classic BPF. However, these instructions are
+deprecated and should no longer be used.
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index c0b7dae6dbf5..0f858156371d 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -137,14 +137,37 @@ KF_ACQUIRE and KF_RET_NULL flags.
--------------------------
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
-indicates that the all pointer arguments will always be refcounted, and have
-their offset set to 0. It can be used to enforce that a pointer to a refcounted
-object acquired from a kfunc or BPF helper is passed as an argument to this
-kfunc without any modifications (e.g. pointer arithmetic) such that it is
-trusted and points to the original object. This flag is often used for kfuncs
-that operate (change some property, perform some operation) on an object that
-was obtained using an acquire kfunc. Such kfuncs need an unchanged pointer to
-ensure the integrity of the operation being performed on the expected object.
+indicates that the all pointer arguments will always have a guaranteed lifetime,
+and pointers to kernel objects are always passed to helpers in their unmodified
+form (as obtained from acquire kfuncs).
+
+It can be used to enforce that a pointer to a refcounted object acquired from a
+kfunc or BPF helper is passed as an argument to this kfunc without any
+modifications (e.g. pointer arithmetic) such that it is trusted and points to
+the original object.
+
+Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
+but those can have a non-zero offset.
+
+This flag is often used for kfuncs that operate (change some property, perform
+some operation) on an object that was obtained using an acquire kfunc. Such
+kfuncs need an unchanged pointer to ensure the integrity of the operation being
+performed on the expected object.
+
+2.4.6 KF_SLEEPABLE flag
+-----------------------
+
+The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
+be called by sleepable BPF programs (BPF_F_SLEEPABLE).
+
+2.4.7 KF_DESTRUCTIVE flag
+--------------------------
+
+The KF_DESTRUCTIVE flag is used to indicate functions calling which is
+destructive to the system. For example such a call can result in system
+rebooting or panicking. Due to this additional restrictions apply to these
+calls. At the moment they only require CAP_SYS_BOOT capability, but more can be
+added later.
2.5 Registering the kfuncs
--------------------------
diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
new file mode 100644
index 000000000000..956b0c86699d
--- /dev/null
+++ b/Documentation/bpf/linux-notes.rst
@@ -0,0 +1,53 @@
+.. contents::
+.. sectnum::
+
+==========================
+Linux implementation notes
+==========================
+
+This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
+
+Byte swap instructions
+======================
+
+``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and ``BPF_TO_BE`` respectively.
+
+Legacy BPF Packet access instructions
+=====================================
+
+As mentioned in the `ISA standard documentation <instruction-set.rst#legacy-bpf-packet-access-instructions>`_,
+Linux has special eBPF instructions for access to packet data that have been
+carried over from classic BPF to retain the performance of legacy socket
+filters running in the eBPF interpreter.
+
+The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
+``BPF_IND | <size> | BPF_LD``.
+
+These instructions are used to access packet data and can only be used when
+the program context is a pointer to a networking packet. ``BPF_ABS``
+accesses packet data at an absolute offset specified by the immediate data
+and ``BPF_IND`` access packet data at an offset that includes the value of
+a register in addition to the immediate data.
+
+These instructions have seven implicit operands:
+
+* Register R6 is an implicit input that must contain a pointer to a
+ struct sk_buff.
+* Register R0 is an implicit output which contains the data fetched from
+ the packet.
+* Registers R1-R5 are scratch registers that are clobbered by the
+ instruction.
+
+These instructions have an implicit program exit condition as well. If an
+eBPF program attempts access data beyond the packet boundary, the
+program execution will be aborted.
+
+``BPF_ABS | BPF_W | BPF_LD`` (0x20) means::
+
+ R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + imm))
+
+where ``ntohl()`` converts a 32-bit value from network byte order to host byte order.
+
+``BPF_IND | BPF_W | BPF_LD`` (0x40) means::
+
+ R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + src + imm))
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mt7622-wed.yaml b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mt7622-wed.yaml
index 787d6673f952..84fb0a146b6e 100644
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mt7622-wed.yaml
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mt7622-wed.yaml
@@ -20,6 +20,7 @@ properties:
items:
- enum:
- mediatek,mt7622-wed
+ - mediatek,mt7986-wed
- const: syscon
reg:
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mt7986-wed-pcie.yaml b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mt7986-wed-pcie.yaml
new file mode 100644
index 000000000000..96221f51c1c3
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mt7986-wed-pcie.yaml
@@ -0,0 +1,43 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/arm/mediatek/mediatek,mt7986-wed-pcie.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: MediaTek PCIE WED Controller for MT7986
+
+maintainers:
+ - Lorenzo Bianconi <lorenzo@kernel.org>
+ - Felix Fietkau <nbd@nbd.name>
+
+description:
+ The mediatek WED PCIE provides a configuration interface for PCIE
+ controller on MT7986 soc.
+
+properties:
+ compatible:
+ items:
+ - enum:
+ - mediatek,mt7986-wed-pcie
+ - const: syscon
+
+ reg:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+
+additionalProperties: false
+
+examples:
+ - |
+ soc {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ wed_pcie: wed-pcie@10003000 {
+ compatible = "mediatek,mt7986-wed-pcie",
+ "syscon";
+ reg = <0 0x10003000 0 0x10>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/memory-controllers/mediatek,mt7621-memc.yaml b/Documentation/devicetree/bindings/memory-controllers/mediatek,mt7621-memc.yaml
index 85e02854f083..6ccdaf99c778 100644
--- a/Documentation/devicetree/bindings/memory-controllers/mediatek,mt7621-memc.yaml
+++ b/Documentation/devicetree/bindings/memory-controllers/mediatek,mt7621-memc.yaml
@@ -11,7 +11,9 @@ maintainers:
properties:
compatible:
- const: mediatek,mt7621-memc
+ items:
+ - const: mediatek,mt7621-memc
+ - const: syscon
reg:
maxItems: 1
@@ -25,6 +27,6 @@ additionalProperties: false
examples:
- |
memory-controller@5000 {
- compatible = "mediatek,mt7621-memc";
+ compatible = "mediatek,mt7621-memc", "syscon";
reg = <0x5000 0x1000>;
};
diff --git a/Documentation/devicetree/bindings/mfd/mscc,ocelot.yaml b/Documentation/devicetree/bindings/mfd/mscc,ocelot.yaml
new file mode 100644
index 000000000000..8bf45a5673a4
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/mscc,ocelot.yaml
@@ -0,0 +1,160 @@
+# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mfd/mscc,ocelot.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Ocelot Externally-Controlled Ethernet Switch
+
+maintainers:
+ - Colin Foster <colin.foster@in-advantage.com>
+
+description: |
+ The Ocelot ethernet switch family contains chips that have an internal CPU
+ (VSC7513, VSC7514) and chips that don't (VSC7511, VSC7512). All switches have
+ the option to be controlled externally, which is the purpose of this driver.
+
+ The switch family is a multi-port networking switch that supports many
+ interfaces. Additionally, the device can perform pin control, MDIO buses, and
+ external GPIO expanders.
+
+properties:
+ compatible:
+ enum:
+ - mscc,vsc7512
+
+ reg:
+ maxItems: 1
+
+ "#address-cells":
+ const: 1
+
+ "#size-cells":
+ const: 1
+
+ spi-max-frequency:
+ maxItems: 1
+
+patternProperties:
+ "^pinctrl@[0-9a-f]+$":
+ type: object
+ $ref: /schemas/pinctrl/mscc,ocelot-pinctrl.yaml
+
+ "^gpio@[0-9a-f]+$":
+ type: object
+ $ref: /schemas/pinctrl/microchip,sparx5-sgpio.yaml
+ properties:
+ compatible:
+ enum:
+ - mscc,ocelot-sgpio
+
+ "^mdio@[0-9a-f]+$":
+ type: object
+ $ref: /schemas/net/mscc,miim.yaml
+ properties:
+ compatible:
+ enum:
+ - mscc,ocelot-miim
+
+required:
+ - compatible
+ - reg
+ - '#address-cells'
+ - '#size-cells'
+ - spi-max-frequency
+
+additionalProperties: false
+
+examples:
+ - |
+ ocelot_clock: ocelot-clock {
+ compatible = "fixed-clock";
+ #clock-cells = <0>;
+ clock-frequency = <125000000>;
+ };
+
+ spi {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ soc@0 {
+ compatible = "mscc,vsc7512";
+ spi-max-frequency = <2500000>;
+ reg = <0>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ mdio@7107009c {
+ compatible = "mscc,ocelot-miim";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ reg = <0x7107009c 0x24>;
+
+ sw_phy0: ethernet-phy@0 {
+ reg = <0x0>;
+ };
+ };
+
+ mdio@710700c0 {
+ compatible = "mscc,ocelot-miim";
+ pinctrl-names = "default";
+ pinctrl-0 = <&miim1_pins>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ reg = <0x710700c0 0x24>;
+
+ sw_phy4: ethernet-phy@4 {
+ reg = <0x4>;
+ };
+ };
+
+ gpio: pinctrl@71070034 {
+ compatible = "mscc,ocelot-pinctrl";
+ gpio-controller;
+ #gpio-cells = <2>;
+ gpio-ranges = <&gpio 0 0 22>;
+ reg = <0x71070034 0x6c>;
+
+ sgpio_pins: sgpio-pins {
+ pins = "GPIO_0", "GPIO_1", "GPIO_2", "GPIO_3";
+ function = "sg0";
+ };
+
+ miim1_pins: miim1-pins {
+ pins = "GPIO_14", "GPIO_15";
+ function = "miim";
+ };
+ };
+
+ gpio@710700f8 {
+ compatible = "mscc,ocelot-sgpio";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ bus-frequency = <12500000>;
+ clocks = <&ocelot_clock>;
+ microchip,sgpio-port-ranges = <0 15>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&sgpio_pins>;
+ reg = <0x710700f8 0x100>;
+
+ sgpio_in0: gpio@0 {
+ compatible = "microchip,sparx5-sgpio-bank";
+ reg = <0>;
+ gpio-controller;
+ #gpio-cells = <3>;
+ ngpios = <64>;
+ };
+
+ sgpio_out1: gpio@1 {
+ compatible = "microchip,sparx5-sgpio-bank";
+ reg = <1>;
+ gpio-controller;
+ #gpio-cells = <3>;
+ ngpios = <64>;
+ };
+ };
+ };
+ };
+
+...
+
diff --git a/Documentation/devicetree/bindings/net/adi,adin1110.yaml b/Documentation/devicetree/bindings/net/adi,adin1110.yaml
new file mode 100644
index 000000000000..b6bd8ee38a18
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/adi,adin1110.yaml
@@ -0,0 +1,77 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/adi,adin1110.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: ADI ADIN1110 MAC-PHY
+
+maintainers:
+ - Alexandru Tachici <alexandru.tachici@analog.com>
+
+description: |
+ The ADIN1110 is a low power single port 10BASE-T1L MAC-
+ PHY designed for industrial Ethernet applications. It integrates
+ an Ethernet PHY core with a MAC and all the associated analog
+ circuitry, input and output clock buffering.
+
+ The ADIN2111 is a low power, low complexity, two-Ethernet ports
+ switch with integrated 10BASE-T1L PHYs and one serial peripheral
+ interface (SPI) port. The device is designed for industrial Ethernet
+ applications using low power constrained nodes and is compliant
+ with the IEEE 802.3cg-2019 Ethernet standard for long reach
+ 10 Mbps single pair Ethernet (SPE).
+
+ The device has a 4-wire SPI interface for communication
+ between the MAC and host processor.
+
+allOf:
+ - $ref: ethernet-controller.yaml#
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
+
+properties:
+ compatible:
+ enum:
+ - adi,adin1110
+ - adi,adin2111
+
+ reg:
+ maxItems: 1
+
+ adi,spi-crc:
+ description: |
+ Enable CRC8 checks on SPI read/writes.
+ type: boolean
+
+ interrupts:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - interrupts
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+
+ spi {
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ethernet@0 {
+ compatible = "adi,adin2111";
+ reg = <0>;
+ spi-max-frequency = <24500000>;
+
+ adi,spi-crc;
+
+ interrupt-parent = <&gpio>;
+ interrupts = <25 IRQ_TYPE_LEVEL_LOW>;
+
+ local-mac-address = [ 00 11 22 33 44 55 ];
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/altera_tse.txt b/Documentation/devicetree/bindings/net/altera_tse.txt
deleted file mode 100644
index 1d9148ff5130..000000000000
--- a/Documentation/devicetree/bindings/net/altera_tse.txt
+++ /dev/null
@@ -1,113 +0,0 @@
-* Altera Triple-Speed Ethernet MAC driver (TSE)
-
-Required properties:
-- compatible: Should be "altr,tse-1.0" for legacy SGDMA based TSE, and should
- be "altr,tse-msgdma-1.0" for the preferred MSGDMA based TSE.
- ALTR is supported for legacy device trees, but is deprecated.
- altr should be used for all new designs.
-- reg: Address and length of the register set for the device. It contains
- the information of registers in the same order as described by reg-names
-- reg-names: Should contain the reg names
- "control_port": MAC configuration space region
- "tx_csr": xDMA Tx dispatcher control and status space region
- "tx_desc": MSGDMA Tx dispatcher descriptor space region
- "rx_csr" : xDMA Rx dispatcher control and status space region
- "rx_desc": MSGDMA Rx dispatcher descriptor space region
- "rx_resp": MSGDMA Rx dispatcher response space region
- "s1": SGDMA descriptor memory
-- interrupts: Should contain the TSE interrupts and its mode.
-- interrupt-names: Should contain the interrupt names
- "rx_irq": xDMA Rx dispatcher interrupt
- "tx_irq": xDMA Tx dispatcher interrupt
-- rx-fifo-depth: MAC receive FIFO buffer depth in bytes
-- tx-fifo-depth: MAC transmit FIFO buffer depth in bytes
-- phy-mode: See ethernet.txt in the same directory.
-- phy-handle: See ethernet.txt in the same directory.
-- phy-addr: See ethernet.txt in the same directory. A configuration should
- include phy-handle or phy-addr.
-- altr,has-supplementary-unicast:
- If present, TSE supports additional unicast addresses.
- Otherwise additional unicast addresses are not supported.
-- altr,has-hash-multicast-filter:
- If present, TSE supports a hash based multicast filter.
- Otherwise, hash-based multicast filtering is not supported.
-
-- mdio device tree subnode: When the TSE has a phy connected to its local
- mdio, there must be device tree subnode with the following
- required properties:
-
- - compatible: Must be "altr,tse-mdio".
- - #address-cells: Must be <1>.
- - #size-cells: Must be <0>.
-
- For each phy on the mdio bus, there must be a node with the following
- fields:
-
- - reg: phy id used to communicate to phy.
- - device_type: Must be "ethernet-phy".
-
-The MAC address will be determined using the optional properties defined in
-ethernet.txt.
-
-Example:
-
- tse_sub_0_eth_tse_0: ethernet@1,00000000 {
- compatible = "altr,tse-msgdma-1.0";
- reg = <0x00000001 0x00000000 0x00000400>,
- <0x00000001 0x00000460 0x00000020>,
- <0x00000001 0x00000480 0x00000020>,
- <0x00000001 0x000004A0 0x00000008>,
- <0x00000001 0x00000400 0x00000020>,
- <0x00000001 0x00000420 0x00000020>;
- reg-names = "control_port", "rx_csr", "rx_desc", "rx_resp", "tx_csr", "tx_desc";
- interrupt-parent = <&hps_0_arm_gic_0>;
- interrupts = <0 41 4>, <0 40 4>;
- interrupt-names = "rx_irq", "tx_irq";
- rx-fifo-depth = <2048>;
- tx-fifo-depth = <2048>;
- address-bits = <48>;
- max-frame-size = <1500>;
- local-mac-address = [ 00 00 00 00 00 00 ];
- phy-mode = "gmii";
- altr,has-supplementary-unicast;
- altr,has-hash-multicast-filter;
- phy-handle = <&phy0>;
- mdio {
- compatible = "altr,tse-mdio";
- #address-cells = <1>;
- #size-cells = <0>;
- phy0: ethernet-phy@0 {
- reg = <0x0>;
- device_type = "ethernet-phy";
- };
-
- phy1: ethernet-phy@1 {
- reg = <0x1>;
- device_type = "ethernet-phy";
- };
-
- };
- };
-
- tse_sub_1_eth_tse_0: ethernet@1,00001000 {
- compatible = "altr,tse-msgdma-1.0";
- reg = <0x00000001 0x00001000 0x00000400>,
- <0x00000001 0x00001460 0x00000020>,
- <0x00000001 0x00001480 0x00000020>,
- <0x00000001 0x000014A0 0x00000008>,
- <0x00000001 0x00001400 0x00000020>,
- <0x00000001 0x00001420 0x00000020>;
- reg-names = "control_port", "rx_csr", "rx_desc", "rx_resp", "tx_csr", "tx_desc";
- interrupt-parent = <&hps_0_arm_gic_0>;
- interrupts = <0 43 4>, <0 42 4>;
- interrupt-names = "rx_irq", "tx_irq";
- rx-fifo-depth = <2048>;
- tx-fifo-depth = <2048>;
- address-bits = <48>;
- max-frame-size = <1500>;
- local-mac-address = [ 00 00 00 00 00 00 ];
- phy-mode = "gmii";
- altr,has-supplementary-unicast;
- altr,has-hash-multicast-filter;
- phy-handle = <&phy1>;
- };
diff --git a/Documentation/devicetree/bindings/net/altr,tse.yaml b/Documentation/devicetree/bindings/net/altr,tse.yaml
new file mode 100644
index 000000000000..8d1d94494349
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/altr,tse.yaml
@@ -0,0 +1,168 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/altr,tse.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Altera Triple Speed Ethernet MAC driver (TSE)
+
+maintainers:
+ - Maxime Chevallier <maxime.chevallier@bootlin.com>
+
+properties:
+ compatible:
+ oneOf:
+ - const: altr,tse-1.0
+ - const: ALTR,tse-1.0
+ deprecated: true
+ - const: altr,tse-msgdma-1.0
+
+ interrupts:
+ minItems: 2
+
+ interrupt-names:
+ items:
+ - const: rx_irq
+ - const: tx_irq
+
+ rx-fifo-depth:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ Depth in bytes of the RX FIFO
+
+ tx-fifo-depth:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ Depth in bytes of the TX FIFO
+
+ altr,has-supplementary-unicast:
+ type: boolean
+ description:
+ If present, TSE supports additional unicast addresses.
+
+ altr,has-hash-multicast-filter:
+ type: boolean
+ description:
+ If present, TSE supports hash based multicast filter.
+
+ mdio:
+ $ref: mdio.yaml#
+ unevaluatedProperties: false
+ description:
+ Creates and registers an MDIO bus.
+
+ properties:
+ compatible:
+ const: altr,tse-mdio
+
+ required:
+ - compatible
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - rx-fifo-depth
+ - tx-fifo-depth
+
+allOf:
+ - $ref: "ethernet-controller.yaml#"
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - const: altr,tse-1.0
+ - const: ALTR,tse-1.0
+ then:
+ properties:
+ reg:
+ minItems: 4
+ reg-names:
+ items:
+ - const: control_port
+ - const: rx_csr
+ - const: tx_csr
+ - const: s1
+
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - altr,tse-msgdma-1.0
+ then:
+ properties:
+ reg:
+ minItems: 6
+ maxItems: 7
+ reg-names:
+ minItems: 6
+ items:
+ - const: control_port
+ - const: rx_csr
+ - const: rx_desc
+ - const: rx_resp
+ - const: tx_csr
+ - const: tx_desc
+ - const: pcs
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ tse_sub_0: ethernet@c0100000 {
+ compatible = "altr,tse-msgdma-1.0";
+ reg = <0xc0100000 0x00000400>,
+ <0xc0101000 0x00000020>,
+ <0xc0102000 0x00000020>,
+ <0xc0103000 0x00000008>,
+ <0xc0104000 0x00000020>,
+ <0xc0105000 0x00000020>,
+ <0xc0106000 0x00000100>;
+ reg-names = "control_port", "rx_csr", "rx_desc", "rx_resp", "tx_csr", "tx_desc", "pcs";
+ interrupt-parent = <&intc>;
+ interrupts = <0 44 4>,<0 45 4>;
+ interrupt-names = "rx_irq","tx_irq";
+ rx-fifo-depth = <2048>;
+ tx-fifo-depth = <2048>;
+ max-frame-size = <1500>;
+ local-mac-address = [ 00 00 00 00 00 00 ];
+ altr,has-supplementary-unicast;
+ altr,has-hash-multicast-filter;
+ sfp = <&sfp0>;
+ phy-mode = "sgmii";
+ managed = "in-band-status";
+ };
+ - |
+ tse_sub_1_eth_tse_0: ethernet@1,00001000 {
+ compatible = "altr,tse-msgdma-1.0";
+ reg = <0x00001000 0x00000400>,
+ <0x00001460 0x00000020>,
+ <0x00001480 0x00000020>,
+ <0x000014A0 0x00000008>,
+ <0x00001400 0x00000020>,
+ <0x00001420 0x00000020>;
+ reg-names = "control_port", "rx_csr", "rx_desc", "rx_resp", "tx_csr", "tx_desc";
+ interrupt-parent = <&hps_0_arm_gic_0>;
+ interrupts = <0 43 4>, <0 42 4>;
+ interrupt-names = "rx_irq", "tx_irq";
+ rx-fifo-depth = <2048>;
+ tx-fifo-depth = <2048>;
+ max-frame-size = <1500>;
+ local-mac-address = [ 00 00 00 00 00 00 ];
+ phy-mode = "gmii";
+ altr,has-supplementary-unicast;
+ altr,has-hash-multicast-filter;
+ phy-handle = <&phy1>;
+ mdio {
+ compatible = "altr,tse-mdio";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ phy1: ethernet-phy@1 {
+ reg = <0x1>;
+ };
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/net/can/nxp,sja1000.yaml b/Documentation/devicetree/bindings/net/can/nxp,sja1000.yaml
index b1327c5b86cf..144a3785132c 100644
--- a/Documentation/devicetree/bindings/net/can/nxp,sja1000.yaml
+++ b/Documentation/devicetree/bindings/net/can/nxp,sja1000.yaml
@@ -30,8 +30,10 @@ properties:
clocks:
maxItems: 1
+ power-domains:
+ maxItems: 1
+
reg-io-width:
- $ref: /schemas/types.yaml#/definitions/uint32
description: I/O register width (in bytes) implemented by this device
default: 1
enum: [ 1, 2, 4 ]
@@ -105,6 +107,7 @@ allOf:
then:
required:
- clocks
+ - power-domains
unevaluatedProperties: false
@@ -129,4 +132,5 @@ examples:
reg-io-width = <4>;
interrupts = <GIC_SPI 95 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&sysctrl R9A06G032_HCLK_CAN0>;
+ power-domains = <&sysctrl>;
};
diff --git a/Documentation/devicetree/bindings/net/cortina,gemini-ethernet.yaml b/Documentation/devicetree/bindings/net/cortina,gemini-ethernet.yaml
index cc01b9b5752a..253b5d1407ee 100644
--- a/Documentation/devicetree/bindings/net/cortina,gemini-ethernet.yaml
+++ b/Documentation/devicetree/bindings/net/cortina,gemini-ethernet.yaml
@@ -37,6 +37,7 @@ properties:
patternProperties:
"^ethernet-port@[0-9]+$":
type: object
+ unevaluatedProperties: false
description: contains the resources for ethernet port
allOf:
- $ref: ethernet-controller.yaml#
diff --git a/Documentation/devicetree/bindings/net/dsa/ar9331.txt b/Documentation/devicetree/bindings/net/dsa/ar9331.txt
index 320607cbbb17..f824fdae0da2 100644
--- a/Documentation/devicetree/bindings/net/dsa/ar9331.txt
+++ b/Documentation/devicetree/bindings/net/dsa/ar9331.txt
@@ -76,7 +76,6 @@ eth1: ethernet@1a000000 {
switch_port0: port@0 {
reg = <0x0>;
- label = "cpu";
ethernet = <&eth1>;
phy-mode = "gmii";
diff --git a/Documentation/devicetree/bindings/net/dsa/arrow,xrs700x.yaml b/Documentation/devicetree/bindings/net/dsa/arrow,xrs700x.yaml
index 3f01b65f3b22..259a0c6547f3 100644
--- a/Documentation/devicetree/bindings/net/dsa/arrow,xrs700x.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/arrow,xrs700x.yaml
@@ -61,8 +61,9 @@ examples:
};
ethernet-port@3 {
reg = <3>;
- label = "cpu";
ethernet = <&fec1>;
+ phy-mode = "rgmii-id";
+
fixed-link {
speed = <1000>;
full-duplex;
diff --git a/Documentation/devicetree/bindings/net/dsa/brcm,b53.yaml b/Documentation/devicetree/bindings/net/dsa/brcm,b53.yaml
index 23114d691d2a..1219b830b1a4 100644
--- a/Documentation/devicetree/bindings/net/dsa/brcm,b53.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/brcm,b53.yaml
@@ -169,7 +169,6 @@ examples:
port@8 {
reg = <8>;
- label = "cpu";
phy-mode = "rgmii-txid";
ethernet = <&eth0>;
fixed-link {
@@ -252,8 +251,9 @@ examples:
port@8 {
ethernet = <&amac2>;
- label = "cpu";
reg = <8>;
+ phy-mode = "internal";
+
fixed-link {
speed = <1000>;
full-duplex;
diff --git a/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml b/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml
index 09317e16cb5d..10ad7e71097b 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml
@@ -76,6 +76,23 @@ properties:
required:
- reg
+# CPU and DSA ports must have phylink-compatible link descriptions
+if:
+ oneOf:
+ - required: [ ethernet ]
+ - required: [ link ]
+then:
+ allOf:
+ - required:
+ - phy-mode
+ - oneOf:
+ - required:
+ - fixed-link
+ - required:
+ - phy-handle
+ - required:
+ - managed
+
additionalProperties: true
...
diff --git a/Documentation/devicetree/bindings/net/dsa/hirschmann,hellcreek.yaml b/Documentation/devicetree/bindings/net/dsa/hirschmann,hellcreek.yaml
index 228683773151..73b774eadd0b 100644
--- a/Documentation/devicetree/bindings/net/dsa/hirschmann,hellcreek.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/hirschmann,hellcreek.yaml
@@ -91,8 +91,13 @@ examples:
port@0 {
reg = <0>;
- label = "cpu";
ethernet = <&gmac0>;
+ phy-mode = "mii";
+
+ fixed-link {
+ speed = <100>;
+ full-duplex;
+ };
};
port@2 {
diff --git a/Documentation/devicetree/bindings/net/dsa/lan9303.txt b/Documentation/devicetree/bindings/net/dsa/lan9303.txt
index 464d6bf87605..46a732087f5c 100644
--- a/Documentation/devicetree/bindings/net/dsa/lan9303.txt
+++ b/Documentation/devicetree/bindings/net/dsa/lan9303.txt
@@ -46,7 +46,6 @@ I2C managed mode:
port@0 { /* RMII fixed link to master */
reg = <0>;
- label = "cpu";
ethernet = <&master>;
};
@@ -83,7 +82,6 @@ MDIO managed mode:
port@0 {
reg = <0>;
- label = "cpu";
ethernet = <&master>;
};
diff --git a/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt b/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt
index e3829d3e480e..8bb1eff21cb1 100644
--- a/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt
+++ b/Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt
@@ -96,7 +96,6 @@ switch@e108000 {
port@6 {
reg = <0x6>;
- label = "cpu";
ethernet = <&eth0>;
};
};
diff --git a/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml b/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml
index 17ab6c69ecc7..f2e9ff3f580b 100644
--- a/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml
@@ -4,67 +4,92 @@
$id: http://devicetree.org/schemas/net/dsa/mediatek,mt7530.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
-title: Mediatek MT7530 Ethernet switch
+title: Mediatek MT7530 and MT7531 Ethernet Switches
maintainers:
- - Sean Wang <sean.wang@mediatek.com>
+ - Arınç ÜNAL <arinc.unal@arinc9.com>
- Landen Chao <Landen.Chao@mediatek.com>
- DENG Qingfang <dqfext@gmail.com>
+ - Sean Wang <sean.wang@mediatek.com>
description: |
- Port 5 of mt7530 and mt7621 switch is muxed between:
- 1. GMAC5: GMAC5 can interface with another external MAC or PHY.
- 2. PHY of port 0 or port 4: PHY interfaces with an external MAC like 2nd GMAC
- of the SOC. Used in many setups where port 0/4 becomes the WAN port.
- Note: On a MT7621 SOC with integrated switch: 2nd GMAC can only connected to
- GMAC5 when the gpios for RGMII2 (GPIO 22-33) are not used and not
- connected to external component!
-
- Port 5 modes/configurations:
- 1. Port 5 is disabled and isolated: An external phy can interface to the 2nd
- GMAC of the SOC.
- In the case of a build-in MT7530 switch, port 5 shares the RGMII bus with 2nd
- GMAC and an optional external phy. Mind the GPIO/pinctl settings of the SOC!
- 2. Port 5 is muxed to PHY of port 0/4: Port 0/4 interfaces with 2nd GMAC.
- It is a simple MAC to PHY interface, port 5 needs to be setup for xMII mode
- and RGMII delay.
- 3. Port 5 is muxed to GMAC5 and can interface to an external phy.
- Port 5 becomes an extra switch port.
- Only works on platform where external phy TX<->RX lines are swapped.
- Like in the Ubiquiti ER-X-SFP.
- 4. Port 5 is muxed to GMAC5 and interfaces with the 2nd GAMC as 2nd CPU port.
- Currently a 2nd CPU port is not supported by DSA code.
-
- Depending on how the external PHY is wired:
- 1. normal: The PHY can only connect to 2nd GMAC but not to the switch
- 2. swapped: RGMII TX, RX are swapped; external phy interface with the switch as
- a ethernet port. But can't interface to the 2nd GMAC.
-
- Based on the DT the port 5 mode is configured.
-
- Driver tries to lookup the phy-handle of the 2nd GMAC of the master device.
- When phy-handle matches PHY of port 0 or 4 then port 5 set-up as mode 2.
- phy-mode must be set, see also example 2 below!
- * mt7621: phy-mode = "rgmii-txid";
- * mt7623: phy-mode = "rgmii";
-
- CPU-Ports need a phy-mode property:
- Allowed values on mt7530 and mt7621:
- - "rgmii"
- - "trgmii"
- On mt7531:
- - "1000base-x"
- - "2500base-x"
- - "rgmii"
- - "sgmii"
+ There are two versions of MT7530, standalone and in a multi-chip module.
+
+ MT7530 is a part of the multi-chip module in MT7620AN, MT7620DA, MT7620DAN,
+ MT7620NN, MT7621AT, MT7621DAT, MT7621ST and MT7623AI SoCs.
+
+ MT7530 in MT7620AN, MT7620DA, MT7620DAN and MT7620NN SoCs has got 10/100 PHYs
+ and the switch registers are directly mapped into SoC's memory map rather than
+ using MDIO. The DSA driver currently doesn't support this.
+
+ There is only the standalone version of MT7531.
+
+ Port 5 on MT7530 has got various ways of configuration.
+
+ For standalone MT7530:
+
+ - Port 5 can be used as a CPU port.
+
+ - PHY 0 or 4 of the switch can be muxed to connect to the gmac of the SoC
+ which port 5 is wired to. Usually used for connecting the wan port
+ directly to the CPU to achieve 2 Gbps routing in total.
+
+ The driver looks up the reg on the ethernet-phy node which the phy-handle
+ property refers to on the gmac node to mux the specified phy.
+
+ The driver requires the gmac of the SoC to have "mediatek,eth-mac" as the
+ compatible string and the reg must be 1. So, for now, only gmac1 of an
+ MediaTek SoC can benefit this. Banana Pi BPI-R2 suits this.
+ Check out example 5 for a similar configuration.
+
+ - Port 5 can be wired to an external phy. Port 5 becomes a DSA slave.
+ Check out example 7 for a similar configuration.
+
+ For multi-chip module MT7530:
+
+ - Port 5 can be used as a CPU port.
+
+ - PHY 0 or 4 of the switch can be muxed to connect to gmac1 of the SoC.
+ Usually used for connecting the wan port directly to the CPU to achieve 2
+ Gbps routing in total.
+
+ The driver looks up the reg on the ethernet-phy node which the phy-handle
+ property refers to on the gmac node to mux the specified phy.
+
+ For the MT7621 SoCs, rgmii2 group must be claimed with rgmii2 function.
+ Check out example 5.
+
+ - In case of an external phy wired to gmac1 of the SoC, port 5 must not be
+ enabled.
+
+ In case of muxing PHY 0 or 4, the external phy must not be enabled.
+
+ For the MT7621 SoCs, rgmii2 group must be claimed with rgmii2 function.
+ Check out example 6.
+ - Port 5 can be muxed to an external phy. Port 5 becomes a DSA slave.
+ The external phy must be wired TX to TX to gmac1 of the SoC for this to
+ work. Ubiquiti EdgeRouter X SFP is wired this way.
+
+ Muxing PHY 0 or 4 won't work when the external phy is connected TX to TX.
+
+ For the MT7621 SoCs, rgmii2 group must be claimed with gpio function.
+ Check out example 7.
properties:
compatible:
- enum:
- - mediatek,mt7530
- - mediatek,mt7531
- - mediatek,mt7621
+ oneOf:
+ - description:
+ Standalone MT7530 and multi-chip module MT7530 in MT7623AI SoC
+ const: mediatek,mt7530
+
+ - description:
+ Standalone MT7531
+ const: mediatek,mt7531
+
+ - description:
+ Multi-chip module MT7530 in MT7621AT, MT7621DAT and MT7621ST SoCs
+ const: mediatek,mt7621
reg:
maxItems: 1
@@ -79,7 +104,14 @@ properties:
gpio-controller:
type: boolean
description:
- if defined, MT7530's LED controller will run on GPIO mode.
+ If defined, LED controller of the MT7530 switch will run on GPIO mode.
+
+ There are 15 controllable pins.
+ port 0 LED 0..2 as GPIO 0..2
+ port 1 LED 0..2 as GPIO 3..5
+ port 2 LED 0..2 as GPIO 6..8
+ port 3 LED 0..2 as GPIO 9..11
+ port 4 LED 0..2 as GPIO 12..14
"#interrupt-cells":
const: 1
@@ -92,17 +124,21 @@ properties:
io-supply:
description:
Phandle to the regulator node necessary for the I/O power.
- See Documentation/devicetree/bindings/regulator/mt6323-regulator.txt
- for details for the regulator setup on these boards.
+ See Documentation/devicetree/bindings/regulator/mt6323-regulator.txt for
+ details for the regulator setup on these boards.
mediatek,mcm:
type: boolean
description:
- if defined, indicates that either MT7530 is the part on multi-chip
- module belong to MT7623A has or the remotely standalone chip as the
- function MT7623N reference board provided for.
+ Used for MT7621AT, MT7621DAT, MT7621ST and MT7623AI SoCs which the MT7530
+ switch is a part of the multi-chip module.
reset-gpios:
+ description:
+ GPIO to reset the switch. Use this if mediatek,mcm is not used.
+ This property is optional because some boards share the reset line with
+ other components which makes it impossible to probe the switch if the
+ reset line is used.
maxItems: 1
reset-names:
@@ -110,8 +146,8 @@ properties:
resets:
description:
- Phandle pointing to the system reset controller with line index for
- the ethsys.
+ Phandle pointing to the system reset controller with line index for the
+ ethsys.
maxItems: 1
patternProperties:
@@ -128,31 +164,88 @@ patternProperties:
properties:
reg:
description:
- Port address described must be 5 or 6 for CPU port and from 0
- to 5 for user ports.
+ Port address described must be 5 or 6 for CPU port and from 0 to 5
+ for user ports.
allOf:
- $ref: dsa-port.yaml#
- if:
- properties:
- label:
- items:
- - const: cpu
+ required: [ ethernet ]
then:
- required:
- - reg
- - phy-mode
+ properties:
+ reg:
+ enum:
+ - 5
+ - 6
required:
- compatible
- reg
+$defs:
+ mt7530-dsa-port:
+ patternProperties:
+ "^(ethernet-)?ports$":
+ patternProperties:
+ "^(ethernet-)?port@[0-9]+$":
+ if:
+ required: [ ethernet ]
+ then:
+ if:
+ properties:
+ reg:
+ const: 5
+ then:
+ properties:
+ phy-mode:
+ enum:
+ - gmii
+ - mii
+ - rgmii
+ else:
+ properties:
+ phy-mode:
+ enum:
+ - rgmii
+ - trgmii
+
+ mt7531-dsa-port:
+ patternProperties:
+ "^(ethernet-)?ports$":
+ patternProperties:
+ "^(ethernet-)?port@[0-9]+$":
+ if:
+ required: [ ethernet ]
+ then:
+ if:
+ properties:
+ reg:
+ const: 5
+ then:
+ properties:
+ phy-mode:
+ enum:
+ - 1000base-x
+ - 2500base-x
+ - rgmii
+ - sgmii
+ else:
+ properties:
+ phy-mode:
+ enum:
+ - 1000base-x
+ - 2500base-x
+ - sgmii
+
allOf:
- - $ref: "dsa.yaml#"
+ - $ref: dsa.yaml#
- if:
required:
- mediatek,mcm
then:
+ properties:
+ reset-gpios: false
+
required:
- resets
- reset-names
@@ -163,52 +256,139 @@ allOf:
- if:
properties:
compatible:
- items:
- - const: mediatek,mt7530
+ const: mediatek,mt7530
then:
+ $ref: "#/$defs/mt7530-dsa-port"
required:
- core-supply
- io-supply
+ - if:
+ properties:
+ compatible:
+ const: mediatek,mt7531
+ then:
+ $ref: "#/$defs/mt7531-dsa-port"
+ properties:
+ gpio-controller: false
+ mediatek,mcm: false
+
+ - if:
+ properties:
+ compatible:
+ const: mediatek,mt7621
+ then:
+ $ref: "#/$defs/mt7530-dsa-port"
+ required:
+ - mediatek,mcm
+
unevaluatedProperties: false
examples:
+ # Example 1: Standalone MT7530
- |
#include <dt-bindings/gpio/gpio.h>
+
mdio {
#address-cells = <1>;
#size-cells = <0>;
- switch@0 {
+
+ switch@1f {
compatible = "mediatek,mt7530";
- reg = <0>;
+ reg = <0x1f>;
+
+ reset-gpios = <&pio 33 0>;
core-supply = <&mt6323_vpa_reg>;
io-supply = <&mt6323_vemc3v3_reg>;
- reset-gpios = <&pio 33 GPIO_ACTIVE_HIGH>;
ethernet-ports {
#address-cells = <1>;
#size-cells = <0>;
+
port@0 {
reg = <0>;
- label = "lan0";
+ label = "lan1";
};
port@1 {
reg = <1>;
- label = "lan1";
+ label = "lan2";
};
port@2 {
reg = <2>;
- label = "lan2";
+ label = "lan3";
};
port@3 {
reg = <3>;
+ label = "lan4";
+ };
+
+ port@4 {
+ reg = <4>;
+ label = "wan";
+ };
+
+ port@6 {
+ reg = <6>;
+ ethernet = <&gmac0>;
+ phy-mode = "rgmii";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ pause;
+ };
+ };
+ };
+ };
+ };
+
+ # Example 2: MT7530 in MT7623AI SoC
+ - |
+ #include <dt-bindings/reset/mt2701-resets.h>
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ switch@1f {
+ compatible = "mediatek,mt7530";
+ reg = <0x1f>;
+
+ mediatek,mcm;
+ resets = <&ethsys MT2701_ETHSYS_MCM_RST>;
+ reset-names = "mcm";
+
+ core-supply = <&mt6323_vpa_reg>;
+ io-supply = <&mt6323_vemc3v3_reg>;
+
+ ethernet-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ label = "lan1";
+ };
+
+ port@1 {
+ reg = <1>;
+ label = "lan2";
+ };
+
+ port@2 {
+ reg = <2>;
label = "lan3";
};
+ port@3 {
+ reg = <3>;
+ label = "lan4";
+ };
+
port@4 {
reg = <4>;
label = "wan";
@@ -216,96 +396,226 @@ examples:
port@6 {
reg = <6>;
- label = "cpu";
ethernet = <&gmac0>;
phy-mode = "trgmii";
+
fixed-link {
speed = <1000>;
full-duplex;
+ pause;
};
};
};
};
};
+ # Example 3: Standalone MT7531
- |
- //Example 2: MT7621: Port 4 is WAN port: 2nd GMAC -> Port 5 -> PHY port 4.
+ #include <dt-bindings/gpio/gpio.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
- ethernet {
+ mdio {
#address-cells = <1>;
#size-cells = <0>;
- gmac0: mac@0 {
- compatible = "mediatek,eth-mac";
+
+ switch@0 {
+ compatible = "mediatek,mt7531";
reg = <0>;
- phy-mode = "rgmii";
- fixed-link {
- speed = <1000>;
- full-duplex;
- pause;
+ reset-gpios = <&pio 54 0>;
+
+ interrupt-controller;
+ #interrupt-cells = <1>;
+ interrupt-parent = <&pio>;
+ interrupts = <53 IRQ_TYPE_LEVEL_HIGH>;
+
+ ethernet-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ label = "lan1";
+ };
+
+ port@1 {
+ reg = <1>;
+ label = "lan2";
+ };
+
+ port@2 {
+ reg = <2>;
+ label = "lan3";
+ };
+
+ port@3 {
+ reg = <3>;
+ label = "lan4";
+ };
+
+ port@4 {
+ reg = <4>;
+ label = "wan";
+ };
+
+ port@6 {
+ reg = <6>;
+ ethernet = <&gmac0>;
+ phy-mode = "2500base-x";
+
+ fixed-link {
+ speed = <2500>;
+ full-duplex;
+ pause;
+ };
+ };
+ };
+ };
+ };
+
+ # Example 4: MT7530 in MT7621AT, MT7621DAT and MT7621ST SoCs
+ - |
+ #include <dt-bindings/interrupt-controller/mips-gic.h>
+ #include <dt-bindings/reset/mt7621-reset.h>
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ switch@1f {
+ compatible = "mediatek,mt7621";
+ reg = <0x1f>;
+
+ mediatek,mcm;
+ resets = <&sysc MT7621_RST_MCM>;
+ reset-names = "mcm";
+
+ interrupt-controller;
+ #interrupt-cells = <1>;
+ interrupt-parent = <&gic>;
+ interrupts = <GIC_SHARED 23 IRQ_TYPE_LEVEL_HIGH>;
+
+ ethernet-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ label = "lan1";
+ };
+
+ port@1 {
+ reg = <1>;
+ label = "lan2";
+ };
+
+ port@2 {
+ reg = <2>;
+ label = "lan3";
+ };
+
+ port@3 {
+ reg = <3>;
+ label = "lan4";
+ };
+
+ port@4 {
+ reg = <4>;
+ label = "wan";
+ };
+
+ port@6 {
+ reg = <6>;
+ ethernet = <&gmac0>;
+ phy-mode = "trgmii";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ pause;
+ };
+ };
};
};
+ };
- gmac1: mac@1 {
+ # Example 5: MT7621: mux MT7530's phy4 to SoC's gmac1
+ - |
+ #include <dt-bindings/interrupt-controller/mips-gic.h>
+ #include <dt-bindings/reset/mt7621-reset.h>
+
+ ethernet {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&rgmii2_pins>;
+
+ mac@1 {
compatible = "mediatek,eth-mac";
reg = <1>;
- phy-mode = "rgmii-txid";
- phy-handle = <&phy4>;
+
+ phy-mode = "rgmii";
+ phy-handle = <&example5_ethphy4>;
};
- mdio: mdio-bus {
+ mdio {
#address-cells = <1>;
#size-cells = <0>;
- /* Internal phy */
- phy4: ethernet-phy@4 {
+ /* MT7530's phy4 */
+ example5_ethphy4: ethernet-phy@4 {
reg = <4>;
};
- mt7530: switch@1f {
+ switch@1f {
compatible = "mediatek,mt7621";
reg = <0x1f>;
- mediatek,mcm;
- resets = <&rstctrl 2>;
+ mediatek,mcm;
+ resets = <&sysc MT7621_RST_MCM>;
reset-names = "mcm";
+ interrupt-controller;
+ #interrupt-cells = <1>;
+ interrupt-parent = <&gic>;
+ interrupts = <GIC_SHARED 23 IRQ_TYPE_LEVEL_HIGH>;
+
ethernet-ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
- label = "lan0";
+ label = "lan1";
};
port@1 {
reg = <1>;
- label = "lan1";
+ label = "lan2";
};
port@2 {
reg = <2>;
- label = "lan2";
+ label = "lan3";
};
port@3 {
reg = <3>;
- label = "lan3";
+ label = "lan4";
};
- /* Commented out. Port 4 is handled by 2nd GMAC.
+ /* Commented out, phy4 is muxed to gmac1.
port@4 {
reg = <4>;
- label = "lan4";
+ label = "wan";
};
*/
port@6 {
reg = <6>;
- label = "cpu";
ethernet = <&gmac0>;
- phy-mode = "rgmii";
+ phy-mode = "trgmii";
fixed-link {
speed = <1000>;
@@ -318,82 +628,169 @@ examples:
};
};
+ # Example 6: MT7621: mux external phy to SoC's gmac1
- |
- //Example 3: MT7621: Port 5 is connected to external PHY: Port 5 -> external PHY.
+ #include <dt-bindings/interrupt-controller/mips-gic.h>
+ #include <dt-bindings/reset/mt7621-reset.h>
ethernet {
#address-cells = <1>;
#size-cells = <0>;
- gmac_0: mac@0 {
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&rgmii2_pins>;
+
+ mac@1 {
compatible = "mediatek,eth-mac";
- reg = <0>;
+ reg = <1>;
+
phy-mode = "rgmii";
+ phy-handle = <&example6_ethphy7>;
+ };
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
- fixed-link {
- speed = <1000>;
- full-duplex;
- pause;
+ /* External PHY */
+ example6_ethphy7: ethernet-phy@7 {
+ reg = <7>;
+ phy-mode = "rgmii";
+ };
+
+ switch@1f {
+ compatible = "mediatek,mt7621";
+ reg = <0x1f>;
+
+ mediatek,mcm;
+ resets = <&sysc MT7621_RST_MCM>;
+ reset-names = "mcm";
+
+ interrupt-controller;
+ #interrupt-cells = <1>;
+ interrupt-parent = <&gic>;
+ interrupts = <GIC_SHARED 23 IRQ_TYPE_LEVEL_HIGH>;
+
+ ethernet-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ label = "lan1";
+ };
+
+ port@1 {
+ reg = <1>;
+ label = "lan2";
+ };
+
+ port@2 {
+ reg = <2>;
+ label = "lan3";
+ };
+
+ port@3 {
+ reg = <3>;
+ label = "lan4";
+ };
+
+ port@4 {
+ reg = <4>;
+ label = "wan";
+ };
+
+ port@6 {
+ reg = <6>;
+ ethernet = <&gmac0>;
+ phy-mode = "trgmii";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ pause;
+ };
+ };
+ };
};
};
+ };
+
+ # Example 7: MT7621: mux external phy to MT7530's port 5
+ - |
+ #include <dt-bindings/interrupt-controller/mips-gic.h>
+ #include <dt-bindings/reset/mt7621-reset.h>
+
+ ethernet {
+ #address-cells = <1>;
+ #size-cells = <0>;
- mdio0: mdio-bus {
+ pinctrl-names = "default";
+ pinctrl-0 = <&rgmii2_pins>;
+
+ mdio {
#address-cells = <1>;
#size-cells = <0>;
- /* External phy */
- ephy5: ethernet-phy@7 {
+ /* External PHY */
+ example7_ethphy7: ethernet-phy@7 {
reg = <7>;
+ phy-mode = "rgmii";
};
switch@1f {
compatible = "mediatek,mt7621";
reg = <0x1f>;
- mediatek,mcm;
- resets = <&rstctrl 2>;
+ mediatek,mcm;
+ resets = <&sysc MT7621_RST_MCM>;
reset-names = "mcm";
+ interrupt-controller;
+ #interrupt-cells = <1>;
+ interrupt-parent = <&gic>;
+ interrupts = <GIC_SHARED 23 IRQ_TYPE_LEVEL_HIGH>;
+
ethernet-ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
- label = "lan0";
+ label = "lan1";
};
port@1 {
reg = <1>;
- label = "lan1";
+ label = "lan2";
};
port@2 {
reg = <2>;
- label = "lan2";
+ label = "lan3";
};
port@3 {
reg = <3>;
- label = "lan3";
+ label = "lan4";
};
port@4 {
reg = <4>;
- label = "lan4";
+ label = "wan";
};
port@5 {
reg = <5>;
- label = "lan5";
- phy-mode = "rgmii";
- phy-handle = <&ephy5>;
+ label = "extphy";
+ phy-mode = "rgmii-txid";
+ phy-handle = <&example7_ethphy7>;
};
- cpu_port0: port@6 {
+ port@6 {
reg = <6>;
- label = "cpu";
- ethernet = <&gmac_0>;
- phy-mode = "rgmii";
+ ethernet = <&gmac0>;
+ phy-mode = "trgmii";
fixed-link {
speed = <1000>;
diff --git a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml
index 6bbd8145b6c1..4da75b1f9533 100644
--- a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml
@@ -107,8 +107,9 @@ examples:
};
port@5 {
reg = <5>;
- label = "cpu";
ethernet = <&eth0>;
+ phy-mode = "rgmii";
+
fixed-link {
speed = <1000>;
full-duplex;
@@ -144,8 +145,9 @@ examples:
};
port@6 {
reg = <6>;
- label = "cpu";
ethernet = <&eth0>;
+ phy-mode = "rgmii";
+
fixed-link {
speed = <1000>;
full-duplex;
diff --git a/Documentation/devicetree/bindings/net/dsa/mscc,ocelot.yaml b/Documentation/devicetree/bindings/net/dsa/mscc,ocelot.yaml
new file mode 100644
index 000000000000..8d93ed9c172c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/dsa/mscc,ocelot.yaml
@@ -0,0 +1,260 @@
+# SPDX-License-Identifier: (GPL-2.0 OR MIT)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/dsa/mscc,ocelot.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Microchip Ocelot Switch Family Device Tree Bindings
+
+maintainers:
+ - Vladimir Oltean <vladimir.oltean@nxp.com>
+ - Claudiu Manoil <claudiu.manoil@nxp.com>
+ - Alexandre Belloni <alexandre.belloni@bootlin.com>
+ - UNGLinuxDriver@microchip.com
+
+description: |
+ There are multiple switches which are either part of the Ocelot-1 family, or
+ derivatives of this architecture. These switches can be found embedded in
+ various SoCs and accessed using MMIO, or as discrete chips and accessed over
+ SPI or PCIe. The present DSA binding shall be used when the host controlling
+ them performs packet I/O primarily through an Ethernet port of the switch
+ (which is attached to an Ethernet port of the host), rather than through
+ Frame DMA or register-based I/O.
+
+ VSC9953 (Seville):
+
+ This is found in the NXP T1040, where it is a memory-mapped platform
+ device.
+
+ The following PHY interface types are supported:
+
+ - phy-mode = "internal": on ports 8 and 9
+ - phy-mode = "sgmii": on ports 0, 1, 2, 3, 4, 5, 6, 7
+ - phy-mode = "qsgmii": on ports 0, 1, 2, 3, 4, 5, 6, 7
+ - phy-mode = "1000base-x": on ports 0, 1, 2, 3, 4, 5, 6, 7
+
+ VSC9959 (Felix):
+
+ This is found in the NXP LS1028A. It is a PCI device, part of the larger
+ enetc root complex. As a result, the ethernet-switch node is a sub-node of
+ the PCIe root complex node and its "reg" property conforms to the parent
+ node bindings, describing it as PF 5 of device 0, bus 0.
+
+ If any external switch port is enabled, the enetc PF2 (enetc_port2) should
+ be enabled as well. This is because the internal MDIO bus (exposed through
+ EA BAR 0) used to access the MAC PCS registers truly belongs to the enetc
+ port 2 and not to Felix.
+
+ The following PHY interface types are supported:
+
+ - phy-mode = "internal": on ports 4 and 5
+ - phy-mode = "sgmii": on ports 0, 1, 2, 3
+ - phy-mode = "qsgmii": on ports 0, 1, 2, 3
+ - phy-mode = "usxgmii": on ports 0, 1, 2, 3
+ - phy-mode = "1000base-x": on ports 0, 1, 2, 3
+ - phy-mode = "2500base-x": on ports 0, 1, 2, 3
+
+properties:
+ compatible:
+ enum:
+ - mscc,vsc9953-switch
+ - pci1957,eef0
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ description:
+ Used to signal availability of PTP TX timestamps, and state changes of
+ the MAC merge layer of ports that support Frame Preemption.
+
+ little-endian: true
+ big-endian: true
+
+required:
+ - compatible
+ - reg
+
+allOf:
+ - $ref: dsa.yaml#
+ - if:
+ properties:
+ compatible:
+ const: pci1957,eef0
+ then:
+ required:
+ - interrupts
+
+unevaluatedProperties: false
+
+examples:
+ # Felix VSC9959 (NXP LS1028A)
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ pcie { /* Integrated Endpoint Root Complex */
+ #address-cells = <3>;
+ #size-cells = <2>;
+
+ ethernet-switch@0,5 {
+ compatible = "pci1957,eef0";
+ reg = <0x000500 0 0 0 0>;
+ interrupts = <GIC_SPI 95 IRQ_TYPE_LEVEL_HIGH>;
+
+ ethernet-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy0>;
+ managed = "in-band-status";
+ };
+
+ port@1 {
+ reg = <1>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy1>;
+ managed = "in-band-status";
+ };
+
+ port@2 {
+ reg = <2>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy2>;
+ managed = "in-band-status";
+ };
+
+ port@3 {
+ reg = <3>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy3>;
+ managed = "in-band-status";
+ };
+
+ port@4 {
+ reg = <4>;
+ ethernet = <&enetc_port2>;
+ phy-mode = "internal";
+
+ fixed-link {
+ speed = <2500>;
+ full-duplex;
+ pause;
+ };
+ };
+
+ port@5 {
+ reg = <5>;
+ ethernet = <&enetc_port3>;
+ phy-mode = "internal";
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ pause;
+ };
+ };
+ };
+ };
+ };
+ # Seville VSC9953 (NXP T1040)
+ - |
+ soc {
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ ethernet-switch@800000 {
+ compatible = "mscc,vsc9953-switch";
+ reg = <0x800000 0x290000>;
+ little-endian;
+
+ ethernet-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy0>;
+ managed = "in-band-status";
+ };
+
+ port@1 {
+ reg = <1>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy1>;
+ managed = "in-band-status";
+ };
+
+ port@2 {
+ reg = <2>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy2>;
+ managed = "in-band-status";
+ };
+
+ port@3 {
+ reg = <3>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy3>;
+ managed = "in-band-status";
+ };
+
+ port@4 {
+ reg = <4>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy4>;
+ managed = "in-band-status";
+ };
+
+ port@5 {
+ reg = <5>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy5>;
+ managed = "in-band-status";
+ };
+
+ port@6 {
+ reg = <6>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy6>;
+ managed = "in-band-status";
+ };
+
+ port@7 {
+ reg = <7>;
+ phy-mode = "qsgmii";
+ phy-handle = <&phy7>;
+ managed = "in-band-status";
+ };
+
+ port@8 {
+ reg = <8>;
+ phy-mode = "internal";
+ ethernet = <&enet0>;
+
+ fixed-link {
+ speed = <2500>;
+ full-duplex;
+ pause;
+ };
+ };
+
+ port@9 {
+ reg = <9>;
+ phy-mode = "internal";
+ ethernet = <&enet1>;
+
+ fixed-link {
+ speed = <2500>;
+ full-duplex;
+ pause;
+ };
+ };
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/dsa/ocelot.txt b/Documentation/devicetree/bindings/net/dsa/ocelot.txt
deleted file mode 100644
index 7a271d070b72..000000000000
--- a/Documentation/devicetree/bindings/net/dsa/ocelot.txt
+++ /dev/null
@@ -1,213 +0,0 @@
-Microchip Ocelot switch driver family
-=====================================
-
-Felix
------
-
-Currently the switches supported by the felix driver are:
-
-- VSC9959 (Felix)
-- VSC9953 (Seville)
-
-The VSC9959 switch is found in the NXP LS1028A. It is a PCI device, part of the
-larger ENETC root complex. As a result, the ethernet-switch node is a sub-node
-of the PCIe root complex node and its "reg" property conforms to the parent
-node bindings:
-
-* reg: Specifies PCIe Device Number and Function Number of the endpoint device,
- in this case for the Ethernet L2Switch it is PF5 (of device 0, bus 0).
-
-It does not require a "compatible" string.
-
-The interrupt line is used to signal availability of PTP TX timestamps and for
-TSN frame preemption.
-
-For the external switch ports, depending on board configuration, "phy-mode" and
-"phy-handle" are populated by board specific device tree instances. Ports 4 and
-5 are fixed as internal ports in the NXP LS1028A instantiation.
-
-The CPU port property ("ethernet") configures the feature called "NPI port" in
-the Ocelot hardware core. The CPU port in Ocelot is a set of queues, which are
-connected, in the Node Processor Interface (NPI) mode, to an Ethernet port.
-By default, in fsl-ls1028a.dtsi, the NPI port is assigned to the internal
-2.5Gbps port@4, but can be moved to the 1Gbps port@5, depending on the specific
-use case. Moving the NPI port to an external switch port is hardware possible,
-but there is no platform support for the Linux system on the LS1028A chip to
-operate as an entire slave DSA chip. NPI functionality (and therefore DSA
-tagging) is supported on a single port at a time.
-
-Any port can be disabled (and in fsl-ls1028a.dtsi, they are indeed all disabled
-by default, and should be enabled on a per-board basis). But if any external
-switch port is enabled at all, the ENETC PF2 (enetc_port2) should be enabled as
-well, regardless of whether it is configured as the DSA master or not. This is
-because the Felix PHYLINK implementation accesses the MAC PCS registers, which
-in hardware truly belong to the ENETC port #2 and not to Felix.
-
-Supported PHY interface types (appropriate SerDes protocol setting changes are
-needed in the RCW binary):
-
-* phy_mode = "internal": on ports 4 and 5
-* phy_mode = "sgmii": on ports 0, 1, 2, 3
-* phy_mode = "qsgmii": on ports 0, 1, 2, 3
-* phy_mode = "usxgmii": on ports 0, 1, 2, 3
-* phy_mode = "2500base-x": on ports 0, 1, 2, 3
-
-For the rest of the device tree binding definitions, which are standard DSA and
-PCI, refer to the following documents:
-
-Documentation/devicetree/bindings/net/dsa/dsa.txt
-Documentation/devicetree/bindings/pci/pci.txt
-
-Example:
-
-&soc {
- pcie@1f0000000 { /* Integrated Endpoint Root Complex */
- ethernet-switch@0,5 {
- reg = <0x000500 0 0 0 0>;
- /* IEP INT_B */
- interrupts = <GIC_SPI 95 IRQ_TYPE_LEVEL_HIGH>;
-
- ports {
- #address-cells = <1>;
- #size-cells = <0>;
-
- /* External ports */
- port@0 {
- reg = <0>;
- label = "swp0";
- };
-
- port@1 {
- reg = <1>;
- label = "swp1";
- };
-
- port@2 {
- reg = <2>;
- label = "swp2";
- };
-
- port@3 {
- reg = <3>;
- label = "swp3";
- };
-
- /* Tagging CPU port */
- port@4 {
- reg = <4>;
- ethernet = <&enetc_port2>;
- phy-mode = "internal";
-
- fixed-link {
- speed = <2500>;
- full-duplex;
- };
- };
-
- /* Non-tagging CPU port */
- port@5 {
- reg = <5>;
- phy-mode = "internal";
- status = "disabled";
-
- fixed-link {
- speed = <1000>;
- full-duplex;
- };
- };
- };
- };
- };
-};
-
-The VSC9953 switch is found inside NXP T1040. It is a platform device with the
-following required properties:
-
-- compatible:
- Must be "mscc,vsc9953-switch".
-
-Supported PHY interface types (appropriate SerDes protocol setting changes are
-needed in the RCW binary):
-
-* phy_mode = "internal": on ports 8 and 9
-* phy_mode = "sgmii": on ports 0, 1, 2, 3, 4, 5, 6, 7
-* phy_mode = "qsgmii": on ports 0, 1, 2, 3, 4, 5, 6, 7
-
-Example:
-
-&soc {
- ethernet-switch@800000 {
- #address-cells = <0x1>;
- #size-cells = <0x0>;
- compatible = "mscc,vsc9953-switch";
- little-endian;
- reg = <0x800000 0x290000>;
-
- ports {
- #address-cells = <0x1>;
- #size-cells = <0x0>;
-
- port@0 {
- reg = <0x0>;
- label = "swp0";
- };
-
- port@1 {
- reg = <0x1>;
- label = "swp1";
- };
-
- port@2 {
- reg = <0x2>;
- label = "swp2";
- };
-
- port@3 {
- reg = <0x3>;
- label = "swp3";
- };
-
- port@4 {
- reg = <0x4>;
- label = "swp4";
- };
-
- port@5 {
- reg = <0x5>;
- label = "swp5";
- };
-
- port@6 {
- reg = <0x6>;
- label = "swp6";
- };
-
- port@7 {
- reg = <0x7>;
- label = "swp7";
- };
-
- port@8 {
- reg = <0x8>;
- phy-mode = "internal";
- ethernet = <&enet0>;
-
- fixed-link {
- speed = <2500>;
- full-duplex;
- };
- };
-
- port@9 {
- reg = <0x9>;
- phy-mode = "internal";
- status = "disabled";
-
- fixed-link {
- speed = <2500>;
- full-duplex;
- };
- };
- };
- };
-};
diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.yaml b/Documentation/devicetree/bindings/net/dsa/qca8k.yaml
index f3c88371d76c..978162df51f7 100644
--- a/Documentation/devicetree/bindings/net/dsa/qca8k.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/qca8k.yaml
@@ -159,7 +159,6 @@ examples:
port@0 {
reg = <0>;
- label = "cpu";
ethernet = <&gmac1>;
phy-mode = "rgmii";
@@ -221,7 +220,6 @@ examples:
port@0 {
reg = <0>;
- label = "cpu";
ethernet = <&gmac1>;
phy-mode = "rgmii";
@@ -268,7 +266,6 @@ examples:
port@6 {
reg = <0>;
- label = "cpu";
ethernet = <&gmac1>;
phy-mode = "sgmii";
diff --git a/Documentation/devicetree/bindings/net/dsa/realtek.yaml b/Documentation/devicetree/bindings/net/dsa/realtek.yaml
index 4f99aff029dc..1a7d45a8ad66 100644
--- a/Documentation/devicetree/bindings/net/dsa/realtek.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/realtek.yaml
@@ -189,7 +189,6 @@ examples:
};
port@5 {
reg = <5>;
- label = "cpu";
ethernet = <&gmac0>;
phy-mode = "rgmii";
fixed-link {
@@ -277,7 +276,6 @@ examples:
};
port@6 {
reg = <6>;
- label = "cpu";
ethernet = <&fec1>;
phy-mode = "rgmii";
tx-internal-delay-ps = <2000>;
diff --git a/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml b/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml
index 4d428f5ad044..7ca9c19a157c 100644
--- a/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml
@@ -130,7 +130,8 @@ examples:
port@4 {
reg = <4>;
ethernet = <&gmac2>;
- label = "cpu";
+ phy-mode = "internal";
+
fixed-link {
speed = <1000>;
full-duplex;
diff --git a/Documentation/devicetree/bindings/net/dsa/vitesse,vsc73xx.txt b/Documentation/devicetree/bindings/net/dsa/vitesse,vsc73xx.txt
index bbf4a13f6d75..258bef483673 100644
--- a/Documentation/devicetree/bindings/net/dsa/vitesse,vsc73xx.txt
+++ b/Documentation/devicetree/bindings/net/dsa/vitesse,vsc73xx.txt
@@ -75,7 +75,6 @@ switch@0 {
};
vsc: port@6 {
reg = <6>;
- label = "cpu";
ethernet = <&gmac1>;
phy-mode = "rgmii";
fixed-link {
@@ -117,7 +116,6 @@ switch@2,0 {
};
vsc: port@6 {
reg = <6>;
- label = "cpu";
ethernet = <&enet0>;
phy-mode = "rgmii";
fixed-link {
diff --git a/Documentation/devicetree/bindings/net/engleder,tsnep.yaml b/Documentation/devicetree/bindings/net/engleder,tsnep.yaml
index d0e1476e15b5..5bd964a46a9d 100644
--- a/Documentation/devicetree/bindings/net/engleder,tsnep.yaml
+++ b/Documentation/devicetree/bindings/net/engleder,tsnep.yaml
@@ -20,7 +20,26 @@ properties:
maxItems: 1
interrupts:
- maxItems: 1
+ minItems: 1
+ maxItems: 8
+
+ interrupt-names:
+ minItems: 1
+ items:
+ - const: mac
+ - const: txrx-1
+ - const: txrx-2
+ - const: txrx-3
+ - const: txrx-4
+ - const: txrx-5
+ - const: txrx-6
+ - const: txrx-7
+ description:
+ The main interrupt for basic MAC features and the first TX/RX queue pair
+ is named "mac". "txrx-[1-7]" are the interrupts for additional TX/RX
+ queue pairs.
+
+ dma-coherent: true
local-mac-address: true
@@ -58,7 +77,7 @@ examples:
axi {
#address-cells = <2>;
#size-cells = <2>;
- tnsep0: ethernet@a0000000 {
+ tsnep0: ethernet@a0000000 {
compatible = "engleder,tsnep";
reg = <0x0 0xa0000000 0x0 0x10000>;
interrupts = <0 89 1>;
@@ -76,4 +95,24 @@ examples:
};
};
};
+
+ tsnep1: ethernet@a0010000 {
+ compatible = "engleder,tsnep";
+ reg = <0x0 0xa0010000 0x0 0x10000>;
+ interrupts = <0 93 1>, <0 94 1>, <0 95 1>, <0 96 1>;
+ interrupt-names = "mac", "txrx-1", "txrx-2", "txrx-3";
+ interrupt-parent = <&gic>;
+ local-mac-address = [00 00 00 00 00 00];
+ phy-mode = "rgmii";
+ phy-handle = <&phy1>;
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ suppress-preamble;
+ phy1: ethernet-phy@1 {
+ reg = <1>;
+ rxc-skew-ps = <1080>;
+ };
+ };
+ };
};
diff --git a/Documentation/devicetree/bindings/net/ethernet-controller.yaml b/Documentation/devicetree/bindings/net/ethernet-controller.yaml
index c138a1022879..4b3c590fcebf 100644
--- a/Documentation/devicetree/bindings/net/ethernet-controller.yaml
+++ b/Documentation/devicetree/bindings/net/ethernet-controller.yaml
@@ -67,6 +67,7 @@ properties:
- gmii
- sgmii
- qsgmii
+ - qusgmii
- tbi
- rev-mii
- rmii
diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
index ed1415a4381f..ad808e9ce5b9 100644
--- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml
+++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
@@ -144,6 +144,12 @@ properties:
Mark the corresponding energy efficient ethernet mode as
broken and request the ethernet to stop advertising it.
+ pses:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ maxItems: 1
+ description:
+ Specifies a reference to a node representing a Power Sourcing Equipment.
+
phy-is-integrated:
$ref: /schemas/types.yaml#/definitions/flag
description:
diff --git a/Documentation/devicetree/bindings/net/fsl,fec.yaml b/Documentation/devicetree/bindings/net/fsl,fec.yaml
index 5cfb661be124..e0f376f7e274 100644
--- a/Documentation/devicetree/bindings/net/fsl,fec.yaml
+++ b/Documentation/devicetree/bindings/net/fsl,fec.yaml
@@ -21,6 +21,7 @@ properties:
- fsl,imx28-fec
- fsl,imx6q-fec
- fsl,mvf600-fec
+ - fsl,s32v234-fec
- items:
- enum:
- fsl,imx53-fec
diff --git a/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml b/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml
new file mode 100644
index 000000000000..3a35ac1c260d
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml
@@ -0,0 +1,145 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/fsl,fman-dtsec.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NXP FMan MAC
+
+maintainers:
+ - Madalin Bucur <madalin.bucur@nxp.com>
+
+description: |
+ Each FMan has several MACs, each implementing an Ethernet interface. Earlier
+ versions of FMan used the Datapath Three Speed Ethernet Controller (dTSEC) for
+ 10/100/1000 MBit/s speeds, and the 10-Gigabit Ethernet Media Access Controller
+ (10GEC) for 10 Gbit/s speeds. Later versions of FMan use the Multirate
+ Ethernet Media Access Controller (mEMAC) to handle all speeds.
+
+properties:
+ compatible:
+ enum:
+ - fsl,fman-dtsec
+ - fsl,fman-xgec
+ - fsl,fman-memac
+
+ cell-index:
+ maximum: 64
+ description: |
+ FManV2:
+ register[bit] MAC cell-index
+ ============================================================
+ FM_EPI[16] XGEC 8
+ FM_EPI[16+n] dTSECn n-1
+ FM_NPI[11+n] dTSECn n-1
+ n = 1,..,5
+
+ FManV3:
+ register[bit] MAC cell-index
+ ============================================================
+ FM_EPI[16+n] mEMACn n-1
+ FM_EPI[25] mEMAC10 9
+
+ FM_NPI[11+n] mEMACn n-1
+ FM_NPI[10] mEMAC10 9
+ FM_NPI[11] mEMAC9 8
+ n = 1,..8
+
+ FM_EPI and FM_NPI are located in the FMan memory map.
+
+ 2. SoC registers:
+
+ - P2041, P3041, P4080 P5020, P5040:
+ register[bit] FMan MAC cell
+ Unit index
+ ============================================================
+ DCFG_DEVDISR2[7] 1 XGEC 8
+ DCFG_DEVDISR2[7+n] 1 dTSECn n-1
+ DCFG_DEVDISR2[15] 2 XGEC 8
+ DCFG_DEVDISR2[15+n] 2 dTSECn n-1
+ n = 1,..5
+
+ - T1040, T2080, T4240, B4860:
+ register[bit] FMan MAC cell
+ Unit index
+ ============================================================
+ DCFG_CCSR_DEVDISR2[n-1] 1 mEMACn n-1
+ DCFG_CCSR_DEVDISR2[11+n] 2 mEMACn n-1
+ n = 1,..6,9,10
+
+ EVDISR, DCFG_DEVDISR2 and DCFG_CCSR_DEVDISR2 are located in
+ the specific SoC "Device Configuration/Pin Control" Memory
+ Map.
+
+ reg:
+ maxItems: 1
+
+ fsl,fman-ports:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ maxItems: 2
+ description: |
+ An array of two references: the first is the FMan RX port and the second
+ is the TX port used by this MAC.
+
+ ptp-timer:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description: A reference to the IEEE1588 timer
+
+ pcsphy-handle:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description: A reference to the PCS (typically found on the SerDes)
+
+ tbi-handle:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description: A reference to the (TBI-based) PCS
+
+required:
+ - compatible
+ - cell-index
+ - reg
+ - fsl,fman-ports
+ - ptp-timer
+
+allOf:
+ - $ref: ethernet-controller.yaml#
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: fsl,fman-dtsec
+ then:
+ required:
+ - tbi-handle
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: fsl,fman-memac
+ then:
+ required:
+ - pcsphy-handle
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ ethernet@e0000 {
+ compatible = "fsl,fman-dtsec";
+ cell-index = <0>;
+ reg = <0xe0000 0x1000>;
+ fsl,fman-ports = <&fman1_rx8 &fman1_tx28>;
+ ptp-timer = <&ptp_timer>;
+ tbi-handle = <&tbi0>;
+ };
+ - |
+ ethernet@e8000 {
+ cell-index = <4>;
+ compatible = "fsl,fman-memac";
+ reg = <0xe8000 0x1000>;
+ fsl,fman-ports = <&fman0_rx_0x0c &fman0_tx_0x2c>;
+ ptp-timer = <&ptp_timer0>;
+ pcsphy-handle = <&pcsphy4>;
+ phy-handle = <&sgmii_phy1>;
+ phy-connection-type = "sgmii";
+ };
+...
diff --git a/Documentation/devicetree/bindings/net/fsl-fman.txt b/Documentation/devicetree/bindings/net/fsl-fman.txt
index 801efc7d6818..b9055335db3b 100644
--- a/Documentation/devicetree/bindings/net/fsl-fman.txt
+++ b/Documentation/devicetree/bindings/net/fsl-fman.txt
@@ -232,133 +232,7 @@ port@81000 {
=============================================================================
FMan dTSEC/XGEC/mEMAC Node
-DESCRIPTION
-
-mEMAC/dTSEC/XGEC are the Ethernet network interfaces
-
-PROPERTIES
-
-- compatible
- Usage: required
- Value type: <stringlist>
- Definition: A standard property.
- Must include one of the following:
- - "fsl,fman-dtsec" for dTSEC MAC
- - "fsl,fman-xgec" for XGEC MAC
- - "fsl,fman-memac" for mEMAC MAC
-
-- cell-index
- Usage: required
- Value type: <u32>
- Definition: Specifies the MAC id.
-
- The cell-index value may be used by the FMan or the SoC, to
- identify the MAC unit in the FMan (or SoC) memory map.
- In the tables below there's a description of the cell-index
- use, there are two tables, one describes the use of cell-index
- by the FMan, the second describes the use by the SoC:
-
- 1. FMan Registers
-
- FManV2:
- register[bit] MAC cell-index
- ============================================================
- FM_EPI[16] XGEC 8
- FM_EPI[16+n] dTSECn n-1
- FM_NPI[11+n] dTSECn n-1
- n = 1,..,5
-
- FManV3:
- register[bit] MAC cell-index
- ============================================================
- FM_EPI[16+n] mEMACn n-1
- FM_EPI[25] mEMAC10 9
-
- FM_NPI[11+n] mEMACn n-1
- FM_NPI[10] mEMAC10 9
- FM_NPI[11] mEMAC9 8
- n = 1,..8
-
- FM_EPI and FM_NPI are located in the FMan memory map.
-
- 2. SoC registers:
-
- - P2041, P3041, P4080 P5020, P5040:
- register[bit] FMan MAC cell
- Unit index
- ============================================================
- DCFG_DEVDISR2[7] 1 XGEC 8
- DCFG_DEVDISR2[7+n] 1 dTSECn n-1
- DCFG_DEVDISR2[15] 2 XGEC 8
- DCFG_DEVDISR2[15+n] 2 dTSECn n-1
- n = 1,..5
-
- - T1040, T2080, T4240, B4860:
- register[bit] FMan MAC cell
- Unit index
- ============================================================
- DCFG_CCSR_DEVDISR2[n-1] 1 mEMACn n-1
- DCFG_CCSR_DEVDISR2[11+n] 2 mEMACn n-1
- n = 1,..6,9,10
-
- EVDISR, DCFG_DEVDISR2 and DCFG_CCSR_DEVDISR2 are located in
- the specific SoC "Device Configuration/Pin Control" Memory
- Map.
-
-- reg
- Usage: required
- Value type: <prop-encoded-array>
- Definition: A standard property.
-
-- fsl,fman-ports
- Usage: required
- Value type: <prop-encoded-array>
- Definition: An array of two phandles - the first references is
- the FMan RX port and the second is the TX port used by this
- MAC.
-
-- ptp-timer
- Usage required
- Value type: <phandle>
- Definition: A phandle for 1EEE1588 timer.
-
-- pcsphy-handle
- Usage required for "fsl,fman-memac" MACs
- Value type: <phandle>
- Definition: A phandle for pcsphy.
-
-- tbi-handle
- Usage required for "fsl,fman-dtsec" MACs
- Value type: <phandle>
- Definition: A phandle for tbiphy.
-
-EXAMPLE
-
-fman1_tx28: port@a8000 {
- cell-index = <0x28>;
- compatible = "fsl,fman-v2-port-tx";
- reg = <0xa8000 0x1000>;
-};
-
-fman1_rx8: port@88000 {
- cell-index = <0x8>;
- compatible = "fsl,fman-v2-port-rx";
- reg = <0x88000 0x1000>;
-};
-
-ptp-timer: ptp_timer@fe000 {
- compatible = "fsl,fman-ptp-timer";
- reg = <0xfe000 0x1000>;
-};
-
-ethernet@e0000 {
- compatible = "fsl,fman-dtsec";
- cell-index = <0>;
- reg = <0xe0000 0x1000>;
- fsl,fman-ports = <&fman1_rx8 &fman1_tx28>;
- ptp-timer = <&ptp-timer>;
- tbi-handle = <&tbi0>;
-};
+Refer to Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml
============================================================================
FMan IEEE 1588 Node
diff --git a/Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt b/Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt
deleted file mode 100644
index 358fed2fab43..000000000000
--- a/Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-Mediatek Gigabit Switch
-=======================
-
-The mediatek gigabit switch can be found on Mediatek SoCs (mt7620, mt7621).
-
-Required properties:
-- compatible: Should be "mediatek,mt7620-gsw" or "mediatek,mt7621-gsw"
-- reg: Address and length of the register set for the device
-- interrupts: Should contain the gigabit switches interrupt
-- resets: Should contain the gigabit switches resets
-- reset-names: Should contain the reset names "gsw"
-
-Example:
-
-gsw@10110000 {
- compatible = "ralink,mt7620-gsw";
- reg = <0x10110000 8000>;
-
- resets = <&rstctrl 23>;
- reset-names = "gsw";
-
- interrupt-parent = <&intc>;
- interrupts = <17>;
-};
diff --git a/Documentation/devicetree/bindings/net/mediatek,net.yaml b/Documentation/devicetree/bindings/net/mediatek,net.yaml
index f5564ecddb62..7ef696204c5a 100644
--- a/Documentation/devicetree/bindings/net/mediatek,net.yaml
+++ b/Documentation/devicetree/bindings/net/mediatek,net.yaml
@@ -69,6 +69,15 @@ properties:
A list of phandle to the syscon node that handles the SGMII setup which is required for
those SoCs equipped with SGMII.
+ mediatek,wed:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ minItems: 2
+ maxItems: 2
+ items:
+ maxItems: 1
+ description:
+ List of phandles to wireless ethernet dispatch nodes.
+
dma-coherent: true
mdio-bus:
@@ -112,6 +121,8 @@ allOf:
Phandle to the syscon node that handles the ports slew rate and
driver current.
+ mediatek,wed: false
+
- if:
properties:
compatible:
@@ -144,15 +155,6 @@ allOf:
minItems: 1
maxItems: 1
- mediatek,wed:
- $ref: /schemas/types.yaml#/definitions/phandle-array
- minItems: 2
- maxItems: 2
- items:
- maxItems: 1
- description:
- List of phandles to wireless ethernet dispatch nodes.
-
mediatek,pcie-mirror:
$ref: /schemas/types.yaml#/definitions/phandle
description:
@@ -202,6 +204,8 @@ allOf:
minItems: 2
maxItems: 2
+ mediatek,wed: false
+
- if:
properties:
compatible:
@@ -238,6 +242,11 @@ allOf:
minItems: 2
maxItems: 2
+ mediatek,wed-pcie:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description:
+ Phandle to the mediatek wed-pcie controller.
+
patternProperties:
"^mac@[0-1]$":
type: object
diff --git a/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml b/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
index 61b2fb9e141b..0fa2132fa4f4 100644
--- a/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
@@ -19,6 +19,7 @@ select:
contains:
enum:
- mediatek,mt2712-gmac
+ - mediatek,mt8188-gmac
- mediatek,mt8195-gmac
required:
- compatible
@@ -37,6 +38,11 @@ properties:
- enum:
- mediatek,mt8195-gmac
- const: snps,dwmac-5.10a
+ - items:
+ - enum:
+ - mediatek,mt8188-gmac
+ - const: mediatek,mt8195-gmac
+ - const: snps,dwmac-5.10a
clocks:
minItems: 5
@@ -74,7 +80,7 @@ properties:
or will round down. Range 0~31*170.
For MT2712 RMII/MII interface, Allowed value need to be a multiple of 550,
or will round down. Range 0~31*550.
- For MT8195 RGMII/RMII/MII interface, Allowed value need to be a multiple of 290,
+ For MT8188/MT8195 RGMII/RMII/MII interface, Allowed value need to be a multiple of 290,
or will round down. Range 0~31*290.
mediatek,rx-delay-ps:
@@ -84,7 +90,7 @@ properties:
or will round down. Range 0~31*170.
For MT2712 RMII/MII interface, Allowed value need to be a multiple of 550,
or will round down. Range 0~31*550.
- For MT8195 RGMII/RMII/MII interface, Allowed value need to be a multiple
+ For MT8188/MT8195 RGMII/RMII/MII interface, Allowed value need to be a multiple
of 290, or will round down. Range 0~31*290.
mediatek,rmii-rxc:
diff --git a/Documentation/devicetree/bindings/net/microchip,sparx5-switch.yaml b/Documentation/devicetree/bindings/net/microchip,sparx5-switch.yaml
index 6c86d3d85e99..57ffeb8fc876 100644
--- a/Documentation/devicetree/bindings/net/microchip,sparx5-switch.yaml
+++ b/Documentation/devicetree/bindings/net/microchip,sparx5-switch.yaml
@@ -74,16 +74,20 @@ properties:
ethernet-ports:
type: object
+ additionalProperties: false
+
+ properties:
+ '#address-cells':
+ const: 1
+ '#size-cells':
+ const: 0
+
patternProperties:
"^port@[0-9a-f]+$":
- type: object
+ $ref: /schemas/net/ethernet-controller.yaml#
+ unevaluatedProperties: false
properties:
- '#address-cells':
- const: 1
- '#size-cells':
- const: 0
-
reg:
description: Switch port number
@@ -93,29 +97,11 @@ properties:
phandle of a Ethernet SerDes PHY. This defines which SerDes
instance will handle the Ethernet traffic.
- phy-mode:
- description:
- This specifies the interface used by the Ethernet SerDes towards
- the PHY or SFP.
-
microchip,bandwidth:
description: Specifies bandwidth in Mbit/s allocated to the port.
$ref: "/schemas/types.yaml#/definitions/uint32"
maximum: 25000
- phy-handle:
- description:
- phandle of a Ethernet PHY. This is optional and if provided it
- points to the cuPHY used by the Ethernet SerDes.
-
- sfp:
- description:
- phandle of an SFP. This is optional and used when not specifying
- a cuPHY. It points to the SFP node that describes the SFP used by
- the Ethernet SerDes.
-
- managed: true
-
microchip,sd-sgpio:
description:
Index of the ports Signal Detect SGPIO in the set of 384 SGPIOs
@@ -144,8 +130,6 @@ required:
- reg-names
- interrupts
- interrupt-names
- - resets
- - reset-names
- ethernet-ports
additionalProperties: false
diff --git a/Documentation/devicetree/bindings/net/nfc/marvell,nci.yaml b/Documentation/devicetree/bindings/net/nfc/marvell,nci.yaml
index a191a04e681c..308485a8ee6c 100644
--- a/Documentation/devicetree/bindings/net/nfc/marvell,nci.yaml
+++ b/Documentation/devicetree/bindings/net/nfc/marvell,nci.yaml
@@ -128,7 +128,7 @@ examples:
i2c-int-rising;
- reset-n-io = <&gpio3 19 GPIO_ACTIVE_HIGH>;
+ reset-n-io = <&gpio3 19 GPIO_ACTIVE_LOW>;
};
};
@@ -151,7 +151,7 @@ examples:
interrupt-parent = <&gpio1>;
interrupts = <17 IRQ_TYPE_EDGE_RISING>;
- reset-n-io = <&gpio3 19 GPIO_ACTIVE_HIGH>;
+ reset-n-io = <&gpio3 19 GPIO_ACTIVE_LOW>;
};
};
@@ -162,7 +162,7 @@ examples:
nfc {
compatible = "marvell,nfc-uart";
- reset-n-io = <&gpio3 16 GPIO_ACTIVE_HIGH>;
+ reset-n-io = <&gpio3 16 GPIO_ACTIVE_LOW>;
hci-muxed;
flow-control;
diff --git a/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml b/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml
index d51da24f3505..ab8867e6939b 100644
--- a/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml
+++ b/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml
@@ -31,6 +31,22 @@ patternProperties:
description:
The ID number for the child PHY. Should be +1 of parent PHY.
+ nxp,rmii-refclk-in:
+ type: boolean
+ description: |
+ The REF_CLK is provided for both transmitted and received data
+ in RMII mode. This clock signal is provided by the PHY and is
+ typically derived from an external 25MHz crystal. Alternatively,
+ a 50MHz clock signal generated by an external oscillator can be
+ connected to pin REF_CLK. A third option is to connect a 25MHz
+ clock to pin CLK_IN_OUT. So, the REF_CLK should be configured
+ as input or output according to the actual circuit connection.
+ If present, indicates that the REF_CLK will be configured as
+ interface reference clock input when RMII mode enabled.
+ If not present, the REF_CLK will be configured as interface
+ reference clock output when RMII mode enabled.
+ Only supported on TJA1100 and TJA1101.
+
required:
- reg
@@ -44,6 +60,7 @@ examples:
tja1101_phy0: ethernet-phy@4 {
reg = <0x4>;
+ nxp,rmii-refclk-in;
};
};
- |
diff --git a/Documentation/devicetree/bindings/net/pse-pd/podl-pse-regulator.yaml b/Documentation/devicetree/bindings/net/pse-pd/podl-pse-regulator.yaml
new file mode 100644
index 000000000000..c6b1c188abf7
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/pse-pd/podl-pse-regulator.yaml
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/pse-pd/podl-pse-regulator.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Regulator based Power Sourcing Equipment
+
+maintainers:
+ - Oleksij Rempel <o.rempel@pengutronix.de>
+
+description: Regulator based PoDL PSE controller. The device must be referenced
+ by the PHY node to control power injection to the Ethernet cable.
+
+allOf:
+ - $ref: "pse-controller.yaml#"
+
+properties:
+ compatible:
+ const: podl-pse-regulator
+
+ '#pse-cells':
+ const: 0
+
+ pse-supply:
+ description: Power supply for the PSE controller
+
+additionalProperties: false
+
+required:
+ - compatible
+ - pse-supply
+
+examples:
+ - |
+ ethernet-pse {
+ compatible = "podl-pse-regulator";
+ pse-supply = <&reg_t1l1>;
+ #pse-cells = <0>;
+ };
diff --git a/Documentation/devicetree/bindings/net/pse-pd/pse-controller.yaml b/Documentation/devicetree/bindings/net/pse-pd/pse-controller.yaml
new file mode 100644
index 000000000000..b110abb42597
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/pse-pd/pse-controller.yaml
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/pse-pd/pse-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Power Sourcing Equipment (PSE).
+
+description: Binding for the Power Sourcing Equipment (PSE) as defined in the
+ IEEE 802.3 specification. It is designed for hardware which is delivering
+ power over twisted pair/ethernet cable. The ethernet-pse nodes should be
+ used to describe PSE controller and referenced by the ethernet-phy node.
+
+maintainers:
+ - Oleksij Rempel <o.rempel@pengutronix.de>
+
+properties:
+ $nodename:
+ pattern: "^ethernet-pse(@.*)?$"
+
+ "#pse-cells":
+ description:
+ Used to uniquely identify a PSE instance within an IC. Will be
+ 0 on PSE nodes with only a single output and at least 1 on nodes
+ controlling several outputs.
+ enum: [0, 1]
+
+required:
+ - "#pse-cells"
+
+additionalProperties: true
+
+...
diff --git a/Documentation/devicetree/bindings/net/qca,ar803x.yaml b/Documentation/devicetree/bindings/net/qca,ar803x.yaml
index b3d4013b7ca6..161d28919316 100644
--- a/Documentation/devicetree/bindings/net/qca,ar803x.yaml
+++ b/Documentation/devicetree/bindings/net/qca,ar803x.yaml
@@ -40,6 +40,14 @@ properties:
Only supported on the AR8031.
type: boolean
+ qca,disable-hibernation-mode:
+ description: |
+ Disable Atheros AR803X PHYs hibernation mode. If present, indicates
+ that the hardware of PHY will not enter power saving mode when the
+ cable is disconnected. And the RX_CLK always keeps outputting a
+ valid clock.
+ type: boolean
+
qca,smarteee-tw-us-100m:
description: EEE Tw parameter for 100M links.
$ref: /schemas/types.yaml#/definitions/uint32
diff --git a/Documentation/devicetree/bindings/net/ralink,rt2880-net.txt b/Documentation/devicetree/bindings/net/ralink,rt2880-net.txt
deleted file mode 100644
index 9fe1a0a22e44..000000000000
--- a/Documentation/devicetree/bindings/net/ralink,rt2880-net.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-Ralink Frame Engine Ethernet controller
-=======================================
-
-The Ralink frame engine ethernet controller can be found on Ralink and
-Mediatek SoCs (RT288x, RT3x5x, RT366x, RT388x, rt5350, mt7620, mt7621, mt76x8).
-
-Depending on the SoC, there is a number of ports connected to the CPU port
-directly and/or via a (gigabit-)switch.
-
-* Ethernet controller node
-
-Required properties:
-- compatible: Should be one of "ralink,rt2880-eth", "ralink,rt3050-eth",
- "ralink,rt3050-eth", "ralink,rt3883-eth", "ralink,rt5350-eth",
- "mediatek,mt7620-eth", "mediatek,mt7621-eth"
-- reg: Address and length of the register set for the device
-- interrupts: Should contain the frame engines interrupt
-- resets: Should contain the frame engines resets
-- reset-names: Should contain the reset names "fe". If a switch is present
- "esw" is also required.
-
-
-* Ethernet port node
-
-Required properties:
-- compatible: Should be "ralink,eth-port"
-- reg: The number of the physical port
-- phy-handle: reference to the node describing the phy
-
-Example:
-
-mdio-bus {
- ...
- phy0: ethernet-phy@0 {
- phy-mode = "mii";
- reg = <0>;
- };
-};
-
-ethernet@400000 {
- compatible = "ralink,rt2880-eth";
- reg = <0x00400000 10000>;
-
- #address-cells = <1>;
- #size-cells = <0>;
-
- resets = <&rstctrl 18>;
- reset-names = "fe";
-
- interrupt-parent = <&cpuintc>;
- interrupts = <5>;
-
- port@0 {
- compatible = "ralink,eth-port";
- reg = <0>;
- phy-handle = <&phy0>;
- };
-
-};
diff --git a/Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt b/Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt
deleted file mode 100644
index 87e315856efa..000000000000
--- a/Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-Ralink Fast Ethernet Embedded Switch
-====================================
-
-The ralink fast ethernet embedded switch can be found on Ralink and Mediatek
-SoCs (RT3x5x, RT5350, MT76x8).
-
-Required properties:
-- compatible: Should be "ralink,rt3050-esw"
-- reg: Address and length of the register set for the device
-- interrupts: Should contain the embedded switches interrupt
-- resets: Should contain the embedded switches resets
-- reset-names: Should contain the reset names "esw"
-
-Optional properties:
-- ralink,portmap: can be used to choose if the default switch setup is
- llllw or wllll
-- ralink,led_polarity: override the active high/low settings of the leds
-
-Example:
-
-esw@10110000 {
- compatible = "ralink,rt3050-esw";
- reg = <0x10110000 8000>;
-
- resets = <&rstctrl 23>;
- reset-names = "esw";
-
- interrupt-parent = <&intc>;
- interrupts = <17>;
-};
diff --git a/Documentation/devicetree/bindings/net/renesas,etheravb.yaml b/Documentation/devicetree/bindings/net/renesas,etheravb.yaml
index acf347f3cdbe..3f41294f5997 100644
--- a/Documentation/devicetree/bindings/net/renesas,etheravb.yaml
+++ b/Documentation/devicetree/bindings/net/renesas,etheravb.yaml
@@ -40,11 +40,16 @@ properties:
- renesas,etheravb-r8a77980 # R-Car V3H
- renesas,etheravb-r8a77990 # R-Car E3
- renesas,etheravb-r8a77995 # R-Car D3
- - renesas,etheravb-r8a779a0 # R-Car V3U
- const: renesas,etheravb-rcar-gen3 # R-Car Gen3 and RZ/G2
- items:
- enum:
+ - renesas,etheravb-r8a779a0 # R-Car V3U
+ - renesas,etheravb-r8a779g0 # R-Car V4H
+ - const: renesas,etheravb-rcar-gen4 # R-Car Gen4
+
+ - items:
+ - enum:
- renesas,etheravb-r9a09g011 # RZ/V2M
- const: renesas,etheravb-rzv2m # RZ/V2M compatible
@@ -207,7 +212,7 @@ allOf:
- renesas,etheravb-r8a77965
- renesas,etheravb-r8a77970
- renesas,etheravb-r8a77980
- - renesas,etheravb-r8a779a0
+ - renesas,etheravb-rcar-gen4
then:
required:
- tx-internal-delay-ps
diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
index 083623c8d718..42fb72b6909d 100644
--- a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
@@ -25,7 +25,9 @@ select:
- rockchip,rk3368-gmac
- rockchip,rk3399-gmac
- rockchip,rk3568-gmac
+ - rockchip,rk3588-gmac
- rockchip,rv1108-gmac
+ - rockchip,rv1126-gmac
required:
- compatible
@@ -47,9 +49,11 @@ properties:
- rockchip,rk3368-gmac
- rockchip,rk3399-gmac
- rockchip,rv1108-gmac
+ - rockchip,rv1126-gmac
- items:
- enum:
- rockchip,rk3568-gmac
+ - rockchip,rk3588-gmac
- const: snps,dwmac-4.20a
clocks:
@@ -81,6 +85,11 @@ properties:
description: The phandle of the syscon node for the general register file.
$ref: /schemas/types.yaml#/definitions/phandle
+ rockchip,php-grf:
+ description:
+ The phandle of the syscon node for the peripheral general register file.
+ $ref: /schemas/types.yaml#/definitions/phandle
+
tx_delay:
description: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as default.
$ref: /schemas/types.yaml#/definitions/uint32
diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
index 491597c02edf..f94a0a4320f1 100644
--- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
@@ -74,6 +74,7 @@ properties:
- rockchip,rk3328-gmac
- rockchip,rk3366-gmac
- rockchip,rk3368-gmac
+ - rockchip,rk3588-gmac
- rockchip,rk3399-gmac
- rockchip,rv1108-gmac
- snps,dwmac
@@ -288,6 +289,11 @@ properties:
is supported. For example, this is used in case of SGMII and
MAC2MAC connection.
+ snps,clk-csr:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ Frequency division factor for MDC clock.
+
mdio:
$ref: mdio.yaml#
unevaluatedProperties: false
@@ -301,6 +307,60 @@ properties:
required:
- compatible
+ stmmac-axi-config:
+ type: object
+ unevaluatedProperties: false
+ description:
+ AXI BUS Mode parameters.
+
+ properties:
+ snps,lpi_en:
+ $ref: /schemas/types.yaml#/definitions/flag
+ description:
+ enable Low Power Interface
+
+ snps,xit_frm:
+ $ref: /schemas/types.yaml#/definitions/flag
+ description:
+ unlock on WoL
+
+ snps,wr_osr_lmt:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ max write outstanding req. limit
+
+ snps,rd_osr_lmt:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ max read outstanding req. limit
+
+ snps,kbbe:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ do not cross 1KiB boundary.
+
+ snps,blen:
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ description:
+ this is a vector of supported burst length.
+ minItems: 7
+ maxItems: 7
+
+ snps,fb:
+ $ref: /schemas/types.yaml#/definitions/flag
+ description:
+ fixed-burst
+
+ snps,mb:
+ $ref: /schemas/types.yaml#/definitions/flag
+ description:
+ mixed-burst
+
+ snps,rb:
+ $ref: /schemas/types.yaml#/definitions/flag
+ description:
+ rebuild INCRx Burst
+
required:
- compatible
- reg
diff --git a/Documentation/devicetree/bindings/net/sunplus,sp7021-emac.yaml b/Documentation/devicetree/bindings/net/sunplus,sp7021-emac.yaml
index 62dffee27c3d..8e51dcdb4796 100644
--- a/Documentation/devicetree/bindings/net/sunplus,sp7021-emac.yaml
+++ b/Documentation/devicetree/bindings/net/sunplus,sp7021-emac.yaml
@@ -32,6 +32,7 @@ properties:
ethernet-ports:
type: object
+ additionalProperties: false
description: Ethernet ports to PHY
properties:
@@ -44,6 +45,7 @@ properties:
patternProperties:
"^port@[0-1]$":
type: object
+ additionalProperties: false
description: Port to PHY
properties:
diff --git a/Documentation/devicetree/bindings/net/ti,cpsw-switch.yaml b/Documentation/devicetree/bindings/net/ti,cpsw-switch.yaml
index 31bf825c6598..46e330f45768 100644
--- a/Documentation/devicetree/bindings/net/ti,cpsw-switch.yaml
+++ b/Documentation/devicetree/bindings/net/ti,cpsw-switch.yaml
@@ -77,6 +77,8 @@ properties:
ethernet-ports:
type: object
+ additionalProperties: false
+
properties:
'#address-cells':
const: 1
@@ -89,6 +91,7 @@ properties:
description: CPSW external ports
$ref: ethernet-controller.yaml#
+ unevaluatedProperties: false
properties:
reg:
@@ -117,6 +120,7 @@ properties:
cpts:
type: object
+ unevaluatedProperties: false
description:
The Common Platform Time Sync (CPTS) module
diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml
index b8281d8be940..7d90beaccc60 100644
--- a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml
+++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml
@@ -55,6 +55,7 @@ properties:
compatible:
enum:
- ti,am654-cpsw-nuss
+ - ti,j7200-cpswxg-nuss
- ti,j721e-cpsw-nuss
- ti,am642-cpsw-nuss
@@ -110,16 +111,17 @@ properties:
const: 0
patternProperties:
- port@[1-2]:
+ "^port@[1-4]$":
type: object
description: CPSWxG NUSS external ports
$ref: ethernet-controller.yaml#
+ unevaluatedProperties: false
properties:
reg:
minimum: 1
- maximum: 2
+ maximum: 4
description: CPSW port number
phys:
@@ -178,6 +180,19 @@ required:
- '#address-cells'
- '#size-cells'
+allOf:
+ - if:
+ not:
+ properties:
+ compatible:
+ contains:
+ const: ti,j7200-cpswxg-nuss
+ then:
+ properties:
+ ethernet-ports:
+ patternProperties:
+ "^port@[3-4]$": false
+
additionalProperties: false
examples:
diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml
index b783ad0d1f53..e9f78cef6b7f 100644
--- a/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml
+++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpts.yaml
@@ -95,6 +95,7 @@ properties:
refclk-mux:
type: object
+ additionalProperties: false
description: CPTS reference clock multiplexer clock
properties:
'#clock-cells':
diff --git a/Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml b/Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml
index 8156a9aeb589..304757bf9281 100644
--- a/Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml
+++ b/Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml
@@ -7,7 +7,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
title: The Vertexcom MSE102x (SPI) Device Tree Bindings
maintainers:
- - Stefan Wahren <stefan.wahren@in-tech.com>
+ - Stefan Wahren <stefan.wahren@chargebyte.com>
description:
Vertexcom's MSE102x are a family of HomePlug GreenPHY chips.
diff --git a/Documentation/devicetree/bindings/net/wireless/brcm,bcm4329-fmac.yaml b/Documentation/devicetree/bindings/net/wireless/brcm,bcm4329-fmac.yaml
index 53b4153d9bfc..fec1cc9b9a08 100644
--- a/Documentation/devicetree/bindings/net/wireless/brcm,bcm4329-fmac.yaml
+++ b/Documentation/devicetree/bindings/net/wireless/brcm,bcm4329-fmac.yaml
@@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/net/wireless/brcm,bcm4329-fmac.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
-title: Broadcom BCM4329 family fullmac wireless SDIO devices
+title: Broadcom BCM4329 family fullmac wireless SDIO/PCIE devices
maintainers:
- Arend van Spriel <arend@broadcom.com>
@@ -41,11 +41,17 @@ properties:
- cypress,cyw4373-fmac
- cypress,cyw43012-fmac
- const: brcm,bcm4329-fmac
- - const: brcm,bcm4329-fmac
+ - enum:
+ - brcm,bcm4329-fmac
+ - pci14e4,43dc # BCM4355
+ - pci14e4,4464 # BCM4364
+ - pci14e4,4488 # BCM4377
+ - pci14e4,4425 # BCM4378
+ - pci14e4,4433 # BCM4387
reg:
- description: SDIO function number for the device, for most cases
- this will be 1.
+ description: SDIO function number for the device (for most cases
+ this will be 1) or PCI device identifier.
interrupts:
maxItems: 1
@@ -85,6 +91,31 @@ properties:
takes precedence.
type: boolean
+ brcm,cal-blob:
+ $ref: /schemas/types.yaml#/definitions/uint8-array
+ description: A per-device calibration blob for the Wi-Fi radio. This
+ should be filled in by the bootloader from platform configuration
+ data, if necessary, and will be uploaded to the device if present.
+
+ brcm,board-type:
+ $ref: /schemas/types.yaml#/definitions/string
+ description: Overrides the board type, which is normally the compatible of
+ the root node. This can be used to decouple the overall system board or
+ device name from the board type for WiFi purposes, which is used to
+ construct firmware and NVRAM configuration filenames, allowing for
+ multiple devices that share the same module or characteristics for the
+ WiFi subsystem to share the same firmware/NVRAM files. On Apple platforms,
+ this should be the Apple module-instance codename prefixed by "apple,",
+ e.g. "apple,honshu".
+
+ apple,antenna-sku:
+ $ref: /schemas/types.yaml#/definitions/string
+ description: Antenna SKU used to identify a specific antenna configuration
+ on Apple platforms. This is use to build firmware filenames, to allow
+ platforms with different antenna configs to have different firmware and/or
+ NVRAM. This would normally be filled in by the bootloader from platform
+ configuration data.
+
required:
- compatible
- reg
diff --git a/Documentation/devicetree/bindings/net/wireless/microchip,wilc1000.yaml b/Documentation/devicetree/bindings/net/wireless/microchip,wilc1000.yaml
index 60de78f1bc7b..b3405f284580 100644
--- a/Documentation/devicetree/bindings/net/wireless/microchip,wilc1000.yaml
+++ b/Documentation/devicetree/bindings/net/wireless/microchip,wilc1000.yaml
@@ -20,8 +20,6 @@ properties:
reg: true
- spi-max-frequency: true
-
interrupts:
maxItems: 1
@@ -51,7 +49,10 @@ required:
- compatible
- interrupts
-additionalProperties: false
+allOf:
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
+
+unevaluatedProperties: false
examples:
- |
diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml
index a677b056f112..f7cf135aa37f 100644
--- a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml
+++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml
@@ -66,6 +66,18 @@ properties:
required:
- iommus
+ qcom,smem-states:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ description: State bits used by the AP to signal the WLAN Q6.
+ items:
+ - description: Signal bits used to enable/disable low power mode
+ on WCN6750 in the case of WoW (Wake on Wireless).
+
+ qcom,smem-state-names:
+ description: The names of the state bits used for SMP2P output.
+ items:
+ - const: wlan-smp2p-out
+
required:
- compatible
- reg
@@ -448,6 +460,8 @@ examples:
<GIC_SPI 799 IRQ_TYPE_EDGE_RISING>;
qcom,rproc = <&remoteproc_wpss>;
memory-region = <&wlan_fw_mem>, <&wlan_ce_mem>;
+ qcom,smem-states = <&wlan_smp2p_out 0>;
+ qcom,smem-state-names = "wlan-smp2p-out";
wifi-firmware {
iommus = <&apps_smmu 0x1c02 0x1>;
};
diff --git a/Documentation/devicetree/bindings/net/wireless/silabs,wfx.yaml b/Documentation/devicetree/bindings/net/wireless/silabs,wfx.yaml
index 76199a67d628..b35d2f3ad1ad 100644
--- a/Documentation/devicetree/bindings/net/wireless/silabs,wfx.yaml
+++ b/Documentation/devicetree/bindings/net/wireless/silabs,wfx.yaml
@@ -29,12 +29,6 @@ description: >
Documentation/devicetree/bindings/mmc/mmc-pwrseq-simple.yaml for more
information.
- For SPI:
-
- In add of the properties below, please consult
- Documentation/devicetree/bindings/spi/spi-controller.yaml for optional SPI
- related properties.
-
properties:
compatible:
items:
@@ -52,8 +46,6 @@ properties:
bindings.
maxItems: 1
- spi-max-frequency: true
-
interrupts:
description: The interrupt line. Should be IRQ_TYPE_EDGE_RISING. When SPI is
used, this property is required. When SDIO is used, the "in-band"
@@ -84,12 +76,15 @@ properties:
mac-address: true
-additionalProperties: false
-
required:
- compatible
- reg
+allOf:
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
+
+unevaluatedProperties: false
+
examples:
- |
#include <dt-bindings/gpio/gpio.h>
diff --git a/Documentation/devicetree/bindings/net/wireless/ti,wlcore.yaml b/Documentation/devicetree/bindings/net/wireless/ti,wlcore.yaml
index d68bb2ec1f7e..e31456730e9f 100644
--- a/Documentation/devicetree/bindings/net/wireless/ti,wlcore.yaml
+++ b/Documentation/devicetree/bindings/net/wireless/ti,wlcore.yaml
@@ -36,8 +36,6 @@ properties:
This is required when connected via SPI, and optional when connected via
SDIO.
- spi-max-frequency: true
-
interrupts:
minItems: 1
maxItems: 2
@@ -69,20 +67,22 @@ required:
- compatible
- interrupts
-if:
- properties:
- compatible:
- contains:
- enum:
- - ti,wl1271
- - ti,wl1273
- - ti,wl1281
- - ti,wl1283
-then:
- required:
- - ref-clock-frequency
-
-additionalProperties: false
+allOf:
+ - $ref: /schemas/spi/spi-peripheral-props.yaml#
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - ti,wl1271
+ - ti,wl1273
+ - ti,wl1281
+ - ti,wl1283
+ then:
+ required:
+ - ref-clock-frequency
+
+unevaluatedProperties: false
examples:
- |
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index 7823a069a903..96cd7a26f3d9 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -846,7 +846,7 @@ primary_reselect
tlb_dynamic_lb
Specifies if dynamic shuffling of flows is enabled in tlb
- mode. The value has no effect on any other modes.
+ or alb mode. The value has no effect on any other modes.
The default behavior of tlb mode is to shuffle active flows across
slaves based on the load in that interval. This gives nice lb
diff --git a/Documentation/networking/decnet.rst b/Documentation/networking/decnet.rst
deleted file mode 100644
index b8bc11ff8370..000000000000
--- a/Documentation/networking/decnet.rst
+++ /dev/null
@@ -1,243 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=========================================
-Linux DECnet Networking Layer Information
-=========================================
-
-1. Other documentation....
-==========================
-
- - Project Home Pages
- - http://www.chygwyn.com/ - Kernel info
- - http://linux-decnet.sourceforge.net/ - Userland tools
- - http://www.sourceforge.net/projects/linux-decnet/ - Status page
-
-2. Configuring the kernel
-=========================
-
-Be sure to turn on the following options:
-
- - CONFIG_DECNET (obviously)
- - CONFIG_PROC_FS (to see what's going on)
- - CONFIG_SYSCTL (for easy configuration)
-
-if you want to try out router support (not properly debugged yet)
-you'll need the following options as well...
-
- - CONFIG_DECNET_ROUTER (to be able to add/delete routes)
- - CONFIG_NETFILTER (will be required for the DECnet routing daemon)
-
-Don't turn on SIOCGIFCONF support for DECnet unless you are really sure
-that you need it, in general you won't and it can cause ifconfig to
-malfunction.
-
-Run time configuration has changed slightly from the 2.4 system. If you
-want to configure an endnode, then the simplified procedure is as follows:
-
- - Set the MAC address on your ethernet card before starting _any_ other
- network protocols.
-
-As soon as your network card is brought into the UP state, DECnet should
-start working. If you need something more complicated or are unsure how
-to set the MAC address, see the next section. Also all configurations which
-worked with 2.4 will work under 2.5 with no change.
-
-3. Command line options
-=======================
-
-You can set a DECnet address on the kernel command line for compatibility
-with the 2.4 configuration procedure, but in general it's not needed any more.
-If you do st a DECnet address on the command line, it has only one purpose
-which is that its added to the addresses on the loopback device.
-
-With 2.4 kernels, DECnet would only recognise addresses as local if they
-were added to the loopback device. In 2.5, any local interface address
-can be used to loop back to the local machine. Of course this does not
-prevent you adding further addresses to the loopback device if you
-want to.
-
-N.B. Since the address list of an interface determines the addresses for
-which "hello" messages are sent, if you don't set an address on the loopback
-interface then you won't see any entries in /proc/net/neigh for the local
-host until such time as you start a connection. This doesn't affect the
-operation of the local communications in any other way though.
-
-The kernel command line takes options looking like the following::
-
- decnet.addr=1,2
-
-the two numbers are the node address 1,2 = 1.2 For 2.2.xx kernels
-and early 2.3.xx kernels, you must use a comma when specifying the
-DECnet address like this. For more recent 2.3.xx kernels, you may
-use almost any character except space, although a `.` would be the most
-obvious choice :-)
-
-There used to be a third number specifying the node type. This option
-has gone away in favour of a per interface node type. This is now set
-using /proc/sys/net/decnet/conf/<dev>/forwarding. This file can be
-set with a single digit, 0=EndNode, 1=L1 Router and 2=L2 Router.
-
-There are also equivalent options for modules. The node address can
-also be set through the /proc/sys/net/decnet/ files, as can other system
-parameters.
-
-Currently the only supported devices are ethernet and ip_gre. The
-ethernet address of your ethernet card has to be set according to the DECnet
-address of the node in order for it to be autoconfigured (and then appear in
-/proc/net/decnet_dev). There is a utility available at the above
-FTP sites called dn2ethaddr which can compute the correct ethernet
-address to use. The address can be set by ifconfig either before or
-at the time the device is brought up. If you are using RedHat you can
-add the line::
-
- MACADDR=AA:00:04:00:03:04
-
-or something similar, to /etc/sysconfig/network-scripts/ifcfg-eth0 or
-wherever your network card's configuration lives. Setting the MAC address
-of your ethernet card to an address starting with "hi-ord" will cause a
-DECnet address which matches to be added to the interface (which you can
-verify with iproute2).
-
-The default device for routing can be set through the /proc filesystem
-by setting /proc/sys/net/decnet/default_device to the
-device you want DECnet to route packets out of when no specific route
-is available. Usually this will be eth0, for example::
-
- echo -n "eth0" >/proc/sys/net/decnet/default_device
-
-If you don't set the default device, then it will default to the first
-ethernet card which has been autoconfigured as described above. You can
-confirm that by looking in the default_device file of course.
-
-There is a list of what the other files under /proc/sys/net/decnet/ do
-on the kernel patch web site (shown above).
-
-4. Run time kernel configuration
-================================
-
-
-This is either done through the sysctl/proc interface (see the kernel web
-pages for details on what the various options do) or through the iproute2
-package in the same way as IPv4/6 configuration is performed.
-
-Documentation for iproute2 is included with the package, although there is
-as yet no specific section on DECnet, most of the features apply to both
-IP and DECnet, albeit with DECnet addresses instead of IP addresses and
-a reduced functionality.
-
-If you want to configure a DECnet router you'll need the iproute2 package
-since its the _only_ way to add and delete routes currently. Eventually
-there will be a routing daemon to send and receive routing messages for
-each interface and update the kernel routing tables accordingly. The
-routing daemon will use netfilter to listen to routing packets, and
-rtnetlink to update the kernels routing tables.
-
-The DECnet raw socket layer has been removed since it was there purely
-for use by the routing daemon which will now use netfilter (a much cleaner
-and more generic solution) instead.
-
-5. How can I tell if its working?
-=================================
-
-Here is a quick guide of what to look for in order to know if your DECnet
-kernel subsystem is working.
-
- - Is the node address set (see /proc/sys/net/decnet/node_address)
- - Is the node of the correct type
- (see /proc/sys/net/decnet/conf/<dev>/forwarding)
- - Is the Ethernet MAC address of each Ethernet card set to match
- the DECnet address. If in doubt use the dn2ethaddr utility available
- at the ftp archive.
- - If the previous two steps are satisfied, and the Ethernet card is up,
- you should find that it is listed in /proc/net/decnet_dev and also
- that it appears as a directory in /proc/sys/net/decnet/conf/. The
- loopback device (lo) should also appear and is required to communicate
- within a node.
- - If you have any DECnet routers on your network, they should appear
- in /proc/net/decnet_neigh, otherwise this file will only contain the
- entry for the node itself (if it doesn't check to see if lo is up).
- - If you want to send to any node which is not listed in the
- /proc/net/decnet_neigh file, you'll need to set the default device
- to point to an Ethernet card with connection to a router. This is
- again done with the /proc/sys/net/decnet/default_device file.
- - Try starting a simple server and client, like the dnping/dnmirror
- over the loopback interface. With luck they should communicate.
- For this step and those after, you'll need the DECnet library
- which can be obtained from the above ftp sites as well as the
- actual utilities themselves.
- - If this seems to work, then try talking to a node on your local
- network, and see if you can obtain the same results.
- - At this point you are on your own... :-)
-
-6. How to send a bug report
-===========================
-
-If you've found a bug and want to report it, then there are several things
-you can do to help me work out exactly what it is that is wrong. Useful
-information (_most_ of which _is_ _essential_) includes:
-
- - What kernel version are you running ?
- - What version of the patch are you running ?
- - How far though the above set of tests can you get ?
- - What is in the /proc/decnet* files and /proc/sys/net/decnet/* files ?
- - Which services are you running ?
- - Which client caused the problem ?
- - How much data was being transferred ?
- - Was the network congested ?
- - How can the problem be reproduced ?
- - Can you use tcpdump to get a trace ? (N.B. Most (all?) versions of
- tcpdump don't understand how to dump DECnet properly, so including
- the hex listing of the packet contents is _essential_, usually the -x flag.
- You may also need to increase the length grabbed with the -s flag. The
- -e flag also provides very useful information (ethernet MAC addresses))
-
-7. MAC FAQ
-==========
-
-A quick FAQ on ethernet MAC addresses to explain how Linux and DECnet
-interact and how to get the best performance from your hardware.
-
-Ethernet cards are designed to normally only pass received network frames
-to a host computer when they are addressed to it, or to the broadcast address.
-
-Linux has an interface which allows the setting of extra addresses for
-an ethernet card to listen to. If the ethernet card supports it, the
-filtering operation will be done in hardware, if not the extra unwanted packets
-received will be discarded by the host computer. In the latter case,
-significant processor time and bus bandwidth can be used up on a busy
-network (see the NAPI documentation for a longer explanation of these
-effects).
-
-DECnet makes use of this interface to allow running DECnet on an ethernet
-card which has already been configured using TCP/IP (presumably using the
-built in MAC address of the card, as usual) and/or to allow multiple DECnet
-addresses on each physical interface. If you do this, be aware that if your
-ethernet card doesn't support perfect hashing in its MAC address filter
-then your computer will be doing more work than required. Some cards
-will simply set themselves into promiscuous mode in order to receive
-packets from the DECnet specified addresses. So if you have one of these
-cards its better to set the MAC address of the card as described above
-to gain the best efficiency. Better still is to use a card which supports
-NAPI as well.
-
-
-8. Mailing list
-===============
-
-If you are keen to get involved in development, or want to ask questions
-about configuration, or even just report bugs, then there is a mailing
-list that you can join, details are at:
-
-http://sourceforge.net/mail/?group_id=4993
-
-9. Legal Info
-=============
-
-The Linux DECnet project team have placed their code under the GPL. The
-software is provided "as is" and without warranty express or implied.
-DECnet is a trademark of Compaq. This software is not a product of
-Compaq. We acknowledge the help of people at Compaq in providing extra
-documentation above and beyond what was previously publicly available.
-
-Steve Whitehouse <SteveW@ACM.org>
-
diff --git a/Documentation/networking/device_drivers/can/freescale/flexcan.rst b/Documentation/networking/device_drivers/can/freescale/flexcan.rst
index 4e3eec6cecd2..106cd2890135 100644
--- a/Documentation/networking/device_drivers/can/freescale/flexcan.rst
+++ b/Documentation/networking/device_drivers/can/freescale/flexcan.rst
@@ -5,7 +5,7 @@ Flexcan CAN Controller driver
=============================
Authors: Marc Kleine-Budde <mkl@pengutronix.de>,
-Dario Binacchi <dario.binacchi@amarula.solutions.com>
+Dario Binacchi <dario.binacchi@amarulasolutions.com>
On/off RTR frames reception
===========================
diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst
index 7f1777173abb..5196905582c5 100644
--- a/Documentation/networking/device_drivers/ethernet/index.rst
+++ b/Documentation/networking/device_drivers/ethernet/index.rst
@@ -52,6 +52,7 @@ Contents:
ti/tlan
toshiba/spider_net
wangxun/txgbe
+ wangxun/ngbe
.. only:: subproject and html
diff --git a/Documentation/networking/device_drivers/ethernet/wangxun/ngbe.rst b/Documentation/networking/device_drivers/ethernet/wangxun/ngbe.rst
new file mode 100644
index 000000000000..43a02f9943e1
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/wangxun/ngbe.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================================================
+Linux Base Driver for WangXun(R) Gigabit PCI Express Adapters
+=============================================================
+
+WangXun Gigabit Linux driver.
+Copyright (c) 2019 - 2022 Beijing WangXun Technology Co., Ltd.
+
+Support
+=======
+ If you have problems with the software or hardware, please contact our
+ customer support team via email at nic-support@net-swift.com or check our website
+ at https://www.net-swift.com
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index 8c082b139bbd..0c89ceb8986d 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -139,6 +139,42 @@ EMP firmware image.
The driver does not currently support reloading the driver via
``DEVLINK_RELOAD_ACTION_DRIVER_REINIT``.
+Port split
+==========
+
+The ``ice`` driver supports port splitting only for port 0, as the FW has
+a predefined set of available port split options for the whole device.
+
+A system reboot is required for port split to be applied.
+
+The following command will select the port split option with 4 ports:
+
+.. code:: shell
+
+ $ devlink port split pci/0000:16:00.0/0 count 4
+
+The list of all available port options will be printed to dynamic debug after
+each ``split`` and ``unsplit`` command. The first option is the default.
+
+.. code:: shell
+
+ ice 0000:16:00.0: Available port split options and max port speeds (Gbps):
+ ice 0000:16:00.0: Status Split Quad 0 Quad 1
+ ice 0000:16:00.0: count L0 L1 L2 L3 L4 L5 L6 L7
+ ice 0000:16:00.0: Active 2 100 - - - 100 - - -
+ ice 0000:16:00.0: 2 50 - 50 - - - - -
+ ice 0000:16:00.0: Pending 4 25 25 25 25 - - - -
+ ice 0000:16:00.0: 4 25 25 - - 25 25 - -
+ ice 0000:16:00.0: 8 10 10 10 10 10 10 10 10
+ ice 0000:16:00.0: 1 100 - - - - - - -
+
+There could be multiple FW port options with the same port split count. When
+the same port split count request is issued again, the next FW port option with
+the same port split count will be selected.
+
+``devlink port unsplit`` will select the option with a split count of 1. If
+there is no FW option available with split count 1, you will receive an error.
+
Regions
=======
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index e3a5f985673e..4b653d040627 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -13,10 +13,8 @@ new APIs prefixed by ``devl_*``. The older APIs handle all the locking
in devlink core, but don't allow registration of most sub-objects once
the main devlink object is itself registered. The newer ``devl_*`` APIs assume
the devlink instance lock is already held. Drivers can take the instance
-lock by calling ``devl_lock()``. It is also held in most of the callbacks.
-Eventually all callbacks will be invoked under the devlink instance lock,
-refer to the use of the ``DEVLINK_NL_FLAG_NO_LOCK`` flag in devlink core
-to find out which callbacks are not converted, yet.
+lock by calling ``devl_lock()``. It is also held all callbacks of devlink
+netlink commands.
Drivers are encouraged to use the devlink instance lock for their own needs.
diff --git a/Documentation/networking/dsa/configuration.rst b/Documentation/networking/dsa/configuration.rst
index 2b08f1a772d3..827701f8cbfe 100644
--- a/Documentation/networking/dsa/configuration.rst
+++ b/Documentation/networking/dsa/configuration.rst
@@ -49,6 +49,9 @@ In this documentation the following Ethernet interfaces are used:
*eth0*
the master interface
+*eth1*
+ another master interface
+
*lan1*
a slave interface
@@ -360,3 +363,96 @@ the ``self`` flag) has been removed. This results in the following changes:
Script writers are therefore encouraged to use the ``master static`` set of
flags when working with bridge FDB entries on DSA switch interfaces.
+
+Affinity of user ports to CPU ports
+-----------------------------------
+
+Typically, DSA switches are attached to the host via a single Ethernet
+interface, but in cases where the switch chip is discrete, the hardware design
+may permit the use of 2 or more ports connected to the host, for an increase in
+termination throughput.
+
+DSA can make use of multiple CPU ports in two ways. First, it is possible to
+statically assign the termination traffic associated with a certain user port
+to be processed by a certain CPU port. This way, user space can implement
+custom policies of static load balancing between user ports, by spreading the
+affinities according to the available CPU ports.
+
+Secondly, it is possible to perform load balancing between CPU ports on a per
+packet basis, rather than statically assigning user ports to CPU ports.
+This can be achieved by placing the DSA masters under a LAG interface (bonding
+or team). DSA monitors this operation and creates a mirror of this software LAG
+on the CPU ports facing the physical DSA masters that constitute the LAG slave
+devices.
+
+To make use of multiple CPU ports, the firmware (device tree) description of
+the switch must mark all the links between CPU ports and their DSA masters
+using the ``ethernet`` reference/phandle. At startup, only a single CPU port
+and DSA master will be used - the numerically first port from the firmware
+description which has an ``ethernet`` property. It is up to the user to
+configure the system for the switch to use other masters.
+
+DSA uses the ``rtnl_link_ops`` mechanism (with a "dsa" ``kind``) to allow
+changing the DSA master of a user port. The ``IFLA_DSA_MASTER`` u32 netlink
+attribute contains the ifindex of the master device that handles each slave
+device. The DSA master must be a valid candidate based on firmware node
+information, or a LAG interface which contains only slaves which are valid
+candidates.
+
+Using iproute2, the following manipulations are possible:
+
+ .. code-block:: sh
+
+ # See the DSA master in current use
+ ip -d link show dev swp0
+ (...)
+ dsa master eth0
+
+ # Static CPU port distribution
+ ip link set swp0 type dsa master eth1
+ ip link set swp1 type dsa master eth0
+ ip link set swp2 type dsa master eth1
+ ip link set swp3 type dsa master eth0
+
+ # CPU ports in LAG, using explicit assignment of the DSA master
+ ip link add bond0 type bond mode balance-xor && ip link set bond0 up
+ ip link set eth1 down && ip link set eth1 master bond0
+ ip link set swp0 type dsa master bond0
+ ip link set swp1 type dsa master bond0
+ ip link set swp2 type dsa master bond0
+ ip link set swp3 type dsa master bond0
+ ip link set eth0 down && ip link set eth0 master bond0
+ ip -d link show dev swp0
+ (...)
+ dsa master bond0
+
+ # CPU ports in LAG, relying on implicit migration of the DSA master
+ ip link add bond0 type bond mode balance-xor && ip link set bond0 up
+ ip link set eth0 down && ip link set eth0 master bond0
+ ip link set eth1 down && ip link set eth1 master bond0
+ ip -d link show dev swp0
+ (...)
+ dsa master bond0
+
+Notice that in the case of CPU ports under a LAG, the use of the
+``IFLA_DSA_MASTER`` netlink attribute is not strictly needed, but rather, DSA
+reacts to the ``IFLA_MASTER`` attribute change of its present master (``eth0``)
+and migrates all user ports to the new upper of ``eth0``, ``bond0``. Similarly,
+when ``bond0`` is destroyed using ``RTM_DELLINK``, DSA migrates the user ports
+that were assigned to this interface to the first physical DSA master which is
+eligible, based on the firmware description (it effectively reverts to the
+startup configuration).
+
+In a setup with more than 2 physical CPU ports, it is therefore possible to mix
+static user to CPU port assignment with LAG between DSA masters. It is not
+possible to statically assign a user port towards a DSA master that has any
+upper interfaces (this includes LAG devices - the master must always be the LAG
+in this case).
+
+Live changing of the DSA master (and thus CPU port) affinity of a user port is
+permitted, in order to allow dynamic redistribution in response to traffic.
+
+Physical DSA masters are allowed to join and leave at any time a LAG interface
+used as a DSA master; however, DSA will reject a LAG interface as a valid
+candidate for being a DSA master unless it has at least one physical DSA master
+as a slave device.
diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
index d742ba6bd211..a94ddf83348a 100644
--- a/Documentation/networking/dsa/dsa.rst
+++ b/Documentation/networking/dsa/dsa.rst
@@ -303,6 +303,20 @@ These frames are then queued for transmission using the master network device
Ethernet switch will be able to process these incoming frames from the
management interface and deliver them to the physical switch port.
+When using multiple CPU ports, it is possible to stack a LAG (bonding/team)
+device between the DSA slave devices and the physical DSA masters. The LAG
+device is thus also a DSA master, but the LAG slave devices continue to be DSA
+masters as well (just with no user port assigned to them; this is needed for
+recovery in case the LAG DSA master disappears). Thus, the data path of the LAG
+DSA master is used asymmetrically. On RX, the ``ETH_P_XDSA`` handler, which
+calls ``dsa_switch_rcv()``, is invoked early (on the physical DSA master;
+LAG slave). Therefore, the RX data path of the LAG DSA master is not used.
+On the other hand, TX takes place linearly: ``dsa_slave_xmit`` calls
+``dsa_enqueue_skb``, which calls ``dev_queue_xmit`` towards the LAG DSA master.
+The latter calls ``dev_queue_xmit`` towards one physical DSA master or the
+other, and in both cases, the packet exits the system through a hardware path
+towards the switch.
+
Graphical representation
------------------------
@@ -629,6 +643,24 @@ Switch configuration
PHY cannot be found. In this case, probing of the DSA switch continues
without that particular port.
+- ``port_change_master``: method through which the affinity (association used
+ for traffic termination purposes) between a user port and a CPU port can be
+ changed. By default all user ports from a tree are assigned to the first
+ available CPU port that makes sense for them (most of the times this means
+ the user ports of a tree are all assigned to the same CPU port, except for H
+ topologies as described in commit 2c0b03258b8b). The ``port`` argument
+ represents the index of the user port, and the ``master`` argument represents
+ the new DSA master ``net_device``. The CPU port associated with the new
+ master can be retrieved by looking at ``struct dsa_port *cpu_dp =
+ master->dsa_ptr``. Additionally, the master can also be a LAG device where
+ all the slave devices are physical DSA masters. LAG DSA masters also have a
+ valid ``master->dsa_ptr`` pointer, however this is not unique, but rather a
+ duplicate of the first physical DSA master's (LAG slave) ``dsa_ptr``. In case
+ of a LAG DSA master, a further call to ``port_lag_join`` will be emitted
+ separately for the physical CPU ports associated with the physical DSA
+ masters, requesting them to create a hardware LAG associated with the LAG
+ interface.
+
PHY devices and link management
-------------------------------
@@ -1095,9 +1127,3 @@ capable hardware, but does not enforce a strict switch device driver model. On
the other DSA enforces a fairly strict device driver model, and deals with most
of the switch specific. At some point we should envision a merger between these
two subsystems and get the best of both worlds.
-
-Other hanging fruits
---------------------
-
-- allowing more than one CPU/management interface:
- http://comments.gmane.org/gmane.linux.network/365657
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index dbca3e9ec782..d578b8bcd8a4 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -220,6 +220,8 @@ Userspace to kernel:
``ETHTOOL_MSG_PHC_VCLOCKS_GET`` get PHC virtual clocks info
``ETHTOOL_MSG_MODULE_SET`` set transceiver module parameters
``ETHTOOL_MSG_MODULE_GET`` get transceiver module parameters
+ ``ETHTOOL_MSG_PSE_SET`` set PSE parameters
+ ``ETHTOOL_MSG_PSE_GET`` get PSE parameters
===================================== =================================
Kernel to userspace:
@@ -260,6 +262,7 @@ Kernel to userspace:
``ETHTOOL_MSG_STATS_GET_REPLY`` standard statistics
``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY`` PHC virtual clocks info
``ETHTOOL_MSG_MODULE_GET_REPLY`` transceiver module parameters
+ ``ETHTOOL_MSG_PSE_GET_REPLY`` PSE parameters
======================================== =================================
``GET`` requests are sent by userspace applications to retrieve device
@@ -426,6 +429,7 @@ Kernel response contents:
``ETHTOOL_A_LINKMODES_DUPLEX`` u8 duplex mode
``ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG`` u8 Master/slave port mode
``ETHTOOL_A_LINKMODES_MASTER_SLAVE_STATE`` u8 Master/slave port state
+ ``ETHTOOL_A_LINKMODES_RATE_MATCHING`` u8 PHY rate matching
========================================== ====== ==========================
For ``ETHTOOL_A_LINKMODES_OURS``, value represents advertised modes and mask
@@ -449,6 +453,7 @@ Request contents:
``ETHTOOL_A_LINKMODES_SPEED`` u32 link speed (Mb/s)
``ETHTOOL_A_LINKMODES_DUPLEX`` u8 duplex mode
``ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG`` u8 Master/slave port mode
+ ``ETHTOOL_A_LINKMODES_RATE_MATCHING`` u8 PHY rate matching
``ETHTOOL_A_LINKMODES_LANES`` u32 lanes
========================================== ====== ==========================
@@ -1625,6 +1630,62 @@ For SFF-8636 modules, low power mode is forced by the host according to table
For CMIS modules, low power mode is forced by the host according to table 6-12
in revision 5.0 of the specification.
+PSE_GET
+=======
+
+Gets PSE attributes.
+
+Request contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_PSE_HEADER`` nested request header
+ ===================================== ====== ==========================
+
+Kernel response contents:
+
+ ====================================== ====== =============================
+ ``ETHTOOL_A_PSE_HEADER`` nested reply header
+ ``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` u32 Operational state of the PoDL
+ PSE functions
+ ``ETHTOOL_A_PODL_PSE_PW_D_STATUS`` u32 power detection status of the
+ PoDL PSE.
+ ====================================== ====== =============================
+
+When set, the optional ``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` attribute identifies
+the operational state of the PoDL PSE functions. The operational state of the
+PSE function can be changed using the ``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL``
+action. This option is corresponding to ``IEEE 802.3-2018`` 30.15.1.1.2
+aPoDLPSEAdminState. Possible values are:
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+ :identifiers: ethtool_podl_pse_admin_state
+
+When set, the optional ``ETHTOOL_A_PODL_PSE_PW_D_STATUS`` attribute identifies
+the power detection status of the PoDL PSE. The status depend on internal PSE
+state machine and automatic PD classification support. This option is
+corresponding to ``IEEE 802.3-2018`` 30.15.1.1.3 aPoDLPSEPowerDetectionStatus.
+Possible values are:
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+ :identifiers: ethtool_podl_pse_pw_d_status
+
+PSE_SET
+=======
+
+Sets PSE parameters.
+
+Request contents:
+
+ ====================================== ====== =============================
+ ``ETHTOOL_A_PSE_HEADER`` nested request header
+ ``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL`` u32 Control PoDL PSE Admin state
+ ====================================== ====== =============================
+
+When set, the optional ``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL`` attribute is used
+to control PoDL PSE Admin functions. This option is implementing
+``IEEE 802.3-2018`` 30.15.1.2.1 acPoDLPSEAdminControl. See
+``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` for supported values.
+
Request translation
===================
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 03b215bddde8..16a153bcc5fe 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -47,7 +47,6 @@ Contents:
cdc_mbim
dccp
dctcp
- decnet
dns_resolver
driver
eql
@@ -93,6 +92,7 @@ Contents:
radiotap-headers
rds
regulatory
+ representors
rxrpc
sctp
secid
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index a759872a2883..e7b3fa7bb3f7 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1040,6 +1040,35 @@ tcp_challenge_ack_limit - INTEGER
TCP stack implements per TCP socket limits anyway.
Default: INT_MAX (unlimited)
+tcp_ehash_entries - INTEGER
+ Show the number of hash buckets for TCP sockets in the current
+ networking namespace.
+
+ A negative value means the networking namespace does not own its
+ hash buckets and shares the initial networking namespace's one.
+
+tcp_child_ehash_entries - INTEGER
+ Control the number of hash buckets for TCP sockets in the child
+ networking namespace, which must be set before clone() or unshare().
+
+ If the value is not 0, the kernel uses a value rounded up to 2^n
+ as the actual hash bucket size. 0 is a special value, meaning
+ the child networking namespace will share the initial networking
+ namespace's hash buckets.
+
+ Note that the child will use the global one in case the kernel
+ fails to allocate enough memory. In addition, the global hash
+ buckets are spread over available NUMA nodes, but the allocation
+ of the child hash table depends on the current process's NUMA
+ policy, which could result in performance differences.
+
+ Note also that the default value of tcp_max_tw_buckets and
+ tcp_max_syn_backlog depend on the hash bucket size.
+
+ Possible values: 0, 2^n (n: 0 - 24 (16Mi))
+
+ Default: 0
+
UDP variables
=============
diff --git a/Documentation/networking/phy.rst b/Documentation/networking/phy.rst
index 704f31da5167..06f4fcdb58b6 100644
--- a/Documentation/networking/phy.rst
+++ b/Documentation/networking/phy.rst
@@ -308,6 +308,21 @@ Some of the interface modes are described below:
rate of 125Mpbs using a 4B/5B encoding scheme, resulting in an underlying
data rate of 100Mpbs.
+``PHY_INTERFACE_MODE_QUSGMII``
+ This defines the Cisco the Quad USGMII mode, which is the Quad variant of
+ the USGMII (Universal SGMII) link. It's very similar to QSGMII, but uses
+ a Packet Control Header (PCH) instead of the 7 bytes preamble to carry not
+ only the port id, but also so-called "extensions". The only documented
+ extension so-far in the specification is the inclusion of timestamps, for
+ PTP-enabled PHYs. This mode isn't compatible with QSGMII, but offers the
+ same capabilities in terms of link speed and negociation.
+
+``PHY_INTERFACE_MODE_1000BASEKX``
+ This is 1000BASE-X as defined by IEEE 802.3 Clause 36 with Clause 73
+ autonegotiation. Generally, it will be used with a Clause 70 PMD. To
+ contrast with the 1000BASE-X phy mode used for Clause 38 and 39 PMDs, this
+ interface mode has different autonegotiation and only supports full duplex.
+
Pause frames / flow control
===========================
diff --git a/Documentation/networking/representors.rst b/Documentation/networking/representors.rst
new file mode 100644
index 000000000000..ee1f5cd54496
--- /dev/null
+++ b/Documentation/networking/representors.rst
@@ -0,0 +1,259 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================
+Network Function Representors
+=============================
+
+This document describes the semantics and usage of representor netdevices, as
+used to control internal switching on SmartNICs. For the closely-related port
+representors on physical (multi-port) switches, see
+:ref:`Documentation/networking/switchdev.rst <switchdev>`.
+
+Motivation
+----------
+
+Since the mid-2010s, network cards have started offering more complex
+virtualisation capabilities than the legacy SR-IOV approach (with its simple
+MAC/VLAN-based switching model) can support. This led to a desire to offload
+software-defined networks (such as OpenVSwitch) to these NICs to specify the
+network connectivity of each function. The resulting designs are variously
+called SmartNICs or DPUs.
+
+Network function representors bring the standard Linux networking stack to
+virtual switches and IOV devices. Just as each physical port of a Linux-
+controlled switch has a separate netdev, so does each virtual port of a virtual
+switch.
+When the system boots, and before any offload is configured, all packets from
+the virtual functions appear in the networking stack of the PF via the
+representors. The PF can thus always communicate freely with the virtual
+functions.
+The PF can configure standard Linux forwarding between representors, the uplink
+or any other netdev (routing, bridging, TC classifiers).
+
+Thus, a representor is both a control plane object (representing the function in
+administrative commands) and a data plane object (one end of a virtual pipe).
+As a virtual link endpoint, the representor can be configured like any other
+netdevice; in some cases (e.g. link state) the representee will follow the
+representor's configuration, while in others there are separate APIs to
+configure the representee.
+
+Definitions
+-----------
+
+This document uses the term "switchdev function" to refer to the PCIe function
+which has administrative control over the virtual switch on the device.
+Typically, this will be a PF, but conceivably a NIC could be configured to grant
+these administrative privileges instead to a VF or SF (subfunction).
+Depending on NIC design, a multi-port NIC might have a single switchdev function
+for the whole device or might have a separate virtual switch, and hence
+switchdev function, for each physical network port.
+If the NIC supports nested switching, there might be separate switchdev
+functions for each nested switch, in which case each switchdev function should
+only create representors for the ports on the (sub-)switch it directly
+administers.
+
+A "representee" is the object that a representor represents. So for example in
+the case of a VF representor, the representee is the corresponding VF.
+
+What does a representor do?
+---------------------------
+
+A representor has three main roles.
+
+1. It is used to configure the network connection the representee sees, e.g.
+ link up/down, MTU, etc. For instance, bringing the representor
+ administratively UP should cause the representee to see a link up / carrier
+ on event.
+2. It provides the slow path for traffic which does not hit any offloaded
+ fast-path rules in the virtual switch. Packets transmitted on the
+ representor netdevice should be delivered to the representee; packets
+ transmitted by the representee which fail to match any switching rule should
+ be received on the representor netdevice. (That is, there is a virtual pipe
+ connecting the representor to the representee, similar in concept to a veth
+ pair.)
+ This allows software switch implementations (such as OpenVSwitch or a Linux
+ bridge) to forward packets between representees and the rest of the network.
+3. It acts as a handle by which switching rules (such as TC filters) can refer
+ to the representee, allowing these rules to be offloaded.
+
+The combination of 2) and 3) means that the behaviour (apart from performance)
+should be the same whether a TC filter is offloaded or not. E.g. a TC rule
+on a VF representor applies in software to packets received on that representor
+netdevice, while in hardware offload it would apply to packets transmitted by
+the representee VF. Conversely, a mirred egress redirect to a VF representor
+corresponds in hardware to delivery directly to the representee VF.
+
+What functions should have a representor?
+-----------------------------------------
+
+Essentially, for each virtual port on the device's internal switch, there
+should be a representor.
+Some vendors have chosen to omit representors for the uplink and the physical
+network port, which can simplify usage (the uplink netdev becomes in effect the
+physical port's representor) but does not generalise to devices with multiple
+ports or uplinks.
+
+Thus, the following should all have representors:
+
+ - VFs belonging to the switchdev function.
+ - Other PFs on the local PCIe controller, and any VFs belonging to them.
+ - PFs and VFs on external PCIe controllers on the device (e.g. for any embedded
+ System-on-Chip within the SmartNIC).
+ - PFs and VFs with other personalities, including network block devices (such
+ as a vDPA virtio-blk PF backed by remote/distributed storage), if (and only
+ if) their network access is implemented through a virtual switch port. [#]_
+ Note that such functions can require a representor despite the representee
+ not having a netdev.
+ - Subfunctions (SFs) belonging to any of the above PFs or VFs, if they have
+ their own port on the switch (as opposed to using their parent PF's port).
+ - Any accelerators or plugins on the device whose interface to the network is
+ through a virtual switch port, even if they do not have a corresponding PCIe
+ PF or VF.
+
+This allows the entire switching behaviour of the NIC to be controlled through
+representor TC rules.
+
+It is a common misunderstanding to conflate virtual ports with PCIe virtual
+functions or their netdevs. While in simple cases there will be a 1:1
+correspondence between VF netdevices and VF representors, more advanced device
+configurations may not follow this.
+A PCIe function which does not have network access through the internal switch
+(not even indirectly through the hardware implementation of whatever services
+the function provides) should *not* have a representor (even if it has a
+netdev).
+Such a function has no switch virtual port for the representor to configure or
+to be the other end of the virtual pipe.
+The representor represents the virtual port, not the PCIe function nor the 'end
+user' netdevice.
+
+.. [#] The concept here is that a hardware IP stack in the device performs the
+ translation between block DMA requests and network packets, so that only
+ network packets pass through the virtual port onto the switch. The network
+ access that the IP stack "sees" would then be configurable through tc rules;
+ e.g. its traffic might all be wrapped in a specific VLAN or VxLAN. However,
+ any needed configuration of the block device *qua* block device, not being a
+ networking entity, would not be appropriate for the representor and would
+ thus use some other channel such as devlink.
+ Contrast this with the case of a virtio-blk implementation which forwards the
+ DMA requests unchanged to another PF whose driver then initiates and
+ terminates IP traffic in software; in that case the DMA traffic would *not*
+ run over the virtual switch and the virtio-blk PF should thus *not* have a
+ representor.
+
+How are representors created?
+-----------------------------
+
+The driver instance attached to the switchdev function should, for each virtual
+port on the switch, create a pure-software netdevice which has some form of
+in-kernel reference to the switchdev function's own netdevice or driver private
+data (``netdev_priv()``).
+This may be by enumerating ports at probe time, reacting dynamically to the
+creation and destruction of ports at run time, or a combination of the two.
+
+The operations of the representor netdevice will generally involve acting
+through the switchdev function. For example, ``ndo_start_xmit()`` might send
+the packet through a hardware TX queue attached to the switchdev function, with
+either packet metadata or queue configuration marking it for delivery to the
+representee.
+
+How are representors identified?
+--------------------------------
+
+The representor netdevice should *not* directly refer to a PCIe device (e.g.
+through ``net_dev->dev.parent`` / ``SET_NETDEV_DEV()``), either of the
+representee or of the switchdev function.
+Instead, it should implement the ``ndo_get_devlink_port()`` netdevice op, which
+the kernel uses to provide the ``phys_switch_id`` and ``phys_port_name`` sysfs
+nodes. (Some legacy drivers implement ``ndo_get_port_parent_id()`` and
+``ndo_get_phys_port_name()`` directly, but this is deprecated.) See
+:ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>` for the
+details of this API.
+
+It is expected that userland will use this information (e.g. through udev rules)
+to construct an appropriately informative name or alias for the netdevice. For
+instance if the switchdev function is ``eth4`` then a representor with a
+``phys_port_name`` of ``p0pf1vf2`` might be renamed ``eth4pf1vf2rep``.
+
+There are as yet no established conventions for naming representors which do not
+correspond to PCIe functions (e.g. accelerators and plugins).
+
+How do representors interact with TC rules?
+-------------------------------------------
+
+Any TC rule on a representor applies (in software TC) to packets received by
+that representor netdevice. Thus, if the delivery part of the rule corresponds
+to another port on the virtual switch, the driver may choose to offload it to
+hardware, applying it to packets transmitted by the representee.
+
+Similarly, since a TC mirred egress action targeting the representor would (in
+software) send the packet through the representor (and thus indirectly deliver
+it to the representee), hardware offload should interpret this as delivery to
+the representee.
+
+As a simple example, if ``PORT_DEV`` is the physical port representor and
+``REP_DEV`` is a VF representor, the following rules::
+
+ tc filter add dev $REP_DEV parent ffff: protocol ipv4 flower \
+ action mirred egress redirect dev $PORT_DEV
+ tc filter add dev $PORT_DEV parent ffff: protocol ipv4 flower skip_sw \
+ action mirred egress mirror dev $REP_DEV
+
+would mean that all IPv4 packets from the VF are sent out the physical port, and
+all IPv4 packets received on the physical port are delivered to the VF in
+addition to ``PORT_DEV``. (Note that without ``skip_sw`` on the second rule,
+the VF would get two copies, as the packet reception on ``PORT_DEV`` would
+trigger the TC rule again and mirror the packet to ``REP_DEV``.)
+
+On devices without separate port and uplink representors, ``PORT_DEV`` would
+instead be the switchdev function's own uplink netdevice.
+
+Of course the rules can (if supported by the NIC) include packet-modifying
+actions (e.g. VLAN push/pop), which should be performed by the virtual switch.
+
+Tunnel encapsulation and decapsulation are rather more complicated, as they
+involve a third netdevice (a tunnel netdev operating in metadata mode, such as
+a VxLAN device created with ``ip link add vxlan0 type vxlan external``) and
+require an IP address to be bound to the underlay device (e.g. switchdev
+function uplink netdev or port representor). TC rules such as::
+
+ tc filter add dev $REP_DEV parent ffff: flower \
+ action tunnel_key set id $VNI src_ip $LOCAL_IP dst_ip $REMOTE_IP \
+ dst_port 4789 \
+ action mirred egress redirect dev vxlan0
+ tc filter add dev vxlan0 parent ffff: flower enc_src_ip $REMOTE_IP \
+ enc_dst_ip $LOCAL_IP enc_key_id $VNI enc_dst_port 4789 \
+ action tunnel_key unset action mirred egress redirect dev $REP_DEV
+
+where ``LOCAL_IP`` is an IP address bound to ``PORT_DEV``, and ``REMOTE_IP`` is
+another IP address on the same subnet, mean that packets sent by the VF should
+be VxLAN encapsulated and sent out the physical port (the driver has to deduce
+this by a route lookup of ``LOCAL_IP`` leading to ``PORT_DEV``, and also
+perform an ARP/neighbour table lookup to find the MAC addresses to use in the
+outer Ethernet frame), while UDP packets received on the physical port with UDP
+port 4789 should be parsed as VxLAN and, if their VSID matches ``$VNI``,
+decapsulated and forwarded to the VF.
+
+If this all seems complicated, just remember the 'golden rule' of TC offload:
+the hardware should ensure the same final results as if the packets were
+processed through the slow path, traversed software TC (except ignoring any
+``skip_hw`` rules and applying any ``skip_sw`` rules) and were transmitted or
+received through the representor netdevices.
+
+Configuring the representee's MAC
+---------------------------------
+
+The representee's link state is controlled through the representor. Setting the
+representor administratively UP or DOWN should cause carrier ON or OFF at the
+representee.
+
+Setting an MTU on the representor should cause that same MTU to be reported to
+the representee.
+(On hardware that allows configuring separate and distinct MTU and MRU values,
+the representor MTU should correspond to the representee's MRU and vice-versa.)
+
+Currently there is no way to use the representor to set the station permanent
+MAC address of the representee; other methods available to do this include:
+
+ - legacy SR-IOV (``ip link set DEVICE vf NUM mac LLADDR``)
+ - devlink port function (see **devlink-port(8)** and
+ :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`)
diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index 742e90e6d822..6d8acdbe9be1 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -34,3 +34,28 @@ smcr_buf_type - INTEGER
- 1 - Use virtually contiguous buffers
- 2 - Mixed use of the two types. Try physically contiguous buffers first.
If not available, use virtually contiguous buffers then.
+
+smcr_testlink_time - INTEGER
+ How frequently SMC-R link sends out TEST_LINK LLC messages to confirm
+ viability, after the last activity of connections on it. Value 0 means
+ disabling TEST_LINK.
+
+ Default: 30 seconds.
+
+wmem - INTEGER
+ Initial size of send buffer used by SMC sockets.
+ The default value inherits from net.ipv4.tcp_wmem[1].
+
+ The minimum value is 16KiB and there is no hard limit for max value, but
+ only allowed 512KiB for SMC-R and 1MiB for SMC-D.
+
+ Default: 16K
+
+rmem - INTEGER
+ Initial size of receive buffer (RMB) used by SMC sockets.
+ The default value inherits from net.ipv4.tcp_rmem[1].
+
+ The minimum value is 16KiB and there is no hard limit for max value, but
+ only allowed 512KiB for SMC-R and 1MiB for SMC-D.
+
+ Default: 128K
diff --git a/Documentation/networking/switchdev.rst b/Documentation/networking/switchdev.rst
index bbf272e9d607..758f1dae3fce 100644
--- a/Documentation/networking/switchdev.rst
+++ b/Documentation/networking/switchdev.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
+.. _switchdev:
===============================================
Ethernet switch device driver model (switchdev)
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index a61eac0c73f8..c78da9ce0ec4 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -26,6 +26,7 @@ place where this information is gathered.
ioctl/index
iommu
media/index
+ netlink/index
sysfs-platform_profile
vduse
futex2
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 3b985b19f39d..5f81e2a24a5c 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -308,7 +308,6 @@ Code Seq# Include File Comments
0x89 00-06 arch/x86/include/asm/sockios.h
0x89 0B-DF linux/sockios.h
0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range
-0x89 E0-EF linux/dn.h PROTOPRIVATE range
0x89 F0-FF linux/sockios.h SIOCDEVPRIVATE range
0x8B all linux/wireless.h
0x8C 00-3F WiNRADiO driver
diff --git a/Documentation/userspace-api/netlink/index.rst b/Documentation/userspace-api/netlink/index.rst
new file mode 100644
index 000000000000..b0c21538d97d
--- /dev/null
+++ b/Documentation/userspace-api/netlink/index.rst
@@ -0,0 +1,12 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+================
+Netlink Handbook
+================
+
+Netlink documentation for users.
+
+.. toctree::
+ :maxdepth: 2
+
+ intro
diff --git a/Documentation/userspace-api/netlink/intro.rst b/Documentation/userspace-api/netlink/intro.rst
new file mode 100644
index 000000000000..0955e9f203d3
--- /dev/null
+++ b/Documentation/userspace-api/netlink/intro.rst
@@ -0,0 +1,681 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+=======================
+Introduction to Netlink
+=======================
+
+Netlink is often described as an ioctl() replacement.
+It aims to replace fixed-format C structures as supplied
+to ioctl() with a format which allows an easy way to add
+or extended the arguments.
+
+To achieve this Netlink uses a minimal fixed-format metadata header
+followed by multiple attributes in the TLV (type, length, value) format.
+
+Unfortunately the protocol has evolved over the years, in an organic
+and undocumented fashion, making it hard to coherently explain.
+To make the most practical sense this document starts by describing
+netlink as it is used today and dives into more "historical" uses
+in later sections.
+
+Opening a socket
+================
+
+Netlink communication happens over sockets, a socket needs to be
+opened first:
+
+.. code-block:: c
+
+ fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+
+The use of sockets allows for a natural way of exchanging information
+in both directions (to and from the kernel). The operations are still
+performed synchronously when applications send() the request but
+a separate recv() system call is needed to read the reply.
+
+A very simplified flow of a Netlink "call" will therefore look
+something like:
+
+.. code-block:: c
+
+ fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+
+ /* format the request */
+ send(fd, &request, sizeof(request));
+ n = recv(fd, &response, RSP_BUFFER_SIZE);
+ /* interpret the response */
+
+Netlink also provides natural support for "dumping", i.e. communicating
+to user space all objects of a certain type (e.g. dumping all network
+interfaces).
+
+.. code-block:: c
+
+ fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
+
+ /* format the dump request */
+ send(fd, &request, sizeof(request));
+ while (1) {
+ n = recv(fd, &buffer, RSP_BUFFER_SIZE);
+ /* one recv() call can read multiple messages, hence the loop below */
+ for (nl_msg in buffer) {
+ if (nl_msg.nlmsg_type == NLMSG_DONE)
+ goto dump_finished;
+ /* process the object */
+ }
+ }
+ dump_finished:
+
+The first two arguments of the socket() call require little explanation -
+it is opening a Netlink socket, with all headers provided by the user
+(hence NETLINK, RAW). The last argument is the protocol within Netlink.
+This field used to identify the subsystem with which the socket will
+communicate.
+
+Classic vs Generic Netlink
+--------------------------
+
+Initial implementation of Netlink depended on a static allocation
+of IDs to subsystems and provided little supporting infrastructure.
+Let us refer to those protocols collectively as **Classic Netlink**.
+The list of them is defined on top of the ``include/uapi/linux/netlink.h``
+file, they include among others - general networking (NETLINK_ROUTE),
+iSCSI (NETLINK_ISCSI), and audit (NETLINK_AUDIT).
+
+**Generic Netlink** (introduced in 2005) allows for dynamic registration of
+subsystems (and subsystem ID allocation), introspection and simplifies
+implementing the kernel side of the interface.
+
+The following section describes how to use Generic Netlink, as the
+number of subsystems using Generic Netlink outnumbers the older
+protocols by an order of magnitude. There are also no plans for adding
+more Classic Netlink protocols to the kernel.
+Basic information on how communicating with core networking parts of
+the Linux kernel (or another of the 20 subsystems using Classic
+Netlink) differs from Generic Netlink is provided later in this document.
+
+Generic Netlink
+===============
+
+In addition to the Netlink fixed metadata header each Netlink protocol
+defines its own fixed metadata header. (Similarly to how network
+headers stack - Ethernet > IP > TCP we have Netlink > Generic N. > Family.)
+
+A Netlink message always starts with struct nlmsghdr, which is followed
+by a protocol-specific header. In case of Generic Netlink the protocol
+header is struct genlmsghdr.
+
+The practical meaning of the fields in case of Generic Netlink is as follows:
+
+.. code-block:: c
+
+ struct nlmsghdr {
+ __u32 nlmsg_len; /* Length of message including headers */
+ __u16 nlmsg_type; /* Generic Netlink Family (subsystem) ID */
+ __u16 nlmsg_flags; /* Flags - request or dump */
+ __u32 nlmsg_seq; /* Sequence number */
+ __u32 nlmsg_pid; /* Port ID, set to 0 */
+ };
+ struct genlmsghdr {
+ __u8 cmd; /* Command, as defined by the Family */
+ __u8 version; /* Irrelevant, set to 1 */
+ __u16 reserved; /* Reserved, set to 0 */
+ };
+ /* TLV attributes follow... */
+
+In Classic Netlink :c:member:`nlmsghdr.nlmsg_type` used to identify
+which operation within the subsystem the message was referring to
+(e.g. get information about a netdev). Generic Netlink needs to mux
+multiple subsystems in a single protocol so it uses this field to
+identify the subsystem, and :c:member:`genlmsghdr.cmd` identifies
+the operation instead. (See :ref:`res_fam` for
+information on how to find the Family ID of the subsystem of interest.)
+Note that the first 16 values (0 - 15) of this field are reserved for
+control messages both in Classic Netlink and Generic Netlink.
+See :ref:`nl_msg_type` for more details.
+
+There are 3 usual types of message exchanges on a Netlink socket:
+
+ - performing a single action (``do``);
+ - dumping information (``dump``);
+ - getting asynchronous notifications (``multicast``).
+
+Classic Netlink is very flexible and presumably allows other types
+of exchanges to happen, but in practice those are the three that get
+used.
+
+Asynchronous notifications are sent by the kernel and received by
+the user sockets which subscribed to them. ``do`` and ``dump`` requests
+are initiated by the user. :c:member:`nlmsghdr.nlmsg_flags` should
+be set as follows:
+
+ - for ``do``: ``NLM_F_REQUEST | NLM_F_ACK``
+ - for ``dump``: ``NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP``
+
+:c:member:`nlmsghdr.nlmsg_seq` should be a set to a monotonically
+increasing value. The value gets echoed back in responses and doesn't
+matter in practice, but setting it to an increasing value for each
+message sent is considered good hygiene. The purpose of the field is
+matching responses to requests. Asynchronous notifications will have
+:c:member:`nlmsghdr.nlmsg_seq` of ``0``.
+
+:c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address.
+This field can be set to ``0`` when talking to the kernel.
+See :ref:`nlmsg_pid` for the (uncommon) uses of the field.
+
+The expected use for :c:member:`genlmsghdr.version` was to allow
+versioning of the APIs provided by the subsystems. No subsystem to
+date made significant use of this field, so setting it to ``1`` seems
+like a safe bet.
+
+.. _nl_msg_type:
+
+Netlink message types
+---------------------
+
+As previously mentioned :c:member:`nlmsghdr.nlmsg_type` carries
+protocol specific values but the first 16 identifiers are reserved
+(first subsystem specific message type should be equal to
+``NLMSG_MIN_TYPE`` which is ``0x10``).
+
+There are only 4 Netlink control messages defined:
+
+ - ``NLMSG_NOOP`` - ignore the message, not used in practice;
+ - ``NLMSG_ERROR`` - carries the return code of an operation;
+ - ``NLMSG_DONE`` - marks the end of a dump;
+ - ``NLMSG_OVERRUN`` - socket buffer has overflown, not used to date.
+
+``NLMSG_ERROR`` and ``NLMSG_DONE`` are of practical importance.
+They carry return codes for operations. Note that unless
+the ``NLM_F_ACK`` flag is set on the request Netlink will not respond
+with ``NLMSG_ERROR`` if there is no error. To avoid having to special-case
+this quirk it is recommended to always set ``NLM_F_ACK``.
+
+The format of ``NLMSG_ERROR`` is described by struct nlmsgerr::
+
+ ----------------------------------------------
+ | struct nlmsghdr - response header |
+ ----------------------------------------------
+ | int error |
+ ----------------------------------------------
+ | struct nlmsghdr - original request header |
+ ----------------------------------------------
+ | ** optionally (1) payload of the request |
+ ----------------------------------------------
+ | ** optionally (2) extended ACK |
+ ----------------------------------------------
+
+There are two instances of struct nlmsghdr here, first of the response
+and second of the request. ``NLMSG_ERROR`` carries the information about
+the request which led to the error. This could be useful when trying
+to match requests to responses or re-parse the request to dump it into
+logs.
+
+The payload of the request is not echoed in messages reporting success
+(``error == 0``) or if ``NETLINK_CAP_ACK`` setsockopt() was set.
+The latter is common
+and perhaps recommended as having to read a copy of every request back
+from the kernel is rather wasteful. The absence of request payload
+is indicated by ``NLM_F_CAPPED`` in :c:member:`nlmsghdr.nlmsg_flags`.
+
+The second optional element of ``NLMSG_ERROR`` are the extended ACK
+attributes. See :ref:`ext_ack` for more details. The presence
+of extended ACK is indicated by ``NLM_F_ACK_TLVS`` in
+:c:member:`nlmsghdr.nlmsg_flags`.
+
+``NLMSG_DONE`` is simpler, the request is never echoed but the extended
+ACK attributes may be present::
+
+ ----------------------------------------------
+ | struct nlmsghdr - response header |
+ ----------------------------------------------
+ | int error |
+ ----------------------------------------------
+ | ** optionally extended ACK |
+ ----------------------------------------------
+
+.. _res_fam:
+
+Resolving the Family ID
+-----------------------
+
+This section explains how to find the Family ID of a subsystem.
+It also serves as an example of Generic Netlink communication.
+
+Generic Netlink is itself a subsystem exposed via the Generic Netlink API.
+To avoid a circular dependency Generic Netlink has a statically allocated
+Family ID (``GENL_ID_CTRL`` which is equal to ``NLMSG_MIN_TYPE``).
+The Generic Netlink family implements a command used to find out information
+about other families (``CTRL_CMD_GETFAMILY``).
+
+To get information about the Generic Netlink family named for example
+``"test1"`` we need to send a message on the previously opened Generic Netlink
+socket. The message should target the Generic Netlink Family (1), be a
+``do`` (2) call to ``CTRL_CMD_GETFAMILY`` (3). A ``dump`` version of this
+call would make the kernel respond with information about *all* the families
+it knows about. Last but not least the name of the family in question has
+to be specified (4) as an attribute with the appropriate type::
+
+ struct nlmsghdr:
+ __u32 nlmsg_len: 32
+ __u16 nlmsg_type: GENL_ID_CTRL // (1)
+ __u16 nlmsg_flags: NLM_F_REQUEST | NLM_F_ACK // (2)
+ __u32 nlmsg_seq: 1
+ __u32 nlmsg_pid: 0
+
+ struct genlmsghdr:
+ __u8 cmd: CTRL_CMD_GETFAMILY // (3)
+ __u8 version: 2 /* or 1, doesn't matter */
+ __u16 reserved: 0
+
+ struct nlattr: // (4)
+ __u16 nla_len: 10
+ __u16 nla_type: CTRL_ATTR_FAMILY_NAME
+ char data: test1\0
+
+ (padding:)
+ char data: \0\0
+
+The length fields in Netlink (:c:member:`nlmsghdr.nlmsg_len`
+and :c:member:`nlattr.nla_len`) always *include* the header.
+Attribute headers in netlink must be aligned to 4 bytes from the start
+of the message, hence the extra ``\0\0`` after ``CTRL_ATTR_FAMILY_NAME``.
+The attribute lengths *exclude* the padding.
+
+If the family is found kernel will reply with two messages, the response
+with all the information about the family::
+
+ /* Message #1 - reply */
+ struct nlmsghdr:
+ __u32 nlmsg_len: 136
+ __u16 nlmsg_type: GENL_ID_CTRL
+ __u16 nlmsg_flags: 0
+ __u32 nlmsg_seq: 1 /* echoed from our request */
+ __u32 nlmsg_pid: 5831 /* The PID of our user space process */
+
+ struct genlmsghdr:
+ __u8 cmd: CTRL_CMD_GETFAMILY
+ __u8 version: 2
+ __u16 reserved: 0
+
+ struct nlattr:
+ __u16 nla_len: 10
+ __u16 nla_type: CTRL_ATTR_FAMILY_NAME
+ char data: test1\0
+
+ (padding:)
+ data: \0\0
+
+ struct nlattr:
+ __u16 nla_len: 6
+ __u16 nla_type: CTRL_ATTR_FAMILY_ID
+ __u16: 123 /* The Family ID we are after */
+
+ (padding:)
+ char data: \0\0
+
+ struct nlattr:
+ __u16 nla_len: 9
+ __u16 nla_type: CTRL_ATTR_FAMILY_VERSION
+ __u16: 1
+
+ /* ... etc, more attributes will follow. */
+
+And the error code (success) since ``NLM_F_ACK`` had been set on the request::
+
+ /* Message #2 - the ACK */
+ struct nlmsghdr:
+ __u32 nlmsg_len: 36
+ __u16 nlmsg_type: NLMSG_ERROR
+ __u16 nlmsg_flags: NLM_F_CAPPED /* There won't be a payload */
+ __u32 nlmsg_seq: 1 /* echoed from our request */
+ __u32 nlmsg_pid: 5831 /* The PID of our user space process */
+
+ int error: 0
+
+ struct nlmsghdr: /* Copy of the request header as we sent it */
+ __u32 nlmsg_len: 32
+ __u16 nlmsg_type: GENL_ID_CTRL
+ __u16 nlmsg_flags: NLM_F_REQUEST | NLM_F_ACK
+ __u32 nlmsg_seq: 1
+ __u32 nlmsg_pid: 0
+
+The order of attributes (struct nlattr) is not guaranteed so the user
+has to walk the attributes and parse them.
+
+Note that Generic Netlink sockets are not associated or bound to a single
+family. A socket can be used to exchange messages with many different
+families, selecting the recipient family on message-by-message basis using
+the :c:member:`nlmsghdr.nlmsg_type` field.
+
+.. _ext_ack:
+
+Extended ACK
+------------
+
+Extended ACK controls reporting of additional error/warning TLVs
+in ``NLMSG_ERROR`` and ``NLMSG_DONE`` messages. To maintain backward
+compatibility this feature has to be explicitly enabled by setting
+the ``NETLINK_EXT_ACK`` setsockopt() to ``1``.
+
+Types of extended ack attributes are defined in enum nlmsgerr_attrs.
+The most commonly used attributes are ``NLMSGERR_ATTR_MSG``,
+``NLMSGERR_ATTR_OFFS`` and ``NLMSGERR_ATTR_MISS_*``.
+
+``NLMSGERR_ATTR_MSG`` carries a message in English describing
+the encountered problem. These messages are far more detailed
+than what can be expressed thru standard UNIX error codes.
+
+``NLMSGERR_ATTR_OFFS`` points to the attribute which caused the problem.
+
+``NLMSGERR_ATTR_MISS_TYPE`` and ``NLMSGERR_ATTR_MISS_NEST``
+inform about a missing attribute.
+
+Extended ACKs can be reported on errors as well as in case of success.
+The latter should be treated as a warning.
+
+Extended ACKs greatly improve the usability of Netlink and should
+always be enabled, appropriately parsed and reported to the user.
+
+Advanced topics
+===============
+
+Dump consistency
+----------------
+
+Some of the data structures kernel uses for storing objects make
+it hard to provide an atomic snapshot of all the objects in a dump
+(without impacting the fast-paths updating them).
+
+Kernel may set the ``NLM_F_DUMP_INTR`` flag on any message in a dump
+(including the ``NLMSG_DONE`` message) if the dump was interrupted and
+may be inconsistent (e.g. missing objects). User space should retry
+the dump if it sees the flag set.
+
+Introspection
+-------------
+
+The basic introspection abilities are enabled by access to the Family
+object as reported in :ref:`res_fam`. User can query information about
+the Generic Netlink family, including which operations are supported
+by the kernel and what attributes the kernel understands.
+Family information includes the highest ID of an attribute kernel can parse,
+a separate command (``CTRL_CMD_GETPOLICY``) provides detailed information
+about supported attributes, including ranges of values the kernel accepts.
+
+Querying family information is useful in cases when user space needs
+to make sure that the kernel has support for a feature before issuing
+a request.
+
+.. _nlmsg_pid:
+
+nlmsg_pid
+---------
+
+:c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address.
+It is referred to as Port ID, sometimes Process ID because for historical
+reasons if the application does not select (bind() to) an explicit Port ID
+kernel will automatically assign it the ID equal to its Process ID
+(as reported by the getpid() system call).
+
+Similarly to the bind() semantics of the TCP/IP network protocols the value
+of zero means "assign automatically", hence it is common for applications
+to leave the :c:member:`nlmsghdr.nlmsg_pid` field initialized to ``0``.
+
+The field is still used today in rare cases when kernel needs to send
+a unicast notification. User space application can use bind() to associate
+its socket with a specific PID, it then communicates its PID to the kernel.
+This way the kernel can reach the specific user space process.
+
+This sort of communication is utilized in UMH (User Mode Helper)-like
+scenarios when kernel needs to trigger user space processing or ask user
+space for a policy decision.
+
+Multicast notifications
+-----------------------
+
+One of the strengths of Netlink is the ability to send event notifications
+to user space. This is a unidirectional form of communication (kernel ->
+user) and does not involve any control messages like ``NLMSG_ERROR`` or
+``NLMSG_DONE``.
+
+For example the Generic Netlink family itself defines a set of multicast
+notifications about registered families. When a new family is added the
+sockets subscribed to the notifications will get the following message::
+
+ struct nlmsghdr:
+ __u32 nlmsg_len: 136
+ __u16 nlmsg_type: GENL_ID_CTRL
+ __u16 nlmsg_flags: 0
+ __u32 nlmsg_seq: 0
+ __u32 nlmsg_pid: 0
+
+ struct genlmsghdr:
+ __u8 cmd: CTRL_CMD_NEWFAMILY
+ __u8 version: 2
+ __u16 reserved: 0
+
+ struct nlattr:
+ __u16 nla_len: 10
+ __u16 nla_type: CTRL_ATTR_FAMILY_NAME
+ char data: test1\0
+
+ (padding:)
+ data: \0\0
+
+ struct nlattr:
+ __u16 nla_len: 6
+ __u16 nla_type: CTRL_ATTR_FAMILY_ID
+ __u16: 123 /* The Family ID we are after */
+
+ (padding:)
+ char data: \0\0
+
+ struct nlattr:
+ __u16 nla_len: 9
+ __u16 nla_type: CTRL_ATTR_FAMILY_VERSION
+ __u16: 1
+
+ /* ... etc, more attributes will follow. */
+
+The notification contains the same information as the response
+to the ``CTRL_CMD_GETFAMILY`` request.
+
+The Netlink headers of the notification are mostly 0 and irrelevant.
+The :c:member:`nlmsghdr.nlmsg_seq` may be either zero or a monotonically
+increasing notification sequence number maintained by the family.
+
+To receive notifications the user socket must subscribe to the relevant
+notification group. Much like the Family ID, the Group ID for a given
+multicast group is dynamic and can be found inside the Family information.
+The ``CTRL_ATTR_MCAST_GROUPS`` attribute contains nests with names
+(``CTRL_ATTR_MCAST_GRP_NAME``) and IDs (``CTRL_ATTR_MCAST_GRP_ID``) of
+the groups family.
+
+Once the Group ID is known a setsockopt() call adds the socket to the group:
+
+.. code-block:: c
+
+ unsigned int group_id;
+
+ /* .. find the group ID... */
+
+ setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
+ &group_id, sizeof(group_id));
+
+The socket will now receive notifications.
+
+It is recommended to use separate sockets for receiving notifications
+and sending requests to the kernel. The asynchronous nature of notifications
+means that they may get mixed in with the responses making the message
+handling much harder.
+
+Buffer sizing
+-------------
+
+Netlink sockets are datagram sockets rather than stream sockets,
+meaning that each message must be received in its entirety by a single
+recv()/recvmsg() system call. If the buffer provided by the user is too
+short, the message will be truncated and the ``MSG_TRUNC`` flag set
+in struct msghdr (struct msghdr is the second argument
+of the recvmsg() system call, *not* a Netlink header).
+
+Upon truncation the remaining part of the message is discarded.
+
+Netlink expects that the user buffer will be at least 8kB or a page
+size of the CPU architecture, whichever is bigger. Particular Netlink
+families may, however, require a larger buffer. 32kB buffer is recommended
+for most efficient handling of dumps (larger buffer fits more dumped
+objects and therefore fewer recvmsg() calls are needed).
+
+Classic Netlink
+===============
+
+The main differences between Classic and Generic Netlink are the dynamic
+allocation of subsystem identifiers and availability of introspection.
+In theory the protocol does not differ significantly, however, in practice
+Classic Netlink experimented with concepts which were abandoned in Generic
+Netlink (really, they usually only found use in a small corner of a single
+subsystem). This section is meant as an explainer of a few of such concepts,
+with the explicit goal of giving the Generic Netlink
+users the confidence to ignore them when reading the uAPI headers.
+
+Most of the concepts and examples here refer to the ``NETLINK_ROUTE`` family,
+which covers much of the configuration of the Linux networking stack.
+Real documentation of that family, deserves a chapter (or a book) of its own.
+
+Families
+--------
+
+Netlink refers to subsystems as families. This is a remnant of using
+sockets and the concept of protocol families, which are part of message
+demultiplexing in ``NETLINK_ROUTE``.
+
+Sadly every layer of encapsulation likes to refer to whatever it's carrying
+as "families" making the term very confusing:
+
+ 1. AF_NETLINK is a bona fide socket protocol family
+ 2. AF_NETLINK's documentation refers to what comes after its own
+ header (struct nlmsghdr) in a message as a "Family Header"
+ 3. Generic Netlink is a family for AF_NETLINK (struct genlmsghdr follows
+ struct nlmsghdr), yet it also calls its users "Families".
+
+Note that the Generic Netlink Family IDs are in a different "ID space"
+and overlap with Classic Netlink protocol numbers (e.g. ``NETLINK_CRYPTO``
+has the Classic Netlink protocol ID of 21 which Generic Netlink will
+happily allocate to one of its families as well).
+
+Strict checking
+---------------
+
+The ``NETLINK_GET_STRICT_CHK`` socket option enables strict input checking
+in ``NETLINK_ROUTE``. It was needed because historically kernel did not
+validate the fields of structures it didn't process. This made it impossible
+to start using those fields later without risking regressions in applications
+which initialized them incorrectly or not at all.
+
+``NETLINK_GET_STRICT_CHK`` declares that the application is initializing
+all fields correctly. It also opts into validating that message does not
+contain trailing data and requests that kernel rejects attributes with
+type higher than largest attribute type known to the kernel.
+
+``NETLINK_GET_STRICT_CHK`` is not used outside of ``NETLINK_ROUTE``.
+
+Unknown attributes
+------------------
+
+Historically Netlink ignored all unknown attributes. The thinking was that
+it would free the application from having to probe what kernel supports.
+The application could make a request to change the state and check which
+parts of the request "stuck".
+
+This is no longer the case for new Generic Netlink families and those opting
+in to strict checking. See enum netlink_validation for validation types
+performed.
+
+Fixed metadata and structures
+-----------------------------
+
+Classic Netlink made liberal use of fixed-format structures within
+the messages. Messages would commonly have a structure with
+a considerable number of fields after struct nlmsghdr. It was also
+common to put structures with multiple members inside attributes,
+without breaking each member into an attribute of its own.
+
+This has caused problems with validation and extensibility and
+therefore using binary structures is actively discouraged for new
+attributes.
+
+Request types
+-------------
+
+``NETLINK_ROUTE`` categorized requests into 4 types ``NEW``, ``DEL``, ``GET``,
+and ``SET``. Each object can handle all or some of those requests
+(objects being netdevs, routes, addresses, qdiscs etc.) Request type
+is defined by the 2 lowest bits of the message type, so commands for
+new objects would always be allocated with a stride of 4.
+
+Each object would also have it's own fixed metadata shared by all request
+types (e.g. struct ifinfomsg for netdev requests, struct ifaddrmsg for address
+requests, struct tcmsg for qdisc requests).
+
+Even though other protocols and Generic Netlink commands often use
+the same verbs in their message names (``GET``, ``SET``) the concept
+of request types did not find wider adoption.
+
+Notification echo
+-----------------
+
+``NLM_F_ECHO`` requests for notifications resulting from the request
+to be queued onto the requesting socket. This is useful to discover
+the impact of the request.
+
+Note that this feature is not universally implemented.
+
+Other request-type-specific flags
+---------------------------------
+
+Classic Netlink defined various flags for its ``GET``, ``NEW``
+and ``DEL`` requests in the upper byte of nlmsg_flags in struct nlmsghdr.
+Since request types have not been generalized the request type specific
+flags are rarely used (and considered deprecated for new families).
+
+For ``GET`` - ``NLM_F_ROOT`` and ``NLM_F_MATCH`` are combined into
+``NLM_F_DUMP``, and not used separately. ``NLM_F_ATOMIC`` is never used.
+
+For ``DEL`` - ``NLM_F_NONREC`` is only used by nftables and ``NLM_F_BULK``
+only by FDB some operations.
+
+The flags for ``NEW`` are used most commonly in classic Netlink. Unfortunately,
+the meaning is not crystal clear. The following description is based on the
+best guess of the intention of the authors, and in practice all families
+stray from it in one way or another. ``NLM_F_REPLACE`` asks to replace
+an existing object, if no matching object exists the operation should fail.
+``NLM_F_EXCL`` has the opposite semantics and only succeeds if object already
+existed.
+``NLM_F_CREATE`` asks for the object to be created if it does not
+exist, it can be combined with ``NLM_F_REPLACE`` and ``NLM_F_EXCL``.
+
+A comment in the main Netlink uAPI header states::
+
+ 4.4BSD ADD NLM_F_CREATE|NLM_F_EXCL
+ 4.4BSD CHANGE NLM_F_REPLACE
+
+ True CHANGE NLM_F_CREATE|NLM_F_REPLACE
+ Append NLM_F_CREATE
+ Check NLM_F_EXCL
+
+which seems to indicate that those flags predate request types.
+``NLM_F_REPLACE`` without ``NLM_F_CREATE`` was initially used instead
+of ``SET`` commands.
+``NLM_F_EXCL`` without ``NLM_F_CREATE`` was used to check if object exists
+without creating it, presumably predating ``GET`` commands.
+
+``NLM_F_APPEND`` indicates that if one key can have multiple objects associated
+with it (e.g. multiple next-hop objects for a route) the new object should be
+added to the list rather than replacing the entire list.
+
+uAPI reference
+==============
+
+.. kernel-doc:: include/uapi/linux/netlink.h