diff options
Diffstat (limited to 'Documentation')
25 files changed, 699 insertions, 524 deletions
diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst index 6a6d67acaf69..5d54b27c6eba 100644 --- a/Documentation/core-api/xarray.rst +++ b/Documentation/core-api/xarray.rst @@ -108,12 +108,13 @@ some, but not all of the other indices changing. Sometimes you need to ensure that a subsequent call to :c:func:`xa_store` will not need to allocate memory. The :c:func:`xa_reserve` function -will store a reserved entry at the indicated index. Users of the normal -API will see this entry as containing ``NULL``. If you do not need to -use the reserved entry, you can call :c:func:`xa_release` to remove the -unused entry. If another user has stored to the entry in the meantime, -:c:func:`xa_release` will do nothing; if instead you want the entry to -become ``NULL``, you should use :c:func:`xa_erase`. +will store a reserved entry at the indicated index. Users of the +normal API will see this entry as containing ``NULL``. If you do +not need to use the reserved entry, you can call :c:func:`xa_release` +to remove the unused entry. If another user has stored to the entry +in the meantime, :c:func:`xa_release` will do nothing; if instead you +want the entry to become ``NULL``, you should use :c:func:`xa_erase`. +Using :c:func:`xa_insert` on a reserved entry will fail. If all entries in the array are ``NULL``, the :c:func:`xa_empty` function will return ``true``. @@ -183,6 +184,8 @@ Takes xa_lock internally: * :c:func:`xa_store_bh` * :c:func:`xa_store_irq` * :c:func:`xa_insert` + * :c:func:`xa_insert_bh` + * :c:func:`xa_insert_irq` * :c:func:`xa_erase` * :c:func:`xa_erase_bh` * :c:func:`xa_erase_irq` diff --git a/Documentation/devicetree/bindings/arm/cpu-capacity.txt b/Documentation/devicetree/bindings/arm/cpu-capacity.txt index 84262cdb8d29..96fa46cb133c 100644 --- a/Documentation/devicetree/bindings/arm/cpu-capacity.txt +++ b/Documentation/devicetree/bindings/arm/cpu-capacity.txt @@ -235,4 +235,4 @@ cpus { =========================================== [1] ARM Linux Kernel documentation - CPUs bindings - Documentation/devicetree/bindings/arm/cpus.txt + Documentation/devicetree/bindings/arm/cpus.yaml diff --git a/Documentation/devicetree/bindings/arm/idle-states.txt b/Documentation/devicetree/bindings/arm/idle-states.txt index 8f0937db55c5..45730ba60af5 100644 --- a/Documentation/devicetree/bindings/arm/idle-states.txt +++ b/Documentation/devicetree/bindings/arm/idle-states.txt @@ -684,7 +684,7 @@ cpus { =========================================== [1] ARM Linux Kernel documentation - CPUs bindings - Documentation/devicetree/bindings/arm/cpus.txt + Documentation/devicetree/bindings/arm/cpus.yaml [2] ARM Linux Kernel documentation - PSCI bindings Documentation/devicetree/bindings/arm/psci.txt diff --git a/Documentation/devicetree/bindings/arm/sp810.txt b/Documentation/devicetree/bindings/arm/sp810.txt index 1b2ab1ff5587..46652bf65147 100644 --- a/Documentation/devicetree/bindings/arm/sp810.txt +++ b/Documentation/devicetree/bindings/arm/sp810.txt @@ -4,7 +4,7 @@ SP810 System Controller Required properties: - compatible: standard compatible string for a Primecell peripheral, - see Documentation/devicetree/bindings/arm/primecell.txt + see Documentation/devicetree/bindings/arm/primecell.yaml for more details should be: "arm,sp810", "arm,primecell" diff --git a/Documentation/devicetree/bindings/arm/topology.txt b/Documentation/devicetree/bindings/arm/topology.txt index de9eb0486630..b0d80c0fb265 100644 --- a/Documentation/devicetree/bindings/arm/topology.txt +++ b/Documentation/devicetree/bindings/arm/topology.txt @@ -472,4 +472,4 @@ cpus { =============================================================================== [1] ARM Linux kernel documentation - Documentation/devicetree/bindings/arm/cpus.txt + Documentation/devicetree/bindings/arm/cpus.yaml diff --git a/Documentation/devicetree/bindings/clock/marvell,mmp2.txt b/Documentation/devicetree/bindings/clock/marvell,mmp2.txt index af376a01f2b7..23b52dc02266 100644 --- a/Documentation/devicetree/bindings/clock/marvell,mmp2.txt +++ b/Documentation/devicetree/bindings/clock/marvell,mmp2.txt @@ -18,4 +18,4 @@ Required Properties: Each clock is assigned an identifier and client nodes use this identifier to specify the clock which they consume. -All these identifier could be found in <dt-bindings/clock/marvell-mmp2.h>. +All these identifiers could be found in <dt-bindings/clock/marvell,mmp2.h>. diff --git a/Documentation/devicetree/bindings/display/arm,pl11x.txt b/Documentation/devicetree/bindings/display/arm,pl11x.txt index ef89ab46b2c9..572fa2773ec4 100644 --- a/Documentation/devicetree/bindings/display/arm,pl11x.txt +++ b/Documentation/devicetree/bindings/display/arm,pl11x.txt @@ -1,6 +1,6 @@ * ARM PrimeCell Color LCD Controller PL110/PL111 -See also Documentation/devicetree/bindings/arm/primecell.txt +See also Documentation/devicetree/bindings/arm/primecell.yaml Required properties: diff --git a/Documentation/devicetree/bindings/display/msm/gpu.txt b/Documentation/devicetree/bindings/display/msm/gpu.txt index ac8df3b871f9..f8759145ce1a 100644 --- a/Documentation/devicetree/bindings/display/msm/gpu.txt +++ b/Documentation/devicetree/bindings/display/msm/gpu.txt @@ -27,7 +27,6 @@ Example: reg = <0x04300000 0x20000>; reg-names = "kgsl_3d0_reg_memory"; interrupts = <GIC_SPI 80 0>; - interrupt-names = "kgsl_3d0_irq"; clock-names = "core", "iface", diff --git a/Documentation/devicetree/bindings/gpio/gpio-mvebu.txt b/Documentation/devicetree/bindings/gpio/gpio-mvebu.txt index 38ca2201e8ae..2e097b57f170 100644 --- a/Documentation/devicetree/bindings/gpio/gpio-mvebu.txt +++ b/Documentation/devicetree/bindings/gpio/gpio-mvebu.txt @@ -14,8 +14,6 @@ Required properties: "marvell,armada-8k-gpio" should be used for the Armada 7K and 8K SoCs (either from AP or CP), see - Documentation/devicetree/bindings/arm/marvell/cp110-system-controller0.txt - and Documentation/devicetree/bindings/arm/marvell/ap806-system-controller.txt for specific details about the offset property. diff --git a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt index b83bb8249074..a3be5298a5eb 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt @@ -78,7 +78,7 @@ Sub-nodes: PPI affinity can be expressed as a single "ppi-partitions" node, containing a set of sub-nodes, each with the following property: - affinity: Should be a list of phandles to CPU nodes (as described in -Documentation/devicetree/bindings/arm/cpus.txt). + Documentation/devicetree/bindings/arm/cpus.yaml). GICv3 has one or more Interrupt Translation Services (ITS) that are used to route Message Signalled Interrupts (MSI) to the CPUs. diff --git a/Documentation/devicetree/bindings/net/qcom,ethqos.txt b/Documentation/devicetree/bindings/net/qcom,ethqos.txt new file mode 100644 index 000000000000..fcf5035810b5 --- /dev/null +++ b/Documentation/devicetree/bindings/net/qcom,ethqos.txt @@ -0,0 +1,64 @@ +Qualcomm Ethernet ETHQOS device + +This documents dwmmac based ethernet device which supports Gigabit +ethernet for version v2.3.0 onwards. + +This device has following properties: + +Required properties: + +- compatible: Should be qcom,qcs404-ethqos" + +- reg: Address and length of the register set for the device + +- reg-names: Should contain register names "stmmaceth", "rgmii" + +- clocks: Should contain phandle to clocks + +- clock-names: Should contain clock names "stmmaceth", "pclk", + "ptp_ref", "rgmii" + +- interrupts: Should contain phandle to interrupts + +- interrupt-names: Should contain interrupt names "macirq", "eth_lpi" + +Rest of the properties are defined in stmmac.txt file in same directory + + +Example: + +ethernet: ethernet@7a80000 { + compatible = "qcom,qcs404-ethqos"; + reg = <0x07a80000 0x10000>, + <0x07a96000 0x100>; + reg-names = "stmmaceth", "rgmii"; + clock-names = "stmmaceth", "pclk", "ptp_ref", "rgmii"; + clocks = <&gcc GCC_ETH_AXI_CLK>, + <&gcc GCC_ETH_SLAVE_AHB_CLK>, + <&gcc GCC_ETH_PTP_CLK>, + <&gcc GCC_ETH_RGMII_CLK>; + interrupts = <GIC_SPI 56 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "macirq", "eth_lpi"; + snps,reset-gpio = <&tlmm 60 GPIO_ACTIVE_LOW>; + snps,reset-active-low; + + snps,txpbl = <8>; + snps,rxpbl = <2>; + snps,aal; + snps,tso; + + phy-handle = <&phy1>; + phy-mode = "rgmii"; + + mdio { + #address-cells = <0x1>; + #size-cells = <0x0>; + compatible = "snps,dwmac-mdio"; + phy1: phy@4 { + device_type = "ethernet-phy"; + reg = <0x4>; + }; + }; + +}; diff --git a/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt index c5d0e7998e2b..8e7f8551d190 100644 --- a/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt +++ b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt @@ -17,6 +17,8 @@ Clock Properties: - fsl,tmr-fiper1 Fixed interval period pulse generator. - fsl,tmr-fiper2 Fixed interval period pulse generator. - fsl,max-adj Maximum frequency adjustment in parts per billion. + - fsl,extts-fifo The presence of this property indicates hardware + support for the external trigger stamp FIFO. These properties set the operational parameters for the PTP clock. You must choose these carefully for the clock to work right. diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,glink.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,glink.txt index 0b8cc533ca83..cf759e5f9b10 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,glink.txt +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,glink.txt @@ -55,7 +55,7 @@ of these nodes are defined by the individual bindings for the specific function = EXAMPLE The following example represents the GLINK RPM node on a MSM8996 device, with the function for the "rpm_request" channel defined, which is used for -regualtors and root clocks. +regulators and root clocks. apcs_glb: mailbox@9820000 { compatible = "qcom,msm8996-apcs-hmss-global"; diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.txt index a35af2dafdad..49e1d72d3648 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.txt +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.txt @@ -41,12 +41,12 @@ processor ID) and a string identifier. - qcom,local-pid: Usage: required Value type: <u32> - Definition: specifies the identfier of the local endpoint of this edge + Definition: specifies the identifier of the local endpoint of this edge - qcom,remote-pid: Usage: required Value type: <u32> - Definition: specifies the identfier of the remote endpoint of this edge + Definition: specifies the identifier of the remote endpoint of this edge = SUBNODES Each SMP2P pair contain a set of inbound and outbound entries, these are diff --git a/Documentation/fb/fbcon.txt b/Documentation/fb/fbcon.txt index 62af30511a95..60a5ec04e8f0 100644 --- a/Documentation/fb/fbcon.txt +++ b/Documentation/fb/fbcon.txt @@ -163,6 +163,14 @@ C. Boot options be preserved until there actually is some text is output to the console. This option causes fbcon to bind immediately to the fbdev device. +7. fbcon=logo-pos:<location> + + The only possible 'location' is 'center' (without quotes), and when + given, the bootup logo is moved from the default top-left corner + location to the center of the framebuffer. If more than one logo is + displayed due to multiple CPUs, the collected line of logos is moved + as a whole. + C. Attaching, Detaching and Unloading Before going on to how to attach, detach and unload the framebuffer console, an diff --git a/Documentation/networking/devlink-params-mlxsw.txt b/Documentation/networking/devlink-params-mlxsw.txt new file mode 100644 index 000000000000..2c5c67a920c9 --- /dev/null +++ b/Documentation/networking/devlink-params-mlxsw.txt @@ -0,0 +1,2 @@ +fw_load_policy [DEVICE, GENERIC] + Configuration mode: driverinit diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt index 25170ad7d25b..1000b821681c 100644 --- a/Documentation/networking/dsa/dsa.txt +++ b/Documentation/networking/dsa/dsa.txt @@ -236,19 +236,6 @@ description. Design limitations ================== -DSA is a platform device driver -------------------------------- - -DSA is implemented as a DSA platform device driver which is convenient because -it will register the entire DSA switch tree attached to a master network device -in one-shot, facilitating the device creation and simplifying the device driver -model a bit, this comes however with a number of limitations: - -- building DSA and its switch drivers as modules is currently not working -- the device driver parenting does not necessarily reflect the original - bus/device the switch can be created from -- supporting non-MDIO and non-MMIO (platform) switches is not possible - Limits on the number of devices and ports ----------------------------------------- diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 6a47629ef8ed..f1627ca2a0ea 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -11,24 +11,25 @@ Contents: batman-adv can can_ucan_protocol - dpaa2/index - e100 - e1000 - e1000e - fm10k - igb - igbvf - ixgb - ixgbe - ixgbevf - i40e - iavf - ice + device_drivers/freescale/dpaa2/index + device_drivers/intel/e100 + device_drivers/intel/e1000 + device_drivers/intel/e1000e + device_drivers/intel/fm10k + device_drivers/intel/igb + device_drivers/intel/igbvf + device_drivers/intel/ixgb + device_drivers/intel/ixgbe + device_drivers/intel/ixgbevf + device_drivers/intel/i40e + device_drivers/intel/iavf + device_drivers/intel/ice kapi z8530book msg_zerocopy failover net_failover + phy alias bridge snmp_counter diff --git a/Documentation/networking/phy.rst b/Documentation/networking/phy.rst new file mode 100644 index 000000000000..0dd90d7df5ec --- /dev/null +++ b/Documentation/networking/phy.rst @@ -0,0 +1,447 @@ +===================== +PHY Abstraction Layer +===================== + +Purpose +======= + +Most network devices consist of set of registers which provide an interface +to a MAC layer, which communicates with the physical connection through a +PHY. The PHY concerns itself with negotiating link parameters with the link +partner on the other side of the network connection (typically, an ethernet +cable), and provides a register interface to allow drivers to determine what +settings were chosen, and to configure what settings are allowed. + +While these devices are distinct from the network devices, and conform to a +standard layout for the registers, it has been common practice to integrate +the PHY management code with the network driver. This has resulted in large +amounts of redundant code. Also, on embedded systems with multiple (and +sometimes quite different) ethernet controllers connected to the same +management bus, it is difficult to ensure safe use of the bus. + +Since the PHYs are devices, and the management busses through which they are +accessed are, in fact, busses, the PHY Abstraction Layer treats them as such. +In doing so, it has these goals: + +#. Increase code-reuse +#. Increase overall code-maintainability +#. Speed development time for new network drivers, and for new systems + +Basically, this layer is meant to provide an interface to PHY devices which +allows network driver writers to write as little code as possible, while +still providing a full feature set. + +The MDIO bus +============ + +Most network devices are connected to a PHY by means of a management bus. +Different devices use different busses (though some share common interfaces). +In order to take advantage of the PAL, each bus interface needs to be +registered as a distinct device. + +#. read and write functions must be implemented. Their prototypes are:: + + int write(struct mii_bus *bus, int mii_id, int regnum, u16 value); + int read(struct mii_bus *bus, int mii_id, int regnum); + + mii_id is the address on the bus for the PHY, and regnum is the register + number. These functions are guaranteed not to be called from interrupt + time, so it is safe for them to block, waiting for an interrupt to signal + the operation is complete + +#. A reset function is optional. This is used to return the bus to an + initialized state. + +#. A probe function is needed. This function should set up anything the bus + driver needs, setup the mii_bus structure, and register with the PAL using + mdiobus_register. Similarly, there's a remove function to undo all of + that (use mdiobus_unregister). + +#. Like any driver, the device_driver structure must be configured, and init + exit functions are used to register the driver. + +#. The bus must also be declared somewhere as a device, and registered. + +As an example for how one driver implemented an mdio bus driver, see +drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file +for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/") + +(RG)MII/electrical interface considerations +=========================================== + +The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin +electrical signal interface using a synchronous 125Mhz clock signal and several +data lines. Due to this design decision, a 1.5ns to 2ns delay must be added +between the clock line (RXC or TXC) and the data lines to let the PHY (clock +sink) have enough setup and hold times to sample the data lines correctly. The +PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let +the PHY driver and optionally the MAC driver, implement the required delay. The +values of phy_interface_t must be understood from the perspective of the PHY +device itself, leading to the following: + +* PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any + internal delay by itself, it assumes that either the Ethernet MAC (if capable + or the PCB traces) insert the correct 1.5-2ns delay + +* PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay + for the transmit data lines (TXD[3:0]) processed by the PHY device + +* PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay + for the receive data lines (RXD[3:0]) processed by the PHY device + +* PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for + both transmit AND receive data lines from/to the PHY device + +Whenever possible, use the PHY side RGMII delay for these reasons: + +* PHY devices may offer sub-nanosecond granularity in how they allow a + receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such + precision may be required to account for differences in PCB trace lengths + +* PHY devices are typically qualified for a large range of applications + (industrial, medical, automotive...), and they provide a constant and + reliable delay across temperature/pressure/voltage ranges + +* PHY device drivers in PHYLIB being reusable by nature, being able to + configure correctly a specified delay enables more designs with similar delay + requirements to be operate correctly + +For cases where the PHY is not capable of providing this delay, but the +Ethernet MAC driver is capable of doing so, the correct phy_interface_t value +should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be +configured correctly in order to provide the required transmit and/or receive +side delay from the perspective of the PHY device. Conversely, if the Ethernet +MAC driver looks at the phy_interface_t value, for any other mode but +PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are +disabled. + +In case neither the Ethernet MAC, nor the PHY are capable of providing the +required delays, as defined per the RGMII standard, several options may be +available: + +* Some SoCs may offer a pin pad/mux/controller capable of configuring a given + set of pins'strength, delays, and voltage; and it may be a suitable + option to insert the expected 2ns RGMII delay. + +* Modifying the PCB design to include a fixed delay (e.g: using a specifically + designed serpentine), which may not require software configuration at all. + +Common problems with RGMII delay mismatch +----------------------------------------- + +When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this +will most likely result in the clock and data line signals to be unstable when +the PHY or MAC take a snapshot of these signals to translate them into logical +1 or 0 states and reconstruct the data being transmitted/received. Typical +symptoms include: + +* Transmission/reception partially works, and there is frequent or occasional + packet loss observed + +* Ethernet MAC may report some or all packets ingressing with a FCS/CRC error, + or just discard them all + +* Switching to lower speeds such as 10/100Mbits/sec makes the problem go away + (since there is enough setup/hold time in that case) + +Connecting to a PHY +=================== + +Sometime during startup, the network driver needs to establish a connection +between the PHY device, and the network device. At this time, the PHY's bus +and drivers need to all have been loaded, so it is ready for the connection. +At this point, there are several ways to connect to the PHY: + +#. The PAL handles everything, and only calls the network driver when + the link state changes, so it can react. + +#. The PAL handles everything except interrupts (usually because the + controller has the interrupt registers). + +#. The PAL handles everything, but checks in with the driver every second, + allowing the network driver to react first to any changes before the PAL + does. + +#. The PAL serves only as a library of functions, with the network device + manually calling functions to update status, and configure the PHY + + +Letting the PHY Abstraction Layer do Everything +=============================================== + +If you choose option 1 (The hope is that every driver can, but to still be +useful to drivers that can't), connecting to the PHY is simple: + +First, you need a function to react to changes in the link state. This +function follows this protocol:: + + static void adjust_link(struct net_device *dev); + +Next, you need to know the device name of the PHY connected to this device. +The name will look something like, "0:00", where the first number is the +bus id, and the second is the PHY's address on that bus. Typically, +the bus is responsible for making its ID unique. + +Now, to connect, just call this function:: + + phydev = phy_connect(dev, phy_name, &adjust_link, interface); + +*phydev* is a pointer to the phy_device structure which represents the PHY. +If phy_connect is successful, it will return the pointer. dev, here, is the +pointer to your net_device. Once done, this function will have started the +PHY's software state machine, and registered for the PHY's interrupt, if it +has one. The phydev structure will be populated with information about the +current state, though the PHY will not yet be truly operational at this +point. + +PHY-specific flags should be set in phydev->dev_flags prior to the call +to phy_connect() such that the underlying PHY driver can check for flags +and perform specific operations based on them. +This is useful if the system has put hardware restrictions on +the PHY/controller, of which the PHY needs to be aware. + +*interface* is a u32 which specifies the connection type used +between the controller and the PHY. Examples are GMII, MII, +RGMII, and SGMII. For a full list, see include/linux/phy.h + +Now just make sure that phydev->supported and phydev->advertising have any +values pruned from them which don't make sense for your controller (a 10/100 +controller may be connected to a gigabit capable PHY, so you would need to +mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions +for these bitfields. Note that you should not SET any bits, except the +SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get +put into an unsupported state. + +Lastly, once the controller is ready to handle network traffic, you call +phy_start(phydev). This tells the PAL that you are ready, and configures the +PHY to connect to the network. If the MAC interrupt of your network driver +also handles PHY status changes, just set phydev->irq to PHY_IGNORE_INTERRUPT +before you call phy_start and use phy_mac_interrupt() from the network +driver. If you don't want to use interrupts, set phydev->irq to PHY_POLL. +phy_start() enables the PHY interrupts (if applicable) and starts the +phylib state machine. + +When you want to disconnect from the network (even if just briefly), you call +phy_stop(phydev). This function also stops the phylib state machine and +disables PHY interrupts. + +Pause frames / flow control +=========================== + +The PHY does not participate directly in flow control/pause frames except by +making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in +MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC +controller supports such a thing. Since flow control/pause frames generation +involves the Ethernet MAC driver, it is recommended that this driver takes care +of properly indicating advertisement and support for such features by setting +the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done +either before or after phy_connect() and/or as a result of implementing the +ethtool::set_pauseparam feature. + + +Keeping Close Tabs on the PAL +============================= + +It is possible that the PAL's built-in state machine needs a little help to +keep your network device and the PHY properly in sync. If so, you can +register a helper function when connecting to the PHY, which will be called +every second before the state machine reacts to any changes. To do this, you +need to manually call phy_attach() and phy_prepare_link(), and then call +phy_start_machine() with the second argument set to point to your special +handler. + +Currently there are no examples of how to use this functionality, and testing +on it has been limited because the author does not have any drivers which use +it (they all use option 1). So Caveat Emptor. + +Doing it all yourself +===================== + +There's a remote chance that the PAL's built-in state machine cannot track +the complex interactions between the PHY and your network device. If this is +so, you can simply call phy_attach(), and not call phy_start_machine or +phy_prepare_link(). This will mean that phydev->state is entirely yours to +handle (phy_start and phy_stop toggle between some of the states, so you +might need to avoid them). + +An effort has been made to make sure that useful functionality can be +accessed without the state-machine running, and most of these functions are +descended from functions which did not interact with a complex state-machine. +However, again, no effort has been made so far to test running without the +state machine, so tryer beware. + +Here is a brief rundown of the functions:: + + int phy_read(struct phy_device *phydev, u16 regnum); + int phy_write(struct phy_device *phydev, u16 regnum, u16 val); + +Simple read/write primitives. They invoke the bus's read/write function +pointers. +:: + + void phy_print_status(struct phy_device *phydev); + +A convenience function to print out the PHY status neatly. +:: + + void phy_request_interrupt(struct phy_device *phydev); + +Requests the IRQ for the PHY interrupts. +:: + + struct phy_device * phy_attach(struct net_device *dev, const char *phy_id, + phy_interface_t interface); + +Attaches a network device to a particular PHY, binding the PHY to a generic +driver if none was found during bus initialization. +:: + + int phy_start_aneg(struct phy_device *phydev); + +Using variables inside the phydev structure, either configures advertising +and resets autonegotiation, or disables autonegotiation, and configures +forced settings. +:: + + static inline int phy_read_status(struct phy_device *phydev); + +Fills the phydev structure with up-to-date information about the current +settings in the PHY. +:: + + int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd); + +Ethtool convenience functions. +:: + + int phy_mii_ioctl(struct phy_device *phydev, + struct mii_ioctl_data *mii_data, int cmd); + +The MII ioctl. Note that this function will completely screw up the state +machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to +use this only to write registers which are not standard, and don't set off +a renegotiation. + +PHY Device Drivers +================== + +With the PHY Abstraction Layer, adding support for new PHYs is +quite easy. In some cases, no work is required at all! However, +many PHYs require a little hand-holding to get up-and-running. + +Generic PHY driver +------------------ + +If the desired PHY doesn't have any errata, quirks, or special +features you want to support, then it may be best to not add +support, and let the PHY Abstraction Layer's Generic PHY Driver +do all of the work. + +Writing a PHY driver +-------------------- + +If you do need to write a PHY driver, the first thing to do is +make sure it can be matched with an appropriate PHY device. +This is done during bus initialization by reading the device's +UID (stored in registers 2 and 3), then comparing it to each +driver's phy_id field by ANDing it with each driver's +phy_id_mask field. Also, it needs a name. Here's an example:: + + static struct phy_driver dm9161_driver = { + .phy_id = 0x0181b880, + .name = "Davicom DM9161E", + .phy_id_mask = 0x0ffffff0, + ... + } + +Next, you need to specify what features (speed, duplex, autoneg, +etc) your PHY device and driver support. Most PHYs support +PHY_BASIC_FEATURES, but you can look in include/mii.h for other +features. + +Each driver consists of a number of function pointers, documented +in include/linux/phy.h under the phy_driver structure. + +Of these, only config_aneg and read_status are required to be +assigned by the driver code. The rest are optional. Also, it is +preferred to use the generic phy driver's versions of these two +functions if at all possible: genphy_read_status and +genphy_config_aneg. If this is not possible, it is likely that +you only need to perform some actions before and after invoking +these functions, and so your functions will wrap the generic +ones. + +Feel free to look at the Marvell, Cicada, and Davicom drivers in +drivers/net/phy/ for examples (the lxt and qsemi drivers have +not been tested as of this writing). + +The PHY's MMD register accesses are handled by the PAL framework +by default, but can be overridden by a specific PHY driver if +required. This could be the case if a PHY was released for +manufacturing before the MMD PHY register definitions were +standardized by the IEEE. Most modern PHYs will be able to use +the generic PAL framework for accessing the PHY's MMD registers. +An example of such usage is for Energy Efficient Ethernet support, +implemented in the PAL. This support uses the PAL to access MMD +registers for EEE query and configuration if the PHY supports +the IEEE standard access mechanisms, or can use the PHY's specific +access interfaces if overridden by the specific PHY driver. See +the Micrel driver in drivers/net/phy/ for an example of how this +can be implemented. + +Board Fixups +============ + +Sometimes the specific interaction between the platform and the PHY requires +special handling. For instance, to change where the PHY's clock input is, +or to add a delay to account for latency issues in the data path. In order +to support such contingencies, the PHY Layer allows platform code to register +fixups to be run when the PHY is brought up (or subsequently reset). + +When the PHY Layer brings up a PHY it checks to see if there are any fixups +registered for it, matching based on UID (contained in the PHY device's phy_id +field) and the bus identifier (contained in phydev->dev.bus_id). Both must +match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as +wildcards for the bus ID and UID, respectively. + +When a match is found, the PHY layer will invoke the run function associated +with the fixup. This function is passed a pointer to the phy_device of +interest. It should therefore only operate on that PHY. + +The platform code can either register the fixup using phy_register_fixup():: + + int phy_register_fixup(const char *phy_id, + u32 phy_uid, u32 phy_uid_mask, + int (*run)(struct phy_device *)); + +Or using one of the two stubs, phy_register_fixup_for_uid() and +phy_register_fixup_for_id():: + + int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask, + int (*run)(struct phy_device *)); + int phy_register_fixup_for_id(const char *phy_id, + int (*run)(struct phy_device *)); + +The stubs set one of the two matching criteria, and set the other one to +match anything. + +When phy_register_fixup() or \*_for_uid()/\*_for_id() is called at module, +unregister fixup and free allocate memory are required. + +Call one of following function before unloading module:: + + int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask); + int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask); + int phy_register_fixup_for_id(const char *phy_id); + +Standards +========= + +IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two: +http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf + +RGMII v1.3: +http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf + +RGMII v2.0: +http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt deleted file mode 100644 index bdec0f700bc1..000000000000 --- a/Documentation/networking/phy.txt +++ /dev/null @@ -1,427 +0,0 @@ - -------- -PHY Abstraction Layer -(Updated 2008-04-08) - -Purpose - - Most network devices consist of set of registers which provide an interface - to a MAC layer, which communicates with the physical connection through a - PHY. The PHY concerns itself with negotiating link parameters with the link - partner on the other side of the network connection (typically, an ethernet - cable), and provides a register interface to allow drivers to determine what - settings were chosen, and to configure what settings are allowed. - - While these devices are distinct from the network devices, and conform to a - standard layout for the registers, it has been common practice to integrate - the PHY management code with the network driver. This has resulted in large - amounts of redundant code. Also, on embedded systems with multiple (and - sometimes quite different) ethernet controllers connected to the same - management bus, it is difficult to ensure safe use of the bus. - - Since the PHYs are devices, and the management busses through which they are - accessed are, in fact, busses, the PHY Abstraction Layer treats them as such. - In doing so, it has these goals: - - 1) Increase code-reuse - 2) Increase overall code-maintainability - 3) Speed development time for new network drivers, and for new systems - - Basically, this layer is meant to provide an interface to PHY devices which - allows network driver writers to write as little code as possible, while - still providing a full feature set. - -The MDIO bus - - Most network devices are connected to a PHY by means of a management bus. - Different devices use different busses (though some share common interfaces). - In order to take advantage of the PAL, each bus interface needs to be - registered as a distinct device. - - 1) read and write functions must be implemented. Their prototypes are: - - int write(struct mii_bus *bus, int mii_id, int regnum, u16 value); - int read(struct mii_bus *bus, int mii_id, int regnum); - - mii_id is the address on the bus for the PHY, and regnum is the register - number. These functions are guaranteed not to be called from interrupt - time, so it is safe for them to block, waiting for an interrupt to signal - the operation is complete - - 2) A reset function is optional. This is used to return the bus to an - initialized state. - - 3) A probe function is needed. This function should set up anything the bus - driver needs, setup the mii_bus structure, and register with the PAL using - mdiobus_register. Similarly, there's a remove function to undo all of - that (use mdiobus_unregister). - - 4) Like any driver, the device_driver structure must be configured, and init - exit functions are used to register the driver. - - 5) The bus must also be declared somewhere as a device, and registered. - - As an example for how one driver implemented an mdio bus driver, see - drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file - for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/") - -(RG)MII/electrical interface considerations - - The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin - electrical signal interface using a synchronous 125Mhz clock signal and several - data lines. Due to this design decision, a 1.5ns to 2ns delay must be added - between the clock line (RXC or TXC) and the data lines to let the PHY (clock - sink) have enough setup and hold times to sample the data lines correctly. The - PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let - the PHY driver and optionally the MAC driver, implement the required delay. The - values of phy_interface_t must be understood from the perspective of the PHY - device itself, leading to the following: - - * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any - internal delay by itself, it assumes that either the Ethernet MAC (if capable - or the PCB traces) insert the correct 1.5-2ns delay - - * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay - for the transmit data lines (TXD[3:0]) processed by the PHY device - - * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay - for the receive data lines (RXD[3:0]) processed by the PHY device - - * PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for - both transmit AND receive data lines from/to the PHY device - - Whenever possible, use the PHY side RGMII delay for these reasons: - - * PHY devices may offer sub-nanosecond granularity in how they allow a - receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such - precision may be required to account for differences in PCB trace lengths - - * PHY devices are typically qualified for a large range of applications - (industrial, medical, automotive...), and they provide a constant and - reliable delay across temperature/pressure/voltage ranges - - * PHY device drivers in PHYLIB being reusable by nature, being able to - configure correctly a specified delay enables more designs with similar delay - requirements to be operate correctly - - For cases where the PHY is not capable of providing this delay, but the - Ethernet MAC driver is capable of doing so, the correct phy_interface_t value - should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be - configured correctly in order to provide the required transmit and/or receive - side delay from the perspective of the PHY device. Conversely, if the Ethernet - MAC driver looks at the phy_interface_t value, for any other mode but - PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are - disabled. - - In case neither the Ethernet MAC, nor the PHY are capable of providing the - required delays, as defined per the RGMII standard, several options may be - available: - - * Some SoCs may offer a pin pad/mux/controller capable of configuring a given - set of pins'strength, delays, and voltage; and it may be a suitable - option to insert the expected 2ns RGMII delay. - - * Modifying the PCB design to include a fixed delay (e.g: using a specifically - designed serpentine), which may not require software configuration at all. - -Common problems with RGMII delay mismatch - - When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this - will most likely result in the clock and data line signals to be unstable when - the PHY or MAC take a snapshot of these signals to translate them into logical - 1 or 0 states and reconstruct the data being transmitted/received. Typical - symptoms include: - - * Transmission/reception partially works, and there is frequent or occasional - packet loss observed - - * Ethernet MAC may report some or all packets ingressing with a FCS/CRC error, - or just discard them all - - * Switching to lower speeds such as 10/100Mbits/sec makes the problem go away - (since there is enough setup/hold time in that case) - - -Connecting to a PHY - - Sometime during startup, the network driver needs to establish a connection - between the PHY device, and the network device. At this time, the PHY's bus - and drivers need to all have been loaded, so it is ready for the connection. - At this point, there are several ways to connect to the PHY: - - 1) The PAL handles everything, and only calls the network driver when - the link state changes, so it can react. - - 2) The PAL handles everything except interrupts (usually because the - controller has the interrupt registers). - - 3) The PAL handles everything, but checks in with the driver every second, - allowing the network driver to react first to any changes before the PAL - does. - - 4) The PAL serves only as a library of functions, with the network device - manually calling functions to update status, and configure the PHY - - -Letting the PHY Abstraction Layer do Everything - - If you choose option 1 (The hope is that every driver can, but to still be - useful to drivers that can't), connecting to the PHY is simple: - - First, you need a function to react to changes in the link state. This - function follows this protocol: - - static void adjust_link(struct net_device *dev); - - Next, you need to know the device name of the PHY connected to this device. - The name will look something like, "0:00", where the first number is the - bus id, and the second is the PHY's address on that bus. Typically, - the bus is responsible for making its ID unique. - - Now, to connect, just call this function: - - phydev = phy_connect(dev, phy_name, &adjust_link, interface); - - phydev is a pointer to the phy_device structure which represents the PHY. If - phy_connect is successful, it will return the pointer. dev, here, is the - pointer to your net_device. Once done, this function will have started the - PHY's software state machine, and registered for the PHY's interrupt, if it - has one. The phydev structure will be populated with information about the - current state, though the PHY will not yet be truly operational at this - point. - - PHY-specific flags should be set in phydev->dev_flags prior to the call - to phy_connect() such that the underlying PHY driver can check for flags - and perform specific operations based on them. - This is useful if the system has put hardware restrictions on - the PHY/controller, of which the PHY needs to be aware. - - interface is a u32 which specifies the connection type used - between the controller and the PHY. Examples are GMII, MII, - RGMII, and SGMII. For a full list, see include/linux/phy.h - - Now just make sure that phydev->supported and phydev->advertising have any - values pruned from them which don't make sense for your controller (a 10/100 - controller may be connected to a gigabit capable PHY, so you would need to - mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions - for these bitfields. Note that you should not SET any bits, except the - SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get - put into an unsupported state. - - Lastly, once the controller is ready to handle network traffic, you call - phy_start(phydev). This tells the PAL that you are ready, and configures the - PHY to connect to the network. If you want to handle your own interrupts, - just set phydev->irq to PHY_IGNORE_INTERRUPT before you call phy_start. - Similarly, if you don't want to use interrupts, set phydev->irq to PHY_POLL. - - When you want to disconnect from the network (even if just briefly), you call - phy_stop(phydev). - -Pause frames / flow control - - The PHY does not participate directly in flow control/pause frames except by - making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in - MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC - controller supports such a thing. Since flow control/pause frames generation - involves the Ethernet MAC driver, it is recommended that this driver takes care - of properly indicating advertisement and support for such features by setting - the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done - either before or after phy_connect() and/or as a result of implementing the - ethtool::set_pauseparam feature. - - -Keeping Close Tabs on the PAL - - It is possible that the PAL's built-in state machine needs a little help to - keep your network device and the PHY properly in sync. If so, you can - register a helper function when connecting to the PHY, which will be called - every second before the state machine reacts to any changes. To do this, you - need to manually call phy_attach() and phy_prepare_link(), and then call - phy_start_machine() with the second argument set to point to your special - handler. - - Currently there are no examples of how to use this functionality, and testing - on it has been limited because the author does not have any drivers which use - it (they all use option 1). So Caveat Emptor. - -Doing it all yourself - - There's a remote chance that the PAL's built-in state machine cannot track - the complex interactions between the PHY and your network device. If this is - so, you can simply call phy_attach(), and not call phy_start_machine or - phy_prepare_link(). This will mean that phydev->state is entirely yours to - handle (phy_start and phy_stop toggle between some of the states, so you - might need to avoid them). - - An effort has been made to make sure that useful functionality can be - accessed without the state-machine running, and most of these functions are - descended from functions which did not interact with a complex state-machine. - However, again, no effort has been made so far to test running without the - state machine, so tryer beware. - - Here is a brief rundown of the functions: - - int phy_read(struct phy_device *phydev, u16 regnum); - int phy_write(struct phy_device *phydev, u16 regnum, u16 val); - - Simple read/write primitives. They invoke the bus's read/write function - pointers. - - void phy_print_status(struct phy_device *phydev); - - A convenience function to print out the PHY status neatly. - - int phy_start_interrupts(struct phy_device *phydev); - int phy_stop_interrupts(struct phy_device *phydev); - - Requests the IRQ for the PHY interrupts, then enables them for - start, or disables then frees them for stop. - - struct phy_device * phy_attach(struct net_device *dev, const char *phy_id, - phy_interface_t interface); - - Attaches a network device to a particular PHY, binding the PHY to a generic - driver if none was found during bus initialization. - - int phy_start_aneg(struct phy_device *phydev); - - Using variables inside the phydev structure, either configures advertising - and resets autonegotiation, or disables autonegotiation, and configures - forced settings. - - static inline int phy_read_status(struct phy_device *phydev); - - Fills the phydev structure with up-to-date information about the current - settings in the PHY. - - int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd); - - Ethtool convenience functions. - - int phy_mii_ioctl(struct phy_device *phydev, - struct mii_ioctl_data *mii_data, int cmd); - - The MII ioctl. Note that this function will completely screw up the state - machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to - use this only to write registers which are not standard, and don't set off - a renegotiation. - - -PHY Device Drivers - - With the PHY Abstraction Layer, adding support for new PHYs is - quite easy. In some cases, no work is required at all! However, - many PHYs require a little hand-holding to get up-and-running. - -Generic PHY driver - - If the desired PHY doesn't have any errata, quirks, or special - features you want to support, then it may be best to not add - support, and let the PHY Abstraction Layer's Generic PHY Driver - do all of the work. - -Writing a PHY driver - - If you do need to write a PHY driver, the first thing to do is - make sure it can be matched with an appropriate PHY device. - This is done during bus initialization by reading the device's - UID (stored in registers 2 and 3), then comparing it to each - driver's phy_id field by ANDing it with each driver's - phy_id_mask field. Also, it needs a name. Here's an example: - - static struct phy_driver dm9161_driver = { - .phy_id = 0x0181b880, - .name = "Davicom DM9161E", - .phy_id_mask = 0x0ffffff0, - ... - } - - Next, you need to specify what features (speed, duplex, autoneg, - etc) your PHY device and driver support. Most PHYs support - PHY_BASIC_FEATURES, but you can look in include/mii.h for other - features. - - Each driver consists of a number of function pointers, documented - in include/linux/phy.h under the phy_driver structure. - - Of these, only config_aneg and read_status are required to be - assigned by the driver code. The rest are optional. Also, it is - preferred to use the generic phy driver's versions of these two - functions if at all possible: genphy_read_status and - genphy_config_aneg. If this is not possible, it is likely that - you only need to perform some actions before and after invoking - these functions, and so your functions will wrap the generic - ones. - - Feel free to look at the Marvell, Cicada, and Davicom drivers in - drivers/net/phy/ for examples (the lxt and qsemi drivers have - not been tested as of this writing). - - The PHY's MMD register accesses are handled by the PAL framework - by default, but can be overridden by a specific PHY driver if - required. This could be the case if a PHY was released for - manufacturing before the MMD PHY register definitions were - standardized by the IEEE. Most modern PHYs will be able to use - the generic PAL framework for accessing the PHY's MMD registers. - An example of such usage is for Energy Efficient Ethernet support, - implemented in the PAL. This support uses the PAL to access MMD - registers for EEE query and configuration if the PHY supports - the IEEE standard access mechanisms, or can use the PHY's specific - access interfaces if overridden by the specific PHY driver. See - the Micrel driver in drivers/net/phy/ for an example of how this - can be implemented. - -Board Fixups - - Sometimes the specific interaction between the platform and the PHY requires - special handling. For instance, to change where the PHY's clock input is, - or to add a delay to account for latency issues in the data path. In order - to support such contingencies, the PHY Layer allows platform code to register - fixups to be run when the PHY is brought up (or subsequently reset). - - When the PHY Layer brings up a PHY it checks to see if there are any fixups - registered for it, matching based on UID (contained in the PHY device's phy_id - field) and the bus identifier (contained in phydev->dev.bus_id). Both must - match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as - wildcards for the bus ID and UID, respectively. - - When a match is found, the PHY layer will invoke the run function associated - with the fixup. This function is passed a pointer to the phy_device of - interest. It should therefore only operate on that PHY. - - The platform code can either register the fixup using phy_register_fixup(): - - int phy_register_fixup(const char *phy_id, - u32 phy_uid, u32 phy_uid_mask, - int (*run)(struct phy_device *)); - - Or using one of the two stubs, phy_register_fixup_for_uid() and - phy_register_fixup_for_id(): - - int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask, - int (*run)(struct phy_device *)); - int phy_register_fixup_for_id(const char *phy_id, - int (*run)(struct phy_device *)); - - The stubs set one of the two matching criteria, and set the other one to - match anything. - - When phy_register_fixup() or *_for_uid()/*_for_id() is called at module, - unregister fixup and free allocate memory are required. - - Call one of following function before unloading module. - - int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask); - int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask); - int phy_register_fixup_for_id(const char *phy_id); - -Standards - - IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two: - http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf - - RGMII v1.3: - http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf - - RGMII v2.0: - http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index c9d052e0cf51..2df5894353d6 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -1000,51 +1000,6 @@ The kernel interface functions are as follows: size should be set when the call is begun. tx_total_len may not be less than zero. - (*) Check to see the completion state of a call so that the caller can assess - whether it needs to be retried. - - enum rxrpc_call_completion { - RXRPC_CALL_SUCCEEDED, - RXRPC_CALL_REMOTELY_ABORTED, - RXRPC_CALL_LOCALLY_ABORTED, - RXRPC_CALL_LOCAL_ERROR, - RXRPC_CALL_NETWORK_ERROR, - }; - - int rxrpc_kernel_check_call(struct socket *sock, struct rxrpc_call *call, - enum rxrpc_call_completion *_compl, - u32 *_abort_code); - - On return, -EINPROGRESS will be returned if the call is still ongoing; if - it is finished, *_compl will be set to indicate the manner of completion, - *_abort_code will be set to any abort code that occurred. 0 will be - returned on a successful completion, -ECONNABORTED will be returned if the - client failed due to a remote abort and anything else will return an - appropriate error code. - - The caller should look at this information to decide if it's worth - retrying the call. - - (*) Retry a client call. - - int rxrpc_kernel_retry_call(struct socket *sock, - struct rxrpc_call *call, - struct sockaddr_rxrpc *srx, - struct key *key); - - This attempts to partially reinitialise a call and submit it again while - reusing the original call's Tx queue to avoid the need to repackage and - re-encrypt the data to be sent. call indicates the call to retry, srx the - new address to send it to and key the encryption key to use for signing or - encrypting the packets. - - For this to work, the first Tx data packet must still be in the transmit - queue, and currently this is only permitted for local and network errors - and the call must not have been aborted. Any partially constructed Tx - packet is left as is and can continue being filled afterwards. - - It returns 0 if the call was requeued and an error otherwise. - (*) Get call RTT. u64 rxrpc_kernel_get_rtt(struct socket *sock, struct rxrpc_call *call); diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst index 486ab33acc3a..c5642f430d2e 100644 --- a/Documentation/networking/snmp_counter.rst +++ b/Documentation/networking/snmp_counter.rst @@ -366,6 +366,27 @@ to the accept queue. TCP Fast Open ============= +* TcpEstabResets +Defined in `RFC1213 tcpEstabResets`_. + +.. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48 + +* TcpAttemptFails +Defined in `RFC1213 tcpAttemptFails`_. + +.. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48 + +* TcpOutRsts +Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates +the 'segments sent containing the RST flag', but in linux kernel, this +couner indicates the segments kerenl tried to send. The sending +process might be failed due to some errors (e.g. memory alloc failed). + +.. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52 + + +TCP Fast Path +============ When kernel receives a TCP packet, it has two paths to handler the packet, one is fast path, another is slow path. The comment in kernel code provides a good explanation of them, I pasted them below:: @@ -413,7 +434,6 @@ increase 1. TCP abort ========= - * TcpExtTCPAbortOnData It means TCP layer has data in flight, but need to close the @@ -589,7 +609,6 @@ packet yet, the sender would know packet 4 is out of order. The TCP stack of kernel will increase TcpExtTCPSACKReorder for both of the above scenarios. - DSACK ===== The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report @@ -612,8 +631,7 @@ The TCP stack receives an out of order duplicate packet, so it sends a DSACK to the sender. * TcpExtTCPDSACKRecv - -The TCP stack receives a DSACK, which indicate an acknowledged +The TCP stack receives a DSACK, which indicates an acknowledged duplicate packet is received. * TcpExtTCPDSACKOfoRecv @@ -621,6 +639,56 @@ duplicate packet is received. The TCP stack receives a DSACK, which indicate an out of order duplicate packet is received. +invalid SACK and DSACK +==================== +When a SACK (or DSACK) block is invalid, a corresponding counter would +be updated. The validation method is base on the start/end sequence +number of the SACK block. For more details, please refer the comment +of the function tcp_is_sackblock_valid in the kernel source code. A +SACK option could have up to 4 blocks, they are checked +individually. E.g., if 3 blocks of a SACk is invalid, the +corresponding counter would be updated 3 times. The comment of the +`Add counters for discarded SACK blocks`_ patch has additional +explaination: + +.. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32 + +* TcpExtTCPSACKDiscard +This counter indicates how many SACK blocks are invalid. If the invalid +SACK block is caused by ACK recording, the TCP stack will only ignore +it and won't update this counter. + +* TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo +When a DSACK block is invalid, one of these two counters would be +updated. Which counter will be updated depends on the undo_marker flag +of the TCP socket. If the undo_marker is not set, the TCP stack isn't +likely to re-transmit any packets, and we still receive an invalid +DSACK block, the reason might be that the packet is duplicated in the +middle of the network. In such scenario, TcpExtTCPDSACKIgnoredNoUndo +will be updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld +will be updated. As implied in its name, it might be an old packet. + +SACK shift +========= +The linux networking stack stores data in sk_buff struct (skb for +short). If a SACK block acrosses multiple skb, the TCP stack will try +to re-arrange data in these skb. E.g. if a SACK block acknowledges seq +10 to 15, skb1 has seq 10 to 13, skb2 has seq 14 to 20. The seq 14 and +15 in skb2 would be moved to skb1. This operation is 'shift'. If a +SACK block acknowledges seq 10 to 20, skb1 has seq 10 to 13, skb2 has +seq 14 to 20. All data in skb2 will be moved to skb1, and skb2 will be +discard, this operation is 'merge'. + +* TcpExtTCPSackShifted +A skb is shifted + +* TcpExtTCPSackMerged +A skb is merged + +* TcpExtTCPSackShiftFallback +A skb should be shifted or merged, but the TCP stack doesn't do it for +some reasons. + TCP out of order ================ * TcpExtTCPOFOQueue @@ -721,6 +789,60 @@ unacknowledged number (more strict than `RFC 5961 section 5.2`_). .. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9 .. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11 +TCP receive window +================= +* TcpExtTCPWantZeroWindowAdv +Depending on current memory usage, the TCP stack tries to set receive +window to zero. But the receive window might still be a no-zero +value. For example, if the previous window size is 10, and the TCP +stack receives 3 bytes, the current window size would be 7 even if the +window size calculated by the memory usage is zero. + +* TcpExtTCPToZeroWindowAdv +The TCP receive window is set to zero from a no-zero value. + +* TcpExtTCPFromZeroWindowAdv +The TCP receive window is set to no-zero value from zero. + + +Delayed ACK +========== +The TCP Delayed ACK is a technique which is used for reducing the +packet count in the network. For more details, please refer the +`Delayed ACK wiki`_ + +.. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment + +* TcpExtDelayedACKs +A delayed ACK timer expires. The TCP stack will send a pure ACK packet +and exit the delayed ACK mode. + +* TcpExtDelayedACKLocked +A delayed ACK timer expires, but the TCP stack can't send an ACK +immediately due to the socket is locked by a userspace program. The +TCP stack will send a pure ACK later (after the userspace program +unlock the socket). When the TCP stack sends the pure ACK later, the +TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK +mode. + +* TcpExtDelayedACKLost +It will be updated when the TCP stack receives a packet which has been +ACKed. A Delayed ACK loss might cause this issue, but it would also be +triggered by other reasons, such as a packet is duplicated in the +network. + +Tail Loss Probe (TLP) +=================== +TLP is an algorithm which is used to detect TCP packet loss. For more +details, please refer the `TLP paper`_. + +.. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 + +* TcpExtTCPLossProbes +A TLP probe packet is sent. + +* TcpExtTCPLossProbeRecovery +A packet loss is detected and recovered by TLP. examples ======== diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt index 82236a17b5e6..f3244d87512a 100644 --- a/Documentation/networking/switchdev.txt +++ b/Documentation/networking/switchdev.txt @@ -196,7 +196,7 @@ The switch device will learn/forget source MAC address/VLAN on ingress packets and notify the switch driver of the mac/vlan/port tuples. The switch driver, in turn, will notify the bridge driver using the switchdev notifier call: - err = call_switchdev_notifiers(val, dev, info); + err = call_switchdev_notifiers(val, dev, info, extack); Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when forgetting, and info points to a struct switchdev_notifier_fdb_info. On diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 1be0b6f9e0cb..9d1432e0aaa8 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -417,7 +417,7 @@ is again deprecated and ts[2] holds a hardware timestamp if set. Hardware time stamping must also be initialized for each device driver that is expected to do hardware time stamping. The parameter is defined in -/include/linux/net_tstamp.h as: +include/uapi/linux/net_tstamp.h as: struct hwtstamp_config { int flags; /* no flags defined right now, must be zero */ @@ -487,7 +487,7 @@ enum { HWTSTAMP_FILTER_PTP_V1_L4_EVENT, /* for the complete list of values, please check - * the include file /include/linux/net_tstamp.h + * the include file include/uapi/linux/net_tstamp.h */ }; diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt index 2793d4eac55f..bc0680706870 100644 --- a/Documentation/sysctl/net.txt +++ b/Documentation/sysctl/net.txt @@ -291,6 +291,20 @@ user space is responsible for creating them if needed. Default : 0 (for compatibility reasons) +devconf_inherit_init_net +---------------------------- + +Controls if a new network namespace should inherit all current +settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By +default, we keep the current behavior: for IPv4 we inherit all current +settings from init_net and for IPv6 we reset all settings to default. + +If set to 1, both IPv4 and IPv6 settings are forced to inherit from +current ones in init_net. If set to 2, both IPv4 and IPv6 settings are +forced to reset to their default values. + +Default : 0 (for compatibility reasons) + 2. /proc/sys/net/unix - Parameters for Unix domain sockets ------------------------------------------------------- |