diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-06-04 02:27:18 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-06-04 02:27:18 +0300 |
commit | cb8e59cc87201af93dfbb6c3dccc8fcad72a09c2 (patch) | |
tree | a334db9022f89654b777bbce8c4c6632e65b9031 /Documentation/networking/netdevices.rst | |
parent | 2e63f6ce7ed2c4ff83ba30ad9ccad422289a6c63 (diff) | |
parent | 065fcfd49763ec71ae345bb5c5a74f961031e70e (diff) | |
download | linux-cb8e59cc87201af93dfbb6c3dccc8fcad72a09c2.tar.xz |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
Augusto von Dentz.
2) Add GSO partial support to igc, from Sasha Neftin.
3) Several cleanups and improvements to r8169 from Heiner Kallweit.
4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
device self-test. From Andrew Lunn.
5) Start moving away from custom driver versions, use the globally
defined kernel version instead, from Leon Romanovsky.
6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.
7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.
8) Add sriov and vf support to hinic, from Luo bin.
9) Support Media Redundancy Protocol (MRP) in the bridging code, from
Horatiu Vultur.
10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.
11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
Dubroca. Also add ipv6 support for espintcp.
12) Lots of ReST conversions of the networking documentation, from Mauro
Carvalho Chehab.
13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
from Doug Berger.
14) Allow to dump cgroup id and filter by it in inet_diag code, from
Dmitry Yakunin.
15) Add infrastructure to export netlink attribute policies to
userspace, from Johannes Berg.
16) Several optimizations to sch_fq scheduler, from Eric Dumazet.
17) Fallback to the default qdisc if qdisc init fails because otherwise
a packet scheduler init failure will make a device inoperative. From
Jesper Dangaard Brouer.
18) Several RISCV bpf jit optimizations, from Luke Nelson.
19) Correct the return type of the ->ndo_start_xmit() method in several
drivers, it's netdev_tx_t but many drivers were using
'int'. From Yunjian Wang.
20) Add an ethtool interface for PHY master/slave config, from Oleksij
Rempel.
21) Add BPF iterators, from Yonghang Song.
22) Add cable test infrastructure, including ethool interfaces, from
Andrew Lunn. Marvell PHY driver is the first to support this
facility.
23) Remove zero-length arrays all over, from Gustavo A. R. Silva.
24) Calculate and maintain an explicit frame size in XDP, from Jesper
Dangaard Brouer.
25) Add CAP_BPF, from Alexei Starovoitov.
26) Support terse dumps in the packet scheduler, from Vlad Buslov.
27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.
28) Add devm_register_netdev(), from Bartosz Golaszewski.
29) Minimize qdisc resets, from Cong Wang.
30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
eliminate set_fs/get_fs calls. From Christoph Hellwig.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
selftests: net: ip_defrag: ignore EPERM
net_failover: fixed rollback in net_failover_open()
Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
vmxnet3: allow rx flow hash ops only when rss is enabled
hinic: add set_channels ethtool_ops support
selftests/bpf: Add a default $(CXX) value
tools/bpf: Don't use $(COMPILE.c)
bpf, selftests: Use bpf_probe_read_kernel
s390/bpf: Use bcr 0,%0 as tail call nop filler
s390/bpf: Maintain 8-byte stack alignment
selftests/bpf: Fix verifier test
selftests/bpf: Fix sample_cnt shared between two threads
bpf, selftests: Adapt cls_redirect to call csum_level helper
bpf: Add csum_level helper for fixing up csum levels
bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
crypto/chtls: IPv6 support for inline TLS
Crypto/chcr: Fixes a coccinile check error
Crypto/chcr: Fixes compilations warnings
...
Diffstat (limited to 'Documentation/networking/netdevices.rst')
-rw-r--r-- | Documentation/networking/netdevices.rst | 111 |
1 files changed, 111 insertions, 0 deletions
diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst new file mode 100644 index 000000000000..5a85fcc80c76 --- /dev/null +++ b/Documentation/networking/netdevices.rst @@ -0,0 +1,111 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================== +Network Devices, the Kernel, and You! +===================================== + + +Introduction +============ +The following is a random collection of documentation regarding +network devices. + +struct net_device allocation rules +================================== +Network device structures need to persist even after module is unloaded and +must be allocated with alloc_netdev_mqs() and friends. +If device has registered successfully, it will be freed on last use +by free_netdev(). This is required to handle the pathologic case cleanly +(example: rmmod mydriver </sys/class/net/myeth/mtu ) + +alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver +private data which gets freed when the network device is freed. If +separately allocated data is attached to the network device +(netdev_priv(dev)) then it is up to the module exit handler to free that. + +MTU +=== +Each network device has a Maximum Transfer Unit. The MTU does not +include any link layer protocol overhead. Upper layer protocols must +not pass a socket buffer (skb) to a device to transmit with more data +than the mtu. The MTU does not include link layer header overhead, so +for example on Ethernet if the standard MTU is 1500 bytes used, the +actual skb will contain up to 1514 bytes because of the Ethernet +header. Devices should allow for the 4 byte VLAN header as well. + +Segmentation Offload (GSO, TSO) is an exception to this rule. The +upper layer protocol may pass a large socket buffer to the device +transmit routine, and the device will break that up into separate +packets based on the current MTU. + +MTU is symmetrical and applies both to receive and transmit. A device +must be able to receive at least the maximum size packet allowed by +the MTU. A network device may use the MTU as mechanism to size receive +buffers, but the device should allow packets with VLAN header. With +standard Ethernet mtu of 1500 bytes, the device should allow up to +1518 byte packets (1500 + 14 header + 4 tag). The device may either: +drop, truncate, or pass up oversize packets, but dropping oversize +packets is preferred. + + +struct net_device synchronization rules +======================================= +ndo_open: + Synchronization: rtnl_lock() semaphore. + Context: process + +ndo_stop: + Synchronization: rtnl_lock() semaphore. + Context: process + Note: netif_running() is guaranteed false + +ndo_do_ioctl: + Synchronization: rtnl_lock() semaphore. + Context: process + +ndo_get_stats: + Synchronization: dev_base_lock rwlock. + Context: nominally process, but don't sleep inside an rwlock + +ndo_start_xmit: + Synchronization: __netif_tx_lock spinlock. + + When the driver sets NETIF_F_LLTX in dev->features this will be + called without holding netif_tx_lock. In this case the driver + has to lock by itself when needed. + The locking there should also properly protect against + set_rx_mode. WARNING: use of NETIF_F_LLTX is deprecated. + Don't use it for new drivers. + + Context: Process with BHs disabled or BH (timer), + will be called with interrupts disabled by netconsole. + + Return codes: + + * NETDEV_TX_OK everything ok. + * NETDEV_TX_BUSY Cannot transmit packet, try later + Usually a bug, means queue start/stop flow control is broken in + the driver. Note: the driver must NOT put the skb in its DMA ring. + +ndo_tx_timeout: + Synchronization: netif_tx_lock spinlock; all TX queues frozen. + Context: BHs disabled + Notes: netif_queue_stopped() is guaranteed true + +ndo_set_rx_mode: + Synchronization: netif_addr_lock spinlock. + Context: BHs disabled + +struct napi_struct synchronization rules +======================================== +napi->poll: + Synchronization: + NAPI_STATE_SCHED bit in napi->state. Device + driver's ndo_stop method will invoke napi_disable() on + all NAPI instances which will do a sleeping poll on the + NAPI_STATE_SCHED napi->state bit, waiting for all pending + NAPI activity to cease. + + Context: + softirq + will be called with interrupts disabled by netconsole. |