Diffstat (limited to 'Documentation/networking')
-rw-r--r--  Documentation/networking/bonding.txt                    31
-rw-r--r--  Documentation/networking/filter.txt                     12
-rw-r--r--  Documentation/networking/i40e.txt                        7
-rw-r--r--  Documentation/networking/ip-sysctl.txt                  38
-rw-r--r--  Documentation/networking/packet_mmap.txt                18
-rw-r--r--  Documentation/networking/phy.txt                        18
-rw-r--r--  Documentation/networking/pktgen.txt                     28
-rw-r--r--  Documentation/networking/timestamping.txt               16
-rw-r--r--  Documentation/networking/timestamping/timestamping.c     7
9 files changed, 112 insertions, 63 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9c723ecd0025..eeb5b2e97bed 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -542,10 +542,10 @@ mode

 XOR policy: Transmit based on the selected transmit
 hash policy. The default policy is a simple [(source
- MAC address XOR'd with destination MAC address) modulo
- slave count]. Alternate transmit policies may be
- selected via the xmit_hash_policy option, described
- below.
+ MAC address XOR'd with destination MAC address XOR
+ packet type ID) modulo slave count]. Alternate transmit
+ policies may be selected via the xmit_hash_policy option,
+ described below.

 This mode provides load balancing and fault tolerance.

@@ -801,10 +801,11 @@ xmit_hash_policy

 layer2

- Uses XOR of hardware MAC addresses to generate the
- hash. The formula is
+ Uses XOR of hardware MAC addresses and packet type ID
+ field to generate the hash. The formula is

- (source MAC XOR destination MAC) modulo slave count
+ hash = source MAC XOR destination MAC XOR packet type ID
+ slave number = hash modulo slave count

 This algorithm will place all traffic to a
 particular network peer on the same slave.
@@ -819,7 +820,7 @@ xmit_hash_policy
 Uses XOR of hardware MAC addresses and IP addresses to
 generate the hash. The formula is

- hash = source MAC XOR destination MAC
+ hash = source MAC XOR destination MAC XOR packet type ID
 hash = hash XOR source IP XOR destination IP
 hash = hash XOR (hash RSHIFT 16)
 hash = hash XOR (hash RSHIFT 8)
@@ -2301,13 +2302,13 @@ broadcast: Like active-backup, there is not much advantage to this
 bandwidth.

 Additionally, the linux bonding 802.3ad implementation
- distributes traffic by peer (using an XOR of MAC addresses),
- so in a "gatewayed" configuration, all outgoing traffic will
- generally use the same device.
Incoming traffic may also end
- up on a single device, but that is dependent upon the
- balancing policy of the peer's 8023.ad implementation. In a
- "local" configuration, traffic will be distributed across the
- devices in the bond.
+ distributes traffic by peer (using an XOR of MAC addresses
+ and packet type ID), so in a "gatewayed" configuration, all
+ outgoing traffic will generally use the same device. Incoming
+ traffic may also end up on a single device, but that is
+ dependent upon the balancing policy of the peer's 8023.ad
+ implementation. In a "local" configuration, traffic will be
+ distributed across the devices in the bond.

 Finally, the 802.3ad mode mandates the use of the MII monitor,
 therefore, the ARP monitor is not available in this mode.
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index ee78eba78a9d..c48a9704bda8 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -586,12 +586,12 @@ team driver's classifier for its load-balancing mode, netfilter's xt_bpf
 extension, PTP dissector/classifier, and much more. They are all internally
 converted by the kernel into the new instruction set representation and run
 in the eBPF interpreter. For in-kernel handlers, this all works transparently
-by using sk_unattached_filter_create() for setting up the filter, resp.
-sk_unattached_filter_destroy() for destroying it. The macro
-SK_RUN_FILTER(filter, ctx) transparently invokes eBPF interpreter or JITed
-code to run the filter. 'filter' is a pointer to struct sk_filter that we
-got from sk_unattached_filter_create(), and 'ctx' the given context (e.g.
-skb pointer). All constraints and restrictions from sk_chk_filter() apply
+by using bpf_prog_create() for setting up the filter, resp.
+bpf_prog_destroy() for destroying it. The macro
+BPF_PROG_RUN(filter, ctx) transparently invokes eBPF interpreter or JITed
+code to run the filter.
'filter' is a pointer to struct bpf_prog that we
+got from bpf_prog_create(), and 'ctx' the given context (e.g.
+skb pointer). All constraints and restrictions from bpf_check_classic() apply
 before a conversion to the new layout is being done behind the scenes!

 Currently, the classic BPF format is being used for JITing on most of the
diff --git a/Documentation/networking/i40e.txt b/Documentation/networking/i40e.txt
index f737273c6dc1..a251bf4fe9c9 100644
--- a/Documentation/networking/i40e.txt
+++ b/Documentation/networking/i40e.txt
@@ -69,8 +69,11 @@ Additional Configurations

 FCoE
 ----
- Fiber Channel over Ethernet (FCoE) hardware offload is not currently
- supported.
+ The driver supports Fiber Channel over Ethernet (FCoE) and Data Center
+ Bridging (DCB) functionality. Configuring DCB and FCoE is outside the scope
+ of this driver doc. Refer to http://www.open-fcoe.org/ for FCoE project
+ information and http://www.open-lldp.org/ or email list
+ e1000-eedc@lists.sourceforge.net for DCB information.

 MAC and VLAN anti-spoofing feature
 ----------------------------------
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ab42c95f9985..29a93518bf18 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -101,19 +101,17 @@ ipfrag_high_thresh - INTEGER
 Maximum memory used to reassemble IP fragments. When
 ipfrag_high_thresh bytes of memory is allocated for this purpose,
 the fragment handler will toss packets until ipfrag_low_thresh
- is reached.
+ is reached. This also serves as a maximum limit to namespaces
+ different from the initial one.

 ipfrag_low_thresh - INTEGER
- See ipfrag_high_thresh
+ Maximum memory used to reassemble IP fragments before the kernel
+ begins to remove incomplete fragment queues to free up resources.
+ The kernel still accepts new fragments for defragmentation.

 ipfrag_time - INTEGER
 Time in seconds to keep an IP fragment in memory.
-ipfrag_secret_interval - INTEGER
- Regeneration interval (in seconds) of the hash secret (or lifetime
- for the hash secret) for IP fragments.
- Default: 600
-
 ipfrag_max_dist - INTEGER
 ipfrag_max_dist is a non-negative integer value which defines the
 maximum "disorder" which is allowed among fragments which share a
@@ -1132,6 +1130,15 @@ flowlabel_consistency - BOOLEAN
 FALSE: disabled
 Default: TRUE

+auto_flowlabels - BOOLEAN
+ Automatically generate flow labels based based on a flow hash
+ of the packet. This allows intermediate devices, such as routers,
+ to idenfify packet flows for mechanisms like Equal Cost Multipath
+ Routing (see RFC 6438).
+ TRUE: enabled
+ FALSE: disabled
+ Default: false
+
 anycast_src_echo_reply - BOOLEAN
 Controls the use of anycast addresses as source addresses for ICMPv6
 echo reply
@@ -1153,11 +1160,6 @@ ip6frag_low_thresh - INTEGER
 ip6frag_time - INTEGER
 Time in seconds to keep an IPv6 fragment in memory.

-ip6frag_secret_interval - INTEGER
- Regeneration interval (in seconds) of the hash secret (or lifetime
- for the hash secret) for IPv6 fragments.
- Default: 600
-
 conf/default/*:
 Change the interface-specific default settings.

@@ -1210,6 +1212,18 @@ accept_ra_defrtr - BOOLEAN
 Functional default: enabled if accept_ra is enabled.
 disabled if accept_ra is disabled.

+accept_ra_from_local - BOOLEAN
+ Accept RA with source-address that is found on local machine
+ if the RA is otherwise proper and able to be accepted.
+ Default is to NOT accept these as it may be an un-intended
+ network loop.
+
+ Functional default:
+ enabled if accept_ra_from_local is enabled
+ on a specific interface.
+ disabled if accept_ra_from_local is disabled
+ on a specific interface.
+
 accept_ra_pinfo - BOOLEAN
 Learn Prefix Information in Router Advertisement.
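The auto_flowlabels entry added above says the flow label is generated from a flow hash of the packet. A minimal sketch of one way such a label could be derived, assuming a 32-bit flow hash has already been computed. The names flow_hash_to_label and FLOWLABEL_MASK are hypothetical, and the folding step is an illustrative choice, not necessarily what the kernel does:

```c
#include <assert.h>
#include <stdint.h>

/* The IPv6 flow label field is 20 bits wide (RFC 6437). */
#define FLOWLABEL_MASK 0x000FFFFFu

/*
 * Hypothetical helper: fold a 32-bit flow hash into a 20-bit flow label.
 * XORing in the upper 12 bits lets them influence the result before the
 * value is masked down to the width of the field.
 */
static uint32_t flow_hash_to_label(uint32_t flow_hash)
{
	uint32_t label = (flow_hash ^ (flow_hash >> 20)) & FLOWLABEL_MASK;

	/* Postcondition: the label fits in the 20-bit field. */
	assert(label <= FLOWLABEL_MASK);
	return label;
}
```

Packets of the same flow produce the same hash and therefore the same label, which is what lets an intermediate router keep a flow on one path.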
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 38112d512f47..a6d7cb91069e 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -1008,14 +1008,9 @@ hardware timestamps to be used. Note: you may need to enable the generation
 of hardware timestamps with SIOCSHWTSTAMP (see related information from
 Documentation/networking/timestamping.txt).

-PACKET_TIMESTAMP accepts the same integer bit field as
-SO_TIMESTAMPING. However, only the SOF_TIMESTAMPING_SYS_HARDWARE
-and SOF_TIMESTAMPING_RAW_HARDWARE values are recognized by
-PACKET_TIMESTAMP. SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over
-SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.
-
- int req = 0;
- req |= SOF_TIMESTAMPING_SYS_HARDWARE;
+PACKET_TIMESTAMP accepts the same integer bit field as SO_TIMESTAMPING:
+
+ int req = SOF_TIMESTAMPING_RAW_HARDWARE;
 setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req))

 For the mmap(2)ed ring buffers, such timestamps are stored in the
@@ -1023,14 +1018,13 @@ tpacket{,2,3}_hdr structure's tp_sec and tp_{n,u}sec members. To determine
 what kind of timestamp has been reported, the tp_status field is binary |'ed
 with the following possible bits ...

- TP_STATUS_TS_SYS_HARDWARE
 TP_STATUS_TS_RAW_HARDWARE
 TP_STATUS_TS_SOFTWARE

 ... that are equivalent to its SOF_TIMESTAMPING_* counterparts. For the
-RX_RING, if none of those 3 are set (i.e. PACKET_TIMESTAMP is not set),
-then this means that a software fallback was invoked *within* PF_PACKET's
-processing code (less precise).
+RX_RING, if neither is set (i.e. PACKET_TIMESTAMP is not set), then a
+software fallback was invoked *within* PF_PACKET's processing code (less
+precise).

 Getting timestamps for the TX_RING works as follows: i) fill the ring frames,
 ii) call sendto() e.g.
in blocking mode, iii) wait for status of relevant
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 3544c98401fd..e839e7efc835 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -272,6 +272,8 @@ Writing a PHY driver
 txtsamp: Requests a transmit timestamp at the PHY level for a 'skb'
 set_wol: Enable Wake-on-LAN at the PHY level
 get_wol: Get the Wake-on-LAN status at the PHY level
+ read_mmd_indirect: Read PHY MMD indirect register
+ write_mmd_indirect: Write PHY MMD indirect register

 Of these, only config_aneg and read_status are required to be
 assigned by the driver code. The rest are optional. Also, it is
@@ -284,7 +286,21 @@ Writing a PHY driver

 Feel free to look at the Marvell, Cicada, and Davicom drivers in
 drivers/net/phy/ for examples (the lxt and qsemi drivers have
- not been tested as of this writing)
+ not been tested as of this writing).
+
+ The PHY's MMD register accesses are handled by the PAL framework
+ by default, but can be overridden by a specific PHY driver if
+ required. This could be the case if a PHY was released for
+ manufacturing before the MMD PHY register definitions were
+ standardized by the IEEE. Most modern PHYs will be able to use
+ the generic PAL framework for accessing the PHY's MMD registers.
+ An example of such usage is for Energy Efficient Ethernet support,
+ implemented in the PAL. This support uses the PAL to access MMD
+ registers for EEE query and configuration if the PHY supports
+ the IEEE standard access mechanisms, or can use the PHY's specific
+ access interfaces if overridden by the specific PHY driver. See
+ the Micrel driver in drivers/net/phy/ for an example of how this
+ can be implemented.
 Board Fixups

diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index 0e30c7845b2b..0dffc6e37902 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -24,6 +24,34 @@ For monitoring and control pktgen creates:
 /proc/net/pktgen/ethX


+Tuning NIC for max performance
+==============================
+
+The default NIC setting are (likely) not tuned for pktgen's artificial
+overload type of benchmarking, as this could hurt the normal use-case.
+
+Specifically increasing the TX ring buffer in the NIC:
+ # ethtool -G ethX tx 1024
+
+A larger TX ring can improve pktgen's performance, while it can hurt
+in the general case, 1) because the TX ring buffer might get larger
+than the CPUs L1/L2 cache, 2) because it allow more queueing in the
+NIC HW layer (which is bad for bufferbloat).
+
+One should be careful to conclude, that packets/descriptors in the HW
+TX ring cause delay. Drivers usually delay cleaning up the
+ring-buffers (for various performance reasons), thus packets stalling
+the TX ring, might just be waiting for cleanup.
+
+This cleanup issues is specifically the case, for the driver ixgbe
+(Intel 82599 chip). This driver (ixgbe) combine TX+RX ring cleanups,
+and the cleanup interval is affected by the ethtool --coalesce setting
+of parameter "rx-usecs".
+
+For ixgbe use e.g "30" resulting in approx 33K interrupts/sec (1/30*10^6):
+ # ethtool -C ethX rx-usecs 30
+
+
 Viewing threads
 ===============
 /proc/net/pktgen/kpktgend_0
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index bc3554124903..897f942b976b 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -40,7 +40,7 @@ the set bits correspond to data that is available, then the control
 message will not be generated:

 SOF_TIMESTAMPING_SOFTWARE: report systime if available
-SOF_TIMESTAMPING_SYS_HARDWARE: report hwtimetrans if available
+SOF_TIMESTAMPING_SYS_HARDWARE: report hwtimetrans if available (deprecated)
 SOF_TIMESTAMPING_RAW_HARDWARE: report hwtimeraw if available

 It is worth noting that timestamps may be collected for reasons other
@@ -88,13 +88,12 @@ hwtimeraw is the original hardware time stamp. Filled in if
 SOF_TIMESTAMPING_RAW_HARDWARE is set. No assumptions about its
 relation to system time should be made.

-hwtimetrans is the hardware time stamp transformed so that it
-corresponds as good as possible to system time. This correlation is
-not perfect; as a consequence, sorting packets received via different
-NICs by their hwtimetrans may differ from the order in which they were
-received. hwtimetrans may be non-monotonic even for the same NIC.
-Filled in if SOF_TIMESTAMPING_SYS_HARDWARE is set. Requires support
-by the network device and will be empty without that support.
+hwtimetrans is always zero. This field is deprecated. It used to hold
+hw timestamps converted to system time. Instead, expose the hardware
+clock device on the NIC directly as a HW PTP clock source, to allow
+time conversion in userspace and optionally synchronize system time
+with a userspace PTP stack such as linuxptp. For the PTP clock API,
+see Documentation/ptp/ptp.txt.
 SIOCSHWTSTAMP, SIOCGHWTSTAMP:

@@ -185,7 +184,6 @@ struct skb_shared_hwtstamps {
 * since arbitrary point in time
 */
 ktime_t hwtstamp;
- ktime_t syststamp; /* hwtstamp transformed to system time base */
 };

 Time stamps for outgoing packets are to be generated as follows:
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
index 8ba82bfe6a33..5cdfd743447b 100644
--- a/Documentation/networking/timestamping/timestamping.c
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -76,7 +76,6 @@ static void usage(const char *error)
 " SOF_TIMESTAMPING_RX_HARDWARE - hardware time stamping of incoming packets\n"
 " SOF_TIMESTAMPING_RX_SOFTWARE - software fallback for incoming packets\n"
 " SOF_TIMESTAMPING_SOFTWARE - request reporting of software time stamps\n"
- " SOF_TIMESTAMPING_SYS_HARDWARE - request reporting of transformed HW time stamps\n"
 " SOF_TIMESTAMPING_RAW_HARDWARE - request reporting of raw HW time stamps\n"
 " SIOCGSTAMP - check last socket time stamp\n"
 " SIOCGSTAMPNS - more accurate socket time stamp\n");
@@ -202,9 +201,7 @@ static void printpacket(struct msghdr *msg, int res,
 (long)stamp->tv_sec,
 (long)stamp->tv_nsec);
 stamp++;
- printf("HW transformed %ld.%09ld ",
- (long)stamp->tv_sec,
- (long)stamp->tv_nsec);
+ /* skip deprecated HW transformed */
 stamp++;
 printf("HW raw %ld.%09ld",
 (long)stamp->tv_sec,
@@ -361,8 +358,6 @@ int main(int argc, char **argv)
 so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE;
 else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE"))
 so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE;
- else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SYS_HARDWARE"))
- so_timestamping_flags |= SOF_TIMESTAMPING_SYS_HARDWARE;
 else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE"))
 so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE;
 else
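The printpacket() hunk above walks the timestamp payload as three consecutive struct timespec values: software, the deprecated transformed-hardware slot (which timestamping.txt now documents as always zero), and raw hardware. A minimal sketch of choosing a usable stamp from such an array. The helper pick_timestamp, the slot names, and the raw-first preference are assumptions made for illustration, not something the patch prescribes:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Slot order of the three timespec values, as walked by printpacket(). */
enum {
	TS_SOFTWARE = 0,
	TS_HW_TRANSFORMED = 1,	/* deprecated, always zero */
	TS_HW_RAW = 2
};

/* A slot counts as filled in when it is not all zero. */
static int ts_is_set(const struct timespec *ts)
{
	return ts->tv_sec != 0 || ts->tv_nsec != 0;
}

/*
 * Hypothetical helper: return the raw hardware stamp if present, else the
 * software stamp, else NULL. The deprecated slot is never consulted.
 */
static const struct timespec *pick_timestamp(const struct timespec stamps[3])
{
	assert(stamps != NULL);
	if (ts_is_set(&stamps[TS_HW_RAW]))
		return &stamps[TS_HW_RAW];
	if (ts_is_set(&stamps[TS_SOFTWARE]))
		return &stamps[TS_SOFTWARE];
	return NULL;
}
```

Callers that previously read the middle slot would get zeros after this change, so skipping it outright mirrors what the updated example program does.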