Diffstat (limited to 'Documentation/networking')
-rw-r--r--  Documentation/networking/altera_tse.txt     | 263
-rw-r--r--  Documentation/networking/bonding.txt        |  96
-rw-r--r--  Documentation/networking/can.txt            |   2
-rw-r--r--  Documentation/networking/filter.txt         | 127
-rw-r--r--  Documentation/networking/gianfar.txt        |  30
-rw-r--r--  Documentation/networking/igb.txt            |  48
-rw-r--r--  Documentation/networking/packet_mmap.txt    |   2
-rw-r--r--  Documentation/networking/phy.txt            |  11
-rw-r--r--  Documentation/networking/pktgen.txt         |  24
-rw-r--r--  Documentation/networking/rxrpc.txt          |  81
-rw-r--r--  Documentation/networking/scaling.txt        |   2
-rw-r--r--  Documentation/networking/tcp.txt            |   2
-rw-r--r--  Documentation/networking/timestamping.txt   |   6
13 files changed, 572 insertions, 122 deletions
diff --git a/Documentation/networking/altera_tse.txt b/Documentation/networking/altera_tse.txt
new file mode 100644
index 000000000000..3f24df8c6e65
--- /dev/null
+++ b/Documentation/networking/altera_tse.txt
@@ -0,0 +1,263 @@
+       Altera Triple-Speed Ethernet MAC driver
+
+Copyright (C) 2008-2014 Altera Corporation
+
+This is the driver for the Altera Triple-Speed Ethernet (TSE) controllers
+using the SGDMA and MSGDMA soft DMA IP components. The driver uses the
+platform bus to obtain component resources. The designs used to test this
+driver were built for a Cyclone(R) V SOC FPGA board, a Cyclone(R) V FPGA board,
+and tested with ARM and NIOS processor hosts separately. The anticipated use
+cases are simple communications between an embedded system and an external peer
+for status and simple configuration of the embedded system.
+
+For more information visit www.altera.com and www.rocketboards.org. Support
+forums for the driver may be found on www.rocketboards.org, and a design used
+to test this driver may be found there as well. Support is also available from
+the maintainer of this driver, found in MAINTAINERS.
+
+The Triple-Speed Ethernet, SGDMA, and MSGDMA components are all soft IP
+components that can be assembled and built into an FPGA using the Altera
+Quartus toolchain. Quartus 13.1 and 14.0 were used to build the design that
+this driver was tested against. The sopc2dts tool is used to create the
+device tree for the driver, and may be found at rocketboards.org.
+
+The driver probe function examines the device tree and determines if the
+Triple-Speed Ethernet instance is using an SGDMA or MSGDMA component. The
+probe function then installs the appropriate set of DMA routines to
+initialize, set up transmits, receives, and interrupt handling primitives for
+the respective configurations.
+
+The SGDMA component is to be deprecated in the near future (over the next 1-2
+years as of this writing in early 2014) in favor of the MSGDMA component.
+SGDMA support is included for existing designs and reference in case a
+developer wishes to support their own soft DMA logic and driver support. Any
+new designs should not use the SGDMA.
+
+The SGDMA supports only a single transmit or receive operation at a time, and
+therefore will not perform as well compared to the MSGDMA soft IP. Please
+visit www.altera.com for known, documented SGDMA errata.
+
+Scatter-gather DMA is not supported by the SGDMA or MSGDMA at this time.
+Scatter-gather DMA will be added to a future maintenance update to this
+driver.
+
+Jumbo frames are not supported at this time.
+
+The driver limits PHY operations to 10/100Mbps, and has not yet been fully
+tested for 1Gbps. This support will be added in a future maintenance update.
+
+1) Kernel Configuration
+The kernel configuration option is ALTERA_TSE:
+ Device Drivers ---> Network device support ---> Ethernet driver support --->
+ Altera Triple-Speed Ethernet MAC support (ALTERA_TSE)
+
+2) Driver parameters list:
+        debug: message level (0: no output, 16: all);
+        dma_rx_num: Number of descriptors in the RX list (default is 64);
+        dma_tx_num: Number of descriptors in the TX list (default is 64).
+
+3) Command line options
+Driver parameters can also be passed on the command line by using:
+        altera_tse=dma_rx_num:128,dma_tx_num:512
+
+4) Driver information and notes
+
+4.1) Transmit process
+When the driver's transmit routine is called by the kernel, it sets up a
+transmit descriptor by calling the underlying DMA transmit routine (SGDMA or
+MSGDMA), and initiates a transmit operation. Once the transmit is complete, an
+interrupt is driven by the transmit DMA logic. The driver handles the transmit
+completion in the context of the interrupt handling chain by recycling
+resources required to send and track the requested transmit operation.
+
+4.2) Receive process
+The driver will post receive buffers to the receive DMA logic during driver
+initialization. Receive buffers may or may not be queued depending upon the
+underlying DMA logic (MSGDMA is able to queue receive buffers, SGDMA is not able
+to queue receive buffers to the SGDMA receive logic). When a packet is
+received, the DMA logic generates an interrupt. The driver handles a receive
+interrupt by obtaining the DMA receive logic status, reaping receive
+completions until no more receive completions are available.
+
+4.3) Interrupt Mitigation
+The driver is able to mitigate the number of its DMA interrupts
+using NAPI for receive operations. Interrupt mitigation is not yet supported
+for transmit operations, but will be added in a future maintenance release.
+
+4.4) Ethtool support
+Ethtool is supported. Driver statistics and internal errors can be retrieved
+using the "ethtool -S ethX" command. Registers can be dumped with "ethtool -d ethX".
+
+4.5) PHY Support
+The driver is compatible with the PHY Abstraction Layer (PAL) to work with PHY and GPHY devices.
+
+4.6) List of source files:
+ o Kconfig
+ o Makefile
+ o altera_tse_main.c: main network device driver
+ o altera_tse_ethtool.c: ethtool support
+ o altera_tse.h: private driver structure and common definitions
+ o altera_msgdma.h: MSGDMA implementation function definitions
+ o altera_sgdma.h: SGDMA implementation function definitions
+ o altera_msgdma.c: MSGDMA implementation
+ o altera_sgdma.c: SGDMA implementation
+ o altera_sgdmahw.h: SGDMA register and descriptor definitions
+ o altera_msgdmahw.h: MSGDMA register and descriptor definitions
+ o altera_utils.c: Driver utility functions
+ o altera_utils.h: Driver utility function definitions
+
+5) Debug Information
+
+The driver exports debug information such as internal statistics,
+debug information, MAC and DMA registers etc.
+
+A user may use the ethtool support to get statistics:
+e.g. using: ethtool -S ethX (that shows the statistics counters)
+or view the MAC registers, e.g. using: ethtool -d ethX
+
+The developer can also use the "debug" module parameter to get
+further debug information.
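Sections 4.2 and 4.3 describe a receive path that reaps completions until none remain, with NAPI used to bound interrupt load. The C sketch below models that pattern on a hypothetical descriptor ring. The names struct rx_desc, struct rx_ring, rx_interrupt() and rx_poll() are illustrative assumptions, not code from altera_tse_main.c, and the real driver integrates with the kernel's NAPI framework rather than toggling a flag.

```c
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SIZE 8 /* hypothetical; the driver's dma_rx_num defaults to 64 */

/* Hypothetical RX descriptor: `done` is set by the DMA engine once a
 * received frame has been written into the descriptor's buffer. */
struct rx_desc {
	bool done;
	size_t len;
};

struct rx_ring {
	struct rx_desc desc[RX_RING_SIZE];
	size_t next;      /* next descriptor to inspect */
	bool irq_enabled; /* models the RX interrupt mask */
	size_t delivered; /* frames handed up the stack */
};

/* Interrupt handler: mask further RX interrupts and defer the work to
 * the poll routine (the role NAPI scheduling plays in the kernel). */
static void rx_interrupt(struct rx_ring *ring)
{
	ring->irq_enabled = false;
}

/* Reap at most `budget` completed descriptors and return how many were
 * reaped. If the ring drains before the budget is used up, RX interrupts
 * are re-enabled; otherwise the caller is expected to poll again. */
static int rx_poll(struct rx_ring *ring, int budget)
{
	int work = 0;

	while (work < budget && ring->desc[ring->next].done) {
		struct rx_desc *d = &ring->desc[ring->next];

		ring->delivered++; /* stand-in for passing the frame up */
		d->done = false;   /* hand the buffer back to the DMA engine */
		d->len = 0;
		ring->next = (ring->next + 1) % RX_RING_SIZE;
		work++;
	}

	if (work < budget)
		ring->irq_enabled = true;

	return work;
}
```

The budget check is what bounds the work done per poll: a poll that exhausts its budget leaves interrupts masked so that it can simply be called again.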
+
+6) Statistics Support
+
+The controller and driver support a mix of IEEE standard defined statistics,
+RFC defined statistics, and driver or Altera defined statistics. The four
+specifications containing the standard definitions for these statistics are
+as follows:
+
+ o IEEE 802.3-2012 - IEEE Standard for Ethernet.
+ o RFC 2863 found at http://www.rfc-editor.org/rfc/rfc2863.txt.
+ o RFC 2819 found at http://www.rfc-editor.org/rfc/rfc2819.txt.
+ o Altera Triple Speed Ethernet User Guide, found at http://www.altera.com
+
+The statistics supported by the TSE and the device driver are as follows:
+
+"tx_packets" is equivalent to aFramesTransmittedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.2. This statistic is the count of frames that are successfully
+transmitted.
+
+"rx_packets" is equivalent to aFramesReceivedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.5. This statistic is the count of frames that are successfully
+received. This count does not include any error packets such as CRC errors,
+length errors, or alignment errors.
+
+"rx_crc_errors" is equivalent to aFrameCheckSequenceErrors defined in IEEE
+802.3-2012, Section 5.2.2.1.6. This statistic is the count of frames that are
+an integral number of bytes in length and do not pass the CRC test as the frame
+is received.
+
+"rx_align_errors" is equivalent to aAlignmentErrors defined in IEEE 802.3-2012,
+Section 5.2.2.1.7. This statistic is the count of frames that are not an
+integral number of bytes in length and do not pass the CRC test as the frame is
+received.
+
+"tx_bytes" is equivalent to aOctetsTransmittedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.8. This statistic is the count of data and pad bytes
+successfully transmitted from the interface.
+
+"rx_bytes" is equivalent to aOctetsReceivedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.14. This statistic is the count of data and pad bytes
+successfully received by the controller.
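The "rx_crc_errors" and "rx_align_errors" definitions differ only in whether the failing frame was a whole number of octets long. A minimal C sketch of that distinction follows; classify_rx_crc() and its enum are hypothetical helpers for illustration, since a real MAC makes this decision in hardware.

```c
#include <stdbool.h>

enum rx_crc_class {
	RX_CRC_OK,      /* CRC passed: counted by neither error counter */
	RX_CRC_ERROR,   /* rx_crc_errors: integral number of octets, bad CRC */
	RX_ALIGN_ERROR, /* rx_align_errors: non-integral octets, bad CRC */
};

/* `bits` is the received frame length in bits; `crc_ok` is the result of
 * the frame check sequence test. */
static enum rx_crc_class classify_rx_crc(unsigned long bits, bool crc_ok)
{
	if (crc_ok)
		return RX_CRC_OK;
	return (bits % 8 == 0) ? RX_CRC_ERROR : RX_ALIGN_ERROR;
}
```

The same integral/non-integral split reappears in the "rx_jabbers" and "rx_runts" definitions later in this list.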
+
+"tx_pause" is equivalent to aPAUSEMACCtrlFramesTransmitted defined in IEEE
+802.3-2012, Section 30.3.4.2. This statistic is a count of PAUSE frames
+transmitted from the network controller.
+
+"rx_pause" is equivalent to aPAUSEMACCtrlFramesReceived defined in IEEE
+802.3-2012, Section 30.3.4.3. This statistic is a count of PAUSE frames
+received by the network controller.
+
+"rx_errors" is equivalent to ifInErrors defined in RFC 2863. This statistic is
+a count of the number of packets received containing errors that prevented the
+packet from being delivered to a higher level protocol.
+
+"tx_errors" is equivalent to ifOutErrors defined in RFC 2863. This statistic
+is a count of the number of packets that could not be transmitted due to errors.
+
+"rx_unicast" is equivalent to ifInUcastPkts defined in RFC 2863. This
+statistic is a count of the number of packets received that were not addressed
+to the broadcast address or a multicast group.
+
+"rx_multicast" is equivalent to ifInMulticastPkts defined in RFC 2863. This
+statistic is a count of the number of packets received that were addressed to
+a multicast address group.
+
+"rx_broadcast" is equivalent to ifInBroadcastPkts defined in RFC 2863. This
+statistic is a count of the number of packets received that were addressed to
+the broadcast address.
+
+"tx_discards" is equivalent to ifOutDiscards defined in RFC 2863. This
+statistic is the number of outbound packets not transmitted even though an
+error was not detected. An example of a reason this might occur is to free up
+internal buffer space.
+
+"tx_unicast" is equivalent to ifOutUcastPkts defined in RFC 2863. This
+statistic counts the number of packets transmitted that were not addressed to
+a multicast group or broadcast address.
+
+"tx_multicast" is equivalent to ifOutMulticastPkts defined in RFC 2863. This
+statistic counts the number of packets transmitted that were addressed to a
+multicast group.
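The unicast, multicast, and broadcast counters partition frames by destination MAC address. The C sketch below shows that classification under standard IEEE 802 addressing: the broadcast address is ff:ff:ff:ff:ff:ff, and any other address whose group (I/G) bit, the least significant bit of the first octet, is set is a multicast address. classify_dst() is a hypothetical helper, not a function of this driver.

```c
#include <stdint.h>
#include <string.h>

enum dst_class { DST_UNICAST, DST_MULTICAST, DST_BROADCAST };

static enum dst_class classify_dst(const uint8_t dst[6])
{
	static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	/* Check broadcast first: it also has the group bit set, but RFC 2863
	 * counts broadcast separately from multicast. */
	if (memcmp(dst, bcast, sizeof(bcast)) == 0)
		return DST_BROADCAST;
	if (dst[0] & 0x01) /* I/G bit set: group address */
		return DST_MULTICAST;
	return DST_UNICAST;
}
```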
+
+"tx_broadcast" is equivalent to ifOutBroadcastPkts defined in RFC 2863. This
+statistic counts the number of packets transmitted that were addressed to a
+broadcast address.
+
+"ether_drops" is equivalent to etherStatsDropEvents defined in RFC 2819.
+This statistic counts the number of packets dropped due to lack of internal
+controller resources.
+
+"rx_total_bytes" is equivalent to etherStatsOctets defined in RFC 2819.
+This statistic counts the total number of bytes received by the controller,
+including error and discarded packets.
+
+"rx_total_packets" is equivalent to etherStatsPkts defined in RFC 2819.
+This statistic counts the total number of packets received by the controller,
+including error, discarded, unicast, multicast, and broadcast packets.
+
+"rx_undersize" is equivalent to etherStatsUndersizePkts defined in RFC 2819.
+This statistic counts the number of correctly formed packets received less
+than 64 bytes long.
+
+"rx_oversize" is equivalent to etherStatsOversizePkts defined in RFC 2819.
+This statistic counts the number of correctly formed packets greater than 1518
+bytes long.
+
+"rx_64_bytes" is equivalent to etherStatsPkts64Octets defined in RFC 2819.
+This statistic counts the total number of packets received that were 64 octets
+in length.
+
+"rx_65_127_bytes" is equivalent to etherStatsPkts65to127Octets defined in RFC
+2819. This statistic counts the total number of packets received that were
+between 65 and 127 octets in length inclusive.
+
+"rx_128_255_bytes" is equivalent to etherStatsPkts128to255Octets defined in
+RFC 2819. This statistic is the total number of packets received that were
+between 128 and 255 octets in length inclusive.
+
+"rx_256_511_bytes" is equivalent to etherStatsPkts256to511Octets defined in
+RFC 2819. This statistic is the total number of packets received that were
+between 256 and 511 octets in length inclusive.
+
+"rx_512_1023_bytes" is equivalent to etherStatsPkts512to1023Octets defined in
+RFC 2819. This statistic is the total number of packets received that were
+between 512 and 1023 octets in length inclusive.
+
+"rx_1024_1518_bytes" is equivalent to etherStatsPkts1024to1518Octets defined
+in RFC 2819. This statistic is the total number of packets received that were
+between 1024 and 1518 octets in length inclusive.
+
+"rx_gte_1519_bytes" is a statistic specific to the behavior of the
+Altera TSE. This statistic counts the number of received good and errored
+frames between the length of 1519 and the maximum frame length configured
+in the frm_length register. See the Altera TSE User Guide for more details.
+
+"rx_jabbers" is equivalent to etherStatsJabbers defined in RFC 2819. This
+statistic is the total number of packets received that were longer than 1518
+octets, and had either a bad CRC with an integral number of octets (CRC Error)
+or a bad CRC with a non-integral number of octets (Alignment Error).
+
+"rx_runts" is equivalent to etherStatsFragments defined in RFC 2819. This
+statistic is the total number of packets received that were less than 64 octets
+in length and had either a bad CRC with an integral number of octets (CRC
+error) or a bad CRC with a non-integral number of octets (Alignment Error).
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 5cdb22971d19..a383c00392d0 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -270,16 +270,15 @@ arp_ip_target
 arp_validate
 
         Specifies whether or not ARP probes and replies should be
-        validated in the active-backup mode.  This causes the ARP
-        monitor to examine the incoming ARP requests and replies, and
-        only consider a slave to be up if it is receiving the
-        appropriate ARP traffic.
+        validated in any mode that supports ARP monitoring, or whether
+        non-ARP traffic should be filtered (disregarded) for link
+        monitoring purposes.
 
         Possible values are:
 
         none or 0
 
-                No validation is performed.  This is the default.
+                No validation or filtering is performed.
 
         active or 1
 
@@ -293,31 +292,68 @@ arp_validate
 
                 Validation is performed for all slaves.
 
-        For the active slave, the validation checks ARP replies to
-        confirm that they were generated by an arp_ip_target.  Since
-        backup slaves do not typically receive these replies, the
-        validation performed for backup slaves is on the ARP request
-        sent out via the active slave.  It is possible that some
-        switch or network configurations may result in situations
-        wherein the backup slaves do not receive the ARP requests; in
-        such a situation, validation of backup slaves must be
-        disabled.
-
-        The validation of ARP requests on backup slaves is mainly
-        helping bonding to decide which slaves are more likely to
-        work in case of the active slave failure, it doesn't really
-        guarantee that the backup slave will work if it's selected
-        as the next active slave.
-
-        This option is useful in network configurations in which
-        multiple bonding hosts are concurrently issuing ARPs to one or
-        more targets beyond a common switch.  Should the link between
-        the switch and target fail (but not the switch itself), the
-        probe traffic generated by the multiple bonding instances will
-        fool the standard ARP monitor into considering the links as
-        still up.  Use of the arp_validate option can resolve this, as
-        the ARP monitor will only consider ARP requests and replies
-        associated with its own instance of bonding.
+        filter or 4
+
+                Filtering is applied to all slaves. No validation is
+                performed.
+
+        filter_active or 5
+
+                Filtering is applied to all slaves; validation is performed
+                only for the active slave.
+
+        filter_backup or 6
+
+                Filtering is applied to all slaves; validation is performed
+                only for backup slaves.
+
+        Validation:
+
+        Enabling validation causes the ARP monitor to examine the incoming
+        ARP requests and replies, and only consider a slave to be up if it
+        is receiving the appropriate ARP traffic.
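The numeric arp_validate values (0, 1, 4, 5 and 6 shown in this hunk, plus backup or 2 and all or 3 from the full option list in bonding.txt) behave like a three-bit mask: one bit per slave role for validation and one bit for filtering. The C sketch below encodes that reading of the numbering. The ARPV_* names and both helpers are hypothetical and are not the bonding driver's own identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout that reproduces the documented values:
 * active=1, backup=2, all=3, filter=4, filter_active=5, filter_backup=6. */
#define ARPV_ACTIVE 0x1u /* validate ARP traffic on the active slave */
#define ARPV_BACKUP 0x2u /* validate ARP traffic on backup slaves */
#define ARPV_FILTER 0x4u /* only ARP frames count for link monitoring */

#define ETHERTYPE_ARP 0x0806u

/* Does this mode validate ARP traffic for a slave in the given role? */
static bool arp_validates(unsigned int mode, bool slave_is_active)
{
	return (mode & (slave_is_active ? ARPV_ACTIVE : ARPV_BACKUP)) != 0;
}

/* Does a received frame with this EtherType count as link activity?
 * This models filtering only: without it every frame counts, with it only
 * ARP frames do (the frame is delivered normally either way). Validation,
 * when enabled, further restricts which ARP frames count. */
static bool counts_for_link(unsigned int mode, uint16_t ethertype)
{
	if (!(mode & ARPV_FILTER))
		return true;
	return ethertype == ETHERTYPE_ARP;
}
```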
+
+        For an active slave, the validation checks ARP replies to confirm
+        that they were generated by an arp_ip_target.  Since backup slaves
+        do not typically receive these replies, the validation performed
+        for backup slaves is on the broadcast ARP request sent out via the
+        active slave.  It is possible that some switch or network
+        configurations may result in situations wherein the backup slaves
+        do not receive the ARP requests; in such a situation, validation
+        of backup slaves must be disabled.
+
+        The validation of ARP requests on backup slaves mainly helps
+        bonding decide which slaves are more likely to work in case of
+        active slave failure; it doesn't guarantee that the
+        backup slave will work if it's selected as the next active slave.
+
+        Validation is useful in network configurations in which multiple
+        bonding hosts are concurrently issuing ARPs to one or more targets
+        beyond a common switch.  Should the link between the switch and
+        target fail (but not the switch itself), the probe traffic
+        generated by the multiple bonding instances will fool the standard
+        ARP monitor into considering the links as still up.  Use of
+        validation can resolve this, as the ARP monitor will only consider
+        ARP requests and replies associated with its own instance of
+        bonding.
+
+        Filtering:
+
+        Enabling filtering causes the ARP monitor to only use incoming ARP
+        packets for link availability purposes.  Arriving packets that are
+        not ARPs are delivered normally, but do not count when determining
+        if a slave is available.
+
+        Filtering operates by only considering the reception of ARP
+        packets (any ARP packet, regardless of source or destination) when
+        determining if a slave has received traffic for link availability
+        purposes.
+
+        Filtering is useful in network configurations in which significant
+        levels of third party broadcast traffic would fool the standard
+        ARP monitor into considering the links as still up.  Use of
+        filtering can resolve this, as only ARP traffic is considered for
+        link availability purposes.
 
         This option was added in bonding version 3.1.0.
 
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index 988be279a102..4f7ae5261364 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -1017,7 +1017,7 @@ solution for a couple of reasons:
       in case of a bus-off condition after the specified delay time
       in milliseconds. By default it's off.
 
-    "bitrate 125000 sample_point 0.875"
+    "bitrate 125000 sample-point 0.875"
       Shows the real bit-rate in bits/sec and the sample-point in the
       range 0.000..0.999. If the calculation of bit-timing parameters
       is enabled in the kernel (CONFIG_CAN_CALC_BITTIMING=y), the
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index a06b48d2f5cc..e3ba753cb714 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -277,7 +277,7 @@ Possible BPF extensions are shown in the following table:
    mark                                  skb->mark
    queue                                 skb->queue_mapping
    hatype                                skb->dev->type
-   rxhash                                skb->rxhash
+   rxhash                                skb->hash
    cpu                                   raw_smp_processor_id()
    vlan_tci                              vlan_tx_tag_get(skb)
    vlan_pr                               vlan_tx_tag_present(skb)
@@ -546,6 +546,130 @@ ffffffffa0069c8f + <x>:
 For BPF JIT developers, bpf_jit_disasm, bpf_asm and bpf_dbg provides a useful
 toolchain for developing and testing the kernel's JIT compiler.
 
+BPF kernel internals
+--------------------
+Internally, for the kernel interpreter, a different BPF instruction set
+format with underlying principles similar to the BPF described in previous
+paragraphs is being used. However, the instruction set format is modelled
+closer to the underlying architecture to mimic native instruction sets, so
+that a better performance can be achieved (more details later).
+
+It is designed to be JITed with one to one mapping, which can also open up
+the possibility for GCC/LLVM compilers to generate optimized BPF code through
+a BPF backend that performs almost as fast as natively compiled code.
+
+The new instruction set was originally designed with the possible goal in
+mind to write programs in "restricted C" and compile into BPF with an optional
+GCC/LLVM backend, so that it can just-in-time map to modern 64-bit CPUs with
+minimal performance overhead over two steps, that is, C -> BPF -> native code.
+
+Currently, the new format is being used for running user BPF programs, which
+includes seccomp BPF, classic socket filters, cls_bpf traffic classifier,
+team driver's classifier for its load-balancing mode, netfilter's xt_bpf
+extension, PTP dissector/classifier, and much more. They are all internally
+converted by the kernel into the new instruction set representation and run
+in the extended interpreter. For in-kernel handlers, this all works
+transparently by using sk_unattached_filter_create() for setting up the
+filter, and sk_unattached_filter_destroy() for destroying it. The macro
+SK_RUN_FILTER(filter, ctx) transparently invokes the right BPF function to
+run the filter. 'filter' is a pointer to struct sk_filter that we got from
+sk_unattached_filter_create(), and 'ctx' the given context (e.g. skb pointer).
+All constraints and restrictions from sk_chk_filter() apply before a
+conversion to the new layout is being done behind the scenes!
+
+Currently, for JITing, the user BPF format is being used and current BPF JIT
+compilers reused whenever possible. In other words, we do not (yet!) perform
+a JIT compilation in the new layout; however, future work will successively
+migrate traditional JIT compilers into the new instruction format as well, so
+that they will profit from the very same benefits. Thus, when speaking about
+JIT in the following, a JIT compiler (TBD) for the new instruction format is
+meant in this context.
+
+Some core changes of the new internal format:
+
+- Number of registers increases from 2 to 10:
+
+  The old format had two registers A and X, and a hidden frame pointer. The
+  new layout extends this to be 10 internal registers and a read-only frame
+  pointer. Since 64-bit CPUs are passing arguments to functions via registers,
+  the number of args from BPF program to in-kernel function is restricted
+  to 5 and one register is used to accept return value from an in-kernel
+  function. Natively, x86_64 passes first 6 arguments in registers, aarch64/
+  sparcv9/mips64 have 7 - 8 registers for arguments; x86_64 has 6 callee saved
+  registers, and aarch64/sparcv9/mips64 have 11 or more callee saved registers.
+
+  Therefore, BPF calling convention is defined as:
+
+    * R0        - return value from in-kernel function
+    * R1 - R5   - arguments from BPF program to in-kernel function
+    * R6 - R9   - callee saved registers that in-kernel function will preserve
+    * R10       - read-only frame pointer to access stack
+
+  Thus, all BPF registers map one to one to HW registers on x86_64, aarch64,
+  etc., and BPF calling convention maps directly to ABIs used by the kernel on
+  64-bit architectures.
+
+  On 32-bit architectures JIT may map programs that use only 32-bit arithmetic
+  and may let more complex programs be interpreted.
+
+  R0 - R5 are scratch registers and BPF program needs to spill/fill them if
+  necessary across calls. Note that there is only one BPF program (== one BPF
+  main routine) and it cannot call other BPF functions; it can only call
+  predefined in-kernel functions.
+
+- Register width increases from 32-bit to 64-bit:
+
+  Still, the semantics of the original 32-bit ALU operations are preserved
+  via 32-bit subregisters. All BPF registers are 64-bit with 32-bit lower
+  subregisters that zero-extend into 64-bit if they are being written to.
+  That behavior maps directly to x86_64 and arm64 subregister definition, but
+  makes other JITs more difficult.
+
+  32-bit architectures run 64-bit internal BPF programs via interpreter.
+  Their JITs may convert BPF programs that only use 32-bit subregisters into
+  native instruction set and let the rest be interpreted.
+
+  Operation is 64-bit, because on 64-bit architectures, pointers are also
+  64-bit wide, and we want to pass 64-bit values in/out of kernel functions,
+  so 32-bit BPF registers would otherwise require defining a register-pair
+  ABI; it would then not be possible to use a direct BPF register to HW register
+  mapping, and JIT would need to do combine/split/move operations for every
+  register in and out of the function, which is complex, bug prone and slow.
+  Another reason is the use of atomic 64-bit counters.
+
+- Conditional jt/jf targets replaced with jt/fall-through:
+
+  While the original design has constructs such as "if (cond) jump_true;
+  else jump_false;", they are being replaced with alternative constructs like
+  "if (cond) jump_true; /* else fall-through */".
+
+- Introduces bpf_call insn and register passing convention for zero overhead
+  calls from/to other kernel functions:
+
+  After a kernel function call, R1 - R5 are reset to unreadable and R0 holds the
+  return value of the function. Since R6 - R9 are callee saved, their state is
+  preserved across the call.
+
+Also in the new design, BPF is limited to 4096 insns, which means that any
+program will terminate quickly and will only call a fixed number of kernel
+functions. Original BPF and the new format are two operand instructions,
+which helps to do one-to-one mapping between BPF insn and x86 insn during JIT.
+
+The input context pointer for invoking the interpreter function is generic;
+its content is defined by a specific use case. For seccomp, register R1 points
+to seccomp_data; for converted BPF filters, R1 points to a skb.
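The register-width rules above can be illustrated with a small C model: eleven 64-bit registers, a read-only R10, and 32-bit ALU operations that act on the low subregister and zero-extend the result. This is a hypothetical sketch (struct bpf_regs, alu64_add() and alu32_add() are illustrative names), not the kernel interpreter.

```c
#include <stdint.h>

enum { R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, NREGS };

struct bpf_regs {
	uint64_t r[NREGS];
};

/* 64-bit add: dst += src over the full register width.
 * Returns -1 if the program tries to write the read-only frame pointer. */
static int alu64_add(struct bpf_regs *s, int dst, int src)
{
	if (dst == R10)
		return -1;
	s->r[dst] += s->r[src];
	return 0;
}

/* 32-bit add: operate on the low 32-bit subregisters, then zero-extend
 * the result into the full 64-bit destination register. */
static int alu32_add(struct bpf_regs *s, int dst, int src)
{
	uint32_t lo;

	if (dst == R10)
		return -1;
	lo = (uint32_t)s->r[dst] + (uint32_t)s->r[src];
	s->r[dst] = lo; /* upper 32 bits become zero */
	return 0;
}
```

The zero-extension is what keeps the upper half of a register well defined after a 32-bit operation, so a program using only 32-bit subregisters behaves the same whether it is interpreted as 64-bit code or JITed to 32-bit native instructions.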
+
+A program that is translated internally consists of the following elements:
+
+  op:16, jt:8, jf:8, k:32    ==>    op:8, a_reg:4, x_reg:4, off:16, imm:32
+
+Just like the original BPF, the new format runs within a controlled environment,
+is deterministic and the kernel can easily prove that. The safety of the program
+can be determined in two steps: the first step does a depth-first search to disallow
+loops and other CFG validation; the second step starts from the first insn and
+descends all possible paths. It simulates execution of every insn and observes
+the state change of registers and stack.
+
 Misc
 ----
 
@@ -561,3 +685,4 @@ the underlying architecture.
 
 Jay Schulist <jschlst@samba.org>
 Daniel Borkmann <dborkman@redhat.com>
+Alexei Starovoitov <ast@plumgrid.com>
diff --git a/Documentation/networking/gianfar.txt b/Documentation/networking/gianfar.txt
index ad474ea07d07..ba1daea7f2e4 100644
--- a/Documentation/networking/gianfar.txt
+++ b/Documentation/networking/gianfar.txt
@@ -1,38 +1,8 @@
 The Gianfar Ethernet Driver
-Sysfs File description
 
 Author: Andy Fleming <afleming@freescale.com>
 Updated: 2005-07-28
 
-SYSFS
-
-Several of the features of the gianfar driver are controlled
-through sysfs files.  These are:
-
-bd_stash:
-To stash RX Buffer Descriptors in the L2, echo 'on' or '1' to
-bd_stash, echo 'off' or '0' to disable
-
-rx_stash_len:
-To stash the first n bytes of the packet in L2, echo the number
-of bytes to buf_stash_len.  echo 0 to disable.
-
-WARNING: You could really screw these up if you set them too low or high!
-fifo_threshold:
-To change the number of bytes the controller needs in the
-fifo before it starts transmission, echo the number of bytes to
-fifo_thresh.  Range should be 0-511.
-
-fifo_starve:
-When the FIFO has less than this many bytes during a transmit, it
-enters starve mode, and increases the priority of TX memory
-transactions.  To change, echo the number of bytes to
-fifo_starve.  Range should be 0-511.
-
-fifo_starve_off:
-Once in starve mode, the FIFO remains there until it has this
-many bytes.  To change, echo the number of bytes to
-fifo_starve_off.  Range should be 0-511.
 
 
 CHECKSUM OFFLOADING
diff --git a/Documentation/networking/igb.txt b/Documentation/networking/igb.txt
index 4ebbd659256f..43d3549366a0 100644
--- a/Documentation/networking/igb.txt
+++ b/Documentation/networking/igb.txt
@@ -36,54 +36,6 @@ Default Value: 0
 This parameter adds support for SR-IOV.  It causes the driver to spawn up to
 max_vfs worth of virtual function.
 
-QueuePairs
------------
-Valid Range: 0-1
-Default Value: 1 (TX and RX will be paired onto one interrupt vector)
-
-If set to 0, when MSI-X is enabled, the TX and RX will attempt to occupy
-separate vectors.
-
-This option can be overridden to 1 if there are not sufficient interrupts
-available.  This can occur if any combination of RSS, VMDQ, and max_vfs
-results in more than 4 queues being used.
-
-Node
------
-Valid Range:   0-n
-Default Value: -1 (off)
-
-  0 - n: where n is the number of the NUMA node that should be used to
-         allocate memory for this adapter port.
-  -1: uses the driver default of allocating memory on whichever processor is
-      running insmod/modprobe.
-
-  The Node parameter will allow you to pick which NUMA node you want to have
-  the adapter allocate memory from.  All driver structures, in-memory queues,
-  and receive buffers will be allocated on the node specified.  This parameter
-  is only useful when interrupt affinity is specified, otherwise some portion
-  of the time the interrupt could run on a different core than the memory is
-  allocated on, causing slower memory access and impacting throughput, CPU, or
-  both.
-
-EEE
-----
-Valid Range:  0-1
-Default Value: 1 (enabled)
-
-  A link between two EEE-compliant devices will result in periodic bursts of
-  data followed by long periods where in the link is in an idle state. This Low
-  Power Idle (LPI) state is supported in both 1Gbps and 100Mbps link speeds.
-  NOTE: EEE support requires autonegotiation.
-
-DMAC
------
-Valid Range: 0-1
-Default Value: 1 (enabled)
-  Enables or disables DMA Coalescing feature.
-
-
-
 Additional Configurations
 =========================
 
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 6fea79efb4cb..38112d512f47 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -578,7 +578,7 @@ processes. This also works in combination with mmap(2) on packet sockets.
 
 Currently implemented fanout policies are:
 
-  - PACKET_FANOUT_HASH: schedule to socket by skb's rxhash
+  - PACKET_FANOUT_HASH: schedule to socket by skb's packet hash
   - PACKET_FANOUT_LB: schedule to socket by round-robin
   - PACKET_FANOUT_CPU: schedule to socket by CPU packet arrives on
   - PACKET_FANOUT_RND: schedule to socket by random selection
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index ebf270719402..3544c98401fd 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -48,7 +48,7 @@ The MDIO bus
    time, so it is safe for them to block, waiting for an interrupt to signal
    the operation is complete
 
- 2) A reset function is necessary.  This is used to return the bus to an
+ 2) A reset function is optional.  This is used to return the bus to an
    initialized state.
 
  3) A probe function is needed.  This function should set up anything the bus
@@ -253,16 +253,25 @@ Writing a PHY driver
 
 Each driver consists of a number of function pointers:
 
+  soft_reset: perform a PHY software reset
   config_init: configures PHY into a sane state after a reset.
     For instance, a Davicom PHY requires descrambling disabled.
   probe: Allocate phy->priv, optionally refuse to bind.
   PHY may not have been reset or had fixups run yet.
  suspend/resume: power management
  config_aneg: Changes the speed/duplex/negotiation settings
+ aneg_done: Determines the auto-negotiation result
  read_status: Reads the current speed/duplex/negotiation settings
  ack_interrupt: Clear a pending interrupt
+ did_interrupt: Checks if the PHY generated an interrupt
  config_intr: Enable or disable interrupts
  remove: Does any driver take-down
+ ts_info: Queries about the HW timestamping status
+ hwtstamp: Set the PHY HW timestamping configuration
+ rxtstamp: Requests a receive timestamp at the PHY level for a 'skb'
+ txtstamp: Requests a transmit timestamp at the PHY level for a 'skb'
+ set_wol: Enable Wake-on-LAN at the PHY level
+ get_wol: Get the Wake-on-LAN status at the PHY level
 
  Of these, only config_aneg and read_status are required to be
  assigned by the driver code. The rest are optional. Also, it is
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index 5a61a240a652..0e30c7845b2b 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -102,13 +102,18 @@ Examples:
  The 'minimum' MAC is what you set with dstmac.
 
  pgset "flag [name]" Set a flag to determine behaviour. Current flags
- are: IPSRC_RND #IP Source is random (between min/max),
- IPDST_RND, UDPSRC_RND,
- UDPDST_RND, MACSRC_RND, MACDST_RND
+ are: IPSRC_RND # IP source is random (between min/max)
+ IPDST_RND # IP destination is random
+ UDPSRC_RND, UDPDST_RND,
+ MACSRC_RND, MACDST_RND
+ TXSIZE_RND, IPV6, MPLS_RND, VID_RND, SVID_RND
+ FLOW_SEQ,
  QUEUE_MAP_RND # queue map random
  QUEUE_MAP_CPU # queue map mirrors smp_processor_id()
- IPSEC # Make IPsec encapsulation for packet
+ UDPCSUM,
+ IPSEC # IPsec encapsulation (needs CONFIG_XFRM)
+ NODE_ALLOC # node specific memory allocation
 
  pgset spi SPI_VALUE Set specific SA used to transform packet.
@@ -233,13 +238,22 @@ udp_dst_max
 
 flag
  IPSRC_RND
- TXSIZE_RND
  IPDST_RND
  UDPSRC_RND
  UDPDST_RND
  MACSRC_RND
  MACDST_RND
+ TXSIZE_RND
+ IPV6
+ MPLS_RND
+ VID_RND
+ SVID_RND
+ FLOW_SEQ
+ QUEUE_MAP_RND
+ QUEUE_MAP_CPU
+ UDPCSUM
  IPSEC
+ NODE_ALLOC
 
 dst_min
 dst_max
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index b89bc82eed46..16a924c486bf 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -27,6 +27,8 @@ Contents of this document:
 
  (*) AF_RXRPC kernel interface.
 
+ (*) Configurable parameters.
+
 ========
 OVERVIEW
@@ -864,3 +866,82 @@ The kernel interface functions are as follows:
 
  This is used to allocate a null RxRPC key that can be used to indicate
  anonymous security for a particular domain.
+
+
+=======================
+CONFIGURABLE PARAMETERS
+=======================
+
+The RxRPC protocol driver has a number of configurable parameters that can be
+adjusted through sysctls in /proc/sys/net/rxrpc/:
+
+ (*) req_ack_delay
+
+ The amount of time in milliseconds after receiving a packet with the
+ request-ack flag set before we honour the flag and actually send the
+ requested ack.
+
+ Usually the other side won't stop sending packets until the advertised
+ reception window is full (to a maximum of 255 packets), so delaying the
+ ACK permits several packets to be ACK'd in one go.
+
+ (*) soft_ack_delay
+
+ The amount of time in milliseconds after receiving a new packet before we
+ generate a soft-ACK to tell the sender that it doesn't need to resend.
+
+ (*) idle_ack_delay
+
+ The amount of time in milliseconds after all the packets currently in the
+ received queue have been consumed before we generate a hard-ACK to tell
+ the sender it can free its buffers, assuming no other reason occurs that
+ we would send an ACK.
+
+ (*) resend_timeout
+
+ The amount of time in milliseconds after transmitting a packet before we
+ transmit it again, assuming no ACK is received from the receiver telling
+ us they got it.
+
+ (*) max_call_lifetime
+
+ The maximum amount of time in seconds that a call may be in progress
+ before we preemptively kill it.
+
+ (*) dead_call_expiry
+
+ The amount of time in seconds before we remove a dead call from the call
+ list. Dead calls are kept around for a little while for the purpose of
+ repeating ACK and ABORT packets.
+
+ (*) connection_expiry
+
+ The amount of time in seconds after a connection was last used before we
+ remove it from the connection list. Whilst a connection is in existence,
+ it serves as a placeholder for negotiated security; when it is deleted,
+ the security must be renegotiated.
+
+ (*) transport_expiry
+
+ The amount of time in seconds after a transport was last used before we
+ remove it from the transport list. Whilst a transport is in existence, it
+ serves to anchor the peer data and keeps the connection ID counter.
+
+ (*) rxrpc_rx_window_size
+
+ The size of the receive window in packets. This is the maximum number of
+ unconsumed received packets we're willing to hold in memory for any
+ particular call.
+
+ (*) rxrpc_rx_mtu
+
+ The maximum packet MTU size that we're willing to receive in bytes. This
+ indicates to the peer whether we're willing to accept jumbo packets.
+
+ (*) rxrpc_rx_jumbo_max
+
+ The maximum number of packets that we're willing to accept in a jumbo
+ packet. Non-terminal packets in a jumbo packet must contain a four byte
+ header plus exactly 1412 bytes of data. The terminal packet must contain
+ a four byte header plus any amount of data. In any event, a jumbo packet
+ may not exceed rxrpc_rx_mtu in size.
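The RxRPC parameters above are ordinary sysctl files, so they can be inspected or changed from user space with plain file I/O. What follows is a minimal sketch in C, assuming each entry holds a single unsigned decimal number; the helper names read_sysctl_ulong() and write_sysctl_ulong() are hypothetical. Sysctl entries normally live under /proc/sys/, so a path such as /proc/sys/net/rxrpc/req_ack_delay is an assumption to check against the running kernel.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: read one unsigned decimal value (for example an
 * RxRPC delay in milliseconds) from a sysctl-style text file.
 * Returns 0 on success and -1 if the file cannot be read or parsed.
 */
static int read_sysctl_ulong(const char *path, unsigned long *out)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* strtoul() would silently accept a minus sign, so reject it here. */
    if (strchr(buf, '-') != NULL)
        return -1;

    char *end;
    errno = 0;
    unsigned long value = strtoul(buf, &end, 10);
    if (errno != 0 || end == buf || (*end != '\n' && *end != '\0'))
        return -1;

    *out = value;
    return 0;
}

/*
 * Hypothetical helper: write one unsigned decimal value. Writing to a real
 * entry under /proc/sys normally requires root privileges.
 */
static int write_sysctl_ulong(const char *path, unsigned long value)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%lu\n", value) > 0;
    /* With stdio buffering, a value the kernel rejects is only reported
     * when the data is flushed, so check fclose() as well. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, read_sysctl_ulong("/proc/sys/net/rxrpc/req_ack_delay", &ms) would fetch the ACK delay in milliseconds if the entry exists at that (assumed) path.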
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index ca6977f5b2ed..99ca40e8e810 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -429,7 +429,7 @@ RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into
 (therbert@google.com)
 
 Accelerated RFS was introduced in 2.6.35. Original patches were
-submitted by Ben Hutchings (bhutchings@solarflare.com)
+submitted by Ben Hutchings (bwh@kernel.org)
 
 Authors:
 Tom Herbert (therbert@google.com)
diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt
index 7d11bb5dc30a..bdc4c0db51e1 100644
--- a/Documentation/networking/tcp.txt
+++ b/Documentation/networking/tcp.txt
@@ -30,7 +30,7 @@ A congestion control mechanism can be registered through functions in
 tcp_cong.c. The functions used by the congestion control mechanism are
 registered via passing a tcp_congestion_ops struct to
 tcp_register_congestion_control. As a minimum name, ssthresh,
-cong_avoid, min_cwnd must be valid.
+cong_avoid must be valid.
 
 Private data for a congestion control mechanism is stored in tp->ca_priv.
 tcp_ca(tp) returns a pointer to this space. This is preallocated space - it
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 048c92b487f6..bc3554124903 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -202,6 +202,9 @@ Time stamps for outgoing packets are to be generated as follows:
  and not free the skb. A driver not supporting hardware time stamping doesn't
  do that. A driver must never touch sk_buff::tstamp! It is used to store
  software generated time stamps by the network subsystem.
+- Driver should call skb_tx_timestamp() as close to passing sk_buff to hardware
+ as possible. skb_tx_timestamp() provides a software time stamp if requested
+ and hardware timestamping is not possible (SKBTX_IN_PROGRESS not set).
 - As soon as the driver has sent the packet and/or obtained a
  hardware time stamp for it, it passes the time stamp back by
  calling skb_hwtstamp_tx() with the original skb, the raw
@@ -212,6 +215,3 @@ Time stamps for outgoing packets are to be generated as follows:
  this would occur at a later time in the processing pipeline than other
  software time stamping and therefore could lead to unexpected deltas
  between time stamps.
-- If the driver did not set the SKBTX_IN_PROGRESS flag (see above), then
- dev_hard_start_xmit() checks whether software time stamping
- is wanted as fallback and potentially generates the time stamp.
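The new skb_tx_timestamp() paragraph says a software time stamp is provided "if requested". On the user-space side, that request is made per socket with the SO_TIMESTAMPING option. This is a minimal sketch, assuming a Linux system with the <linux/net_tstamp.h> UAPI header; the helper name open_udp_with_sw_tx_timestamps() is hypothetical. It only enables the request. The time stamps themselves are delivered later on the socket's error queue and read with recvmsg() and MSG_ERRQUEUE, which is not shown here.

```c
#include <linux/net_tstamp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a UDP socket and request software transmit time
 * stamps on it. Returns the socket descriptor, or -1 on error.
 */
static int open_udp_with_sw_tx_timestamps(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /*
     * SOF_TIMESTAMPING_TX_SOFTWARE asks for a stamp to be taken in the
     * transmit path; SOF_TIMESTAMPING_SOFTWARE asks for software stamps
     * to be reported back to user space.
     */
    int flags = SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE;
    if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
                   &flags, sizeof(flags)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Hardware transmit stamps would additionally need SOF_TIMESTAMPING_TX_HARDWARE and a driver that supports them.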