diff options
Diffstat (limited to 'Documentation/networking/packet_mmap.txt')
-rw-r--r-- | Documentation/networking/packet_mmap.txt | 49 |
1 files changed, 47 insertions, 2 deletions
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index 8e48e3b14227..1404674c0a02 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -98,6 +98,11 @@ by the kernel. The destruction of the socket and all associated resources is done by a simple call to close(fd). +Similarly as without PACKET_MMAP, it is possible to use one socket +for capture and transmission. This can be done by mapping the +allocated RX and TX buffer ring with a single mmap() call. +See "Mapping and use of the circular buffer (ring)". + Next I will describe PACKET_MMAP settings and its constraints, also the mapping of the circular buffer in the user process and the use of this buffer. @@ -414,6 +419,19 @@ tp_block_size/tp_frame_size frames there will be a gap between the frames. This is because a frame cannot be spawn across two blocks. +To use one socket for capture and transmission, the mapping of both the +RX and TX buffer ring has to be done with one call to mmap: + + ... + setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &foo, sizeof(foo)); + setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &bar, sizeof(bar)); + ... + rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); + tx_ring = rx_ring + size; + +RX must be the first as the kernel maps the TX ring memory right +after the RX one. + At the beginning of each frame there is an status field (see struct tpacket_hdr). If this field is 0 means that the frame is ready to be used for the kernel, If not, there is a frame the user can read @@ -517,8 +535,6 @@ where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3. TPACKET_V1: - Default if not otherwise specified by setsockopt(2) - RX_RING, TX_RING available - - VLAN metadata information available for packets - (TP_STATUS_VLAN_VALID) TPACKET_V1 --> TPACKET_V2: - Made 64 bit clean due to unsigned long usage in TPACKET_V1 @@ -526,6 +542,13 @@ TPACKET_V1 --> TPACKET_V2: userspace and the like - Timestamp resolution in nanoseconds instead of microseconds - RX_RING, TX_RING available + - VLAN metadata information available for packets + (TP_STATUS_VLAN_VALID, TP_STATUS_VLAN_TPID_VALID), + in the tpacket2_hdr structure: + - TP_STATUS_VLAN_VALID bit being set into the tp_status field indicates + that the tp_vlan_tci field has valid VLAN TCI value + - TP_STATUS_VLAN_TPID_VALID bit being set into the tp_status field + indicates that the tp_vlan_tpid field has valid VLAN TPID value - How to switch to TPACKET_V2: 1. Replace struct tpacket_hdr by struct tpacket2_hdr 2. Query header len and save @@ -560,6 +583,7 @@ Currently implemented fanout policies are: - PACKET_FANOUT_CPU: schedule to socket by CPU packet arrives on - PACKET_FANOUT_RND: schedule to socket by random selection - PACKET_FANOUT_ROLLOVER: if one socket is full, rollover to another + - PACKET_FANOUT_QM: schedule to socket by skbs recorded queue_mapping Minimal example code by David S. Miller (try things like "./test eth0 hash", "./test eth0 lb", etc.): @@ -953,6 +977,27 @@ int main(int argc, char **argp) } ------------------------------------------------------------------------------- ++ PACKET_QDISC_BYPASS +------------------------------------------------------------------------------- + +If there is a requirement to load the network with many packets in a similar +fashion as pktgen does, you might set the following option after socket +creation: + + int one = 1; + setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one)); + +This has the side-effect, that packets sent through PF_PACKET will bypass the +kernel's qdisc layer and are forcedly pushed to the driver directly. Meaning, +packet are not buffered, tc disciplines are ignored, increased loss can occur +and such packets are also not visible to other PF_PACKET sockets anymore. So, +you have been warned; generally, this can be useful for stress testing various +components of a system. + +On default, PACKET_QDISC_BYPASS is disabled and needs to be explicitly enabled +on PF_PACKET sockets. + +------------------------------------------------------------------------------- + PACKET_TIMESTAMP ------------------------------------------------------------------------------- |