summaryrefslogtreecommitdiff
path: root/Documentation/networking/packet_mmap.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking/packet_mmap.txt')
-rw-r--r--Documentation/networking/packet_mmap.txt49
1 files changed, 47 insertions, 2 deletions
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 8e48e3b14227..1404674c0a02 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -98,6 +98,11 @@ by the kernel.
The destruction of the socket and all associated resources
is done by a simple call to close(fd).
+Similarly as without PACKET_MMAP, it is possible to use one socket
+for capture and transmission. This can be done by mapping the
+allocated RX and TX buffer ring with a single mmap() call.
+See "Mapping and use of the circular buffer (ring)".
+
Next I will describe PACKET_MMAP settings and its constraints,
also the mapping of the circular buffer in the user process and
the use of this buffer.
@@ -414,6 +419,19 @@ tp_block_size/tp_frame_size frames there will be a gap between
the frames. This is because a frame cannot be spawn across two
blocks.
+To use one socket for capture and transmission, the mapping of both the
+RX and TX buffer ring has to be done with one call to mmap:
+
+ ...
+ setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &foo, sizeof(foo));
+ setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &bar, sizeof(bar));
+ ...
+ rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+ tx_ring = rx_ring + size;
+
+RX must be the first as the kernel maps the TX ring memory right
+after the RX one.
+
At the beginning of each frame there is an status field (see
struct tpacket_hdr). If this field is 0 means that the frame is ready
to be used for the kernel, If not, there is a frame the user can read
@@ -517,8 +535,6 @@ where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
TPACKET_V1:
- Default if not otherwise specified by setsockopt(2)
- RX_RING, TX_RING available
- - VLAN metadata information available for packets
- (TP_STATUS_VLAN_VALID)
TPACKET_V1 --> TPACKET_V2:
- Made 64 bit clean due to unsigned long usage in TPACKET_V1
@@ -526,6 +542,13 @@ TPACKET_V1 --> TPACKET_V2:
userspace and the like
- Timestamp resolution in nanoseconds instead of microseconds
- RX_RING, TX_RING available
+ - VLAN metadata information available for packets
+ (TP_STATUS_VLAN_VALID, TP_STATUS_VLAN_TPID_VALID),
+ in the tpacket2_hdr structure:
+ - TP_STATUS_VLAN_VALID bit being set into the tp_status field indicates
+ that the tp_vlan_tci field has valid VLAN TCI value
+ - TP_STATUS_VLAN_TPID_VALID bit being set into the tp_status field
+ indicates that the tp_vlan_tpid field has valid VLAN TPID value
- How to switch to TPACKET_V2:
1. Replace struct tpacket_hdr by struct tpacket2_hdr
2. Query header len and save
@@ -560,6 +583,7 @@ Currently implemented fanout policies are:
- PACKET_FANOUT_CPU: schedule to socket by CPU packet arrives on
- PACKET_FANOUT_RND: schedule to socket by random selection
- PACKET_FANOUT_ROLLOVER: if one socket is full, rollover to another
+ - PACKET_FANOUT_QM: schedule to socket by skbs recorded queue_mapping
Minimal example code by David S. Miller (try things like "./test eth0 hash",
"./test eth0 lb", etc.):
@@ -953,6 +977,27 @@ int main(int argc, char **argp)
}
-------------------------------------------------------------------------------
++ PACKET_QDISC_BYPASS
+-------------------------------------------------------------------------------
+
+If there is a requirement to load the network with many packets in a similar
+fashion as pktgen does, you might set the following option after socket
+creation:
+
+ int one = 1;
+ setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));
+
+This has the side-effect, that packets sent through PF_PACKET will bypass the
+kernel's qdisc layer and are forcedly pushed to the driver directly. Meaning,
+packet are not buffered, tc disciplines are ignored, increased loss can occur
+and such packets are also not visible to other PF_PACKET sockets anymore. So,
+you have been warned; generally, this can be useful for stress testing various
+components of a system.
+
+On default, PACKET_QDISC_BYPASS is disabled and needs to be explicitly enabled
+on PF_PACKET sockets.
+
+-------------------------------------------------------------------------------
+ PACKET_TIMESTAMP
-------------------------------------------------------------------------------