kernel/linux.git/include/linux/tcp.h, branch linux-2.6.28.y

tcp: kill pointless urg_mode

2008-10-07T21:43:06+00:00

It all started from me noticing that this urgent check in tcp_clean_rtx_queue is unnecessarily inside the loop. Then I took a longer look to it and found out that the users of urg_mode can trivially do without, well almost, there was one gotcha. Bonus: those funny people who use urg with >= 2^31 write_seq - snd_una could now rejoice too (that's the only purpose for the between being there, otherwise a simple compare would have done the thing). Not that I assume that the rest of the tcp code happily lives with such mind-boggling numbers :-). Alas, it turned out to be impossible to set wmem to such numbers anyway, yes I really tried a big sendfile after setting some wmem but nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf... So I hacked a bit variable to long and found out that it seems to work... :-) Signed-off-by: Ilpo Järvinen Signed-off-by: David S. Miller

tcp: reorganize retransmit code loops

2008-09-21T04:24:21+00:00

Both loops are quite similar, so they can be combined with little effort. As a result, forward_skb_hint becomes obsolete as well. Signed-off-by: Ilpo Järvinen Signed-off-by: David S. Miller

tcp: convert retransmit_cnt_hint to seqno

2008-09-21T04:20:20+00:00

Main benefit in this is that we can then freely point the retransmit_skb_hint to anywhere we want to because there's no longer need to know what would be the count changes involve, and since this is really used only as a terminator, unnecessary work is one time walk at most, and if some retransmissions are necessary after that point later on, the walk is not full waste of time anyway. Since retransmit_high must be kept valid, all lost markers must ensure that. Now I also have learned how those "holes" in the rexmittable skbs can appear, mtu probe does them. So I removed the misleading comment as well. Signed-off-by: Ilpo Järvinen Signed-off-by: David S. Miller

tcp: Remove redundant checks when setting eff_sacks

2008-07-19T07:07:02+00:00

Remove redundant checks when setting eff_sacks and make the number of SACKs a compile time constant. Now that the options code knows how many SACK blocks can fit in the header, we don't need to have the SACK code guessing at it. Signed-off-by: Adam Langley Signed-off-by: David S. Miller

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

2008-06-14T03:52:39+00:00

Conflicts: drivers/net/smc911x.c

tcp: Revert 'process defer accept as established' changes.

2008-06-12T23:34:35+00:00

This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38 ("tcp: Fix slab corruption with ipv6 and tcp6fuzz"). This change causes several problems, first reported by Ingo Molnar as a distcc-over-loopback regression where connections were getting stuck. Ilpo Järvinen first spotted the locking problems. The new function added by this code, tcp_defer_accept_check(), only has the child socket locked, yet it is modifying state of the parent listening socket. Fixing that is non-trivial at best, because we can't simply just grab the parent listening socket lock at this point, because it would create an ABBA deadlock. The normal ordering is parent listening socket --> child socket, but this code path would require the reverse lock ordering. Next is a problem noticed by Vitaliy Gusev, he noted: ---------------------------------------- >--- a/net/ipv4/tcp_timer.c >+++ b/net/ipv4/tcp_timer.c >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data) > goto death; > } > >+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { >+ tcp_send_active_reset(sk, GFP_ATOMIC); >+ goto death; Here socket sk is not attached to listening socket's request queue. tcp_done() will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should release this sk) as socket is not DEAD. Therefore socket sk will be lost for freeing. ---------------------------------------- Finally, Alexey Kuznetsov argues that there might not even be any real value or advantage to these new semantics even if we fix all of the bugs: ---------------------------------------- Hiding from accept() sockets with only out-of-order data only is the only thing which is impossible with old approach. Is this really so valuable? My opinion: no, this is nothing but a new loophole to consume memory without control. ---------------------------------------- So revert this thing for now. Signed-off-by: David S. Miller

tcp: Reorganize tcp_sock to fill 64-bit holes & improve locality

2008-05-29T10:25:23+00:00

I tried to group recovery related fields nearby (non-CA_Open related variables, to be more accurate) so that one to three cachelines would not be necessary in CA_Open. These are now contiguously deployed: struct sk_buff_head out_of_order_queue; /* 1968 80 */ /* --- cacheline 32 boundary (2048 bytes) --- */ struct tcp_sack_block duplicate_sack[1]; /* 2048 8 */ struct tcp_sack_block selective_acks[4]; /* 2056 32 */ struct tcp_sack_block recv_sack_cache[4]; /* 2088 32 */ /* --- cacheline 33 boundary (2112 bytes) was 8 bytes ago --- */ struct sk_buff * highest_sack; /* 2120 8 */ int lost_cnt_hint; /* 2128 4 */ int retransmit_cnt_hint; /* 2132 4 */ u32 lost_retrans_low; /* 2136 4 */ u8 reordering; /* 2140 1 */ u8 keepalive_probes; /* 2141 1 */ /* XXX 2 bytes hole, try to pack */ u32 prior_ssthresh; /* 2144 4 */ u32 high_seq; /* 2148 4 */ u32 retrans_stamp; /* 2152 4 */ u32 undo_marker; /* 2156 4 */ int undo_retrans; /* 2160 4 */ u32 total_retrans; /* 2164 4 */ ...and they're then followed by URG slowpath & keepalive related variables. Head of the out_of_order_queue always needed for empty checks, if that's empty (and TCP is in CA_Open), following ~200 bytes (in 64-bit) shouldn't be necessary for anything. If only OFO queue exists but TCP is in CA_Open, selective_acks (and possibly duplicate_sack) are necessary besides the out_of_order_queue but the rest of the block again shouldn't be (ie., the other direction had losses). As the cacheline boundaries depend on many factors in the preceeding stuff, trying to align considering them doesn't make too much sense. Commented one ordering hazard. There are number of low utilized u8/16s that could be combined get 2 bytes less in total so that the hole could be made to vanish (includes at least ecn_flags, urg_data, urg_mode, frto_counter, nonagle). Signed-off-by: Ilpo Järvinen Acked-by: Eric Dumazet Signed-off-by: David S. Miller

tcp: Make prior_ssthresh a u32

2008-05-22T00:40:05+00:00

If previous window was above representable values of u16, strange things will happen if undo with the truncated value is called for. Alternatively, this could be fixed by some max trickery but that would limit undoing high-speed undos. Adds 16-bit hole but there isn't anything to fill it with. Signed-off-by: Ilpo Järvinen Signed-off-by: David S. Miller

[TCP]: TCP_DEFER_ACCEPT updates - process as established

2008-03-21T23:33:01+00:00

Change TCP_DEFER_ACCEPT implementation so that it transitions a connection to ESTABLISHED after handshake is complete instead of leaving it in SYN-RECV until some data arrvies. Place connection in accept queue when first data packet arrives from slow path. Benefits: - established connection is now reset if it never makes it to the accept queue - diagnostic state of established matches with the packet traces showing completed handshake - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be enforced with reasonable accuracy instead of rounding up to next exponential back-off of syn-ack retry. Signed-off-by: Patrick McManus Signed-off-by: David S. Miller

[TCP]: Rewrite SACK block processing & sack_recv_cache use

2008-01-28T22:54:07+00:00

Key points of this patch are: - In case new SACK information is advance only type, no skb processing below previously discovered highest point is done - Optimize cases below highest point too since there's no need to always go up to highest point (which is very likely still present in that SACK), this is not entirely true though because I'm dropping the fastpath_skb_hint which could previously optimize those cases even better. Whether that's significant, I'm not too sure. Currently it will provide skipping by walking. Combined with RB-tree, all skipping would become fast too regardless of window size (can be done incrementally later). Previously a number of cases in TCP SACK processing fails to take advantage of costly stored information in sack_recv_cache, most importantly, expected events such as cumulative ACK and new hole ACKs. Processing on such ACKs result in rather long walks building up latencies (which easily gets nasty when window is huge). Those latencies are often completely unnecessary compared with the amount of _new_ information received, usually for cumulative ACK there's no new information at all, yet TCP walks whole queue unnecessary potentially taking a number of costly cache misses on the way, etc.! Since the inclusion of highest_sack, there's a lot information that is very likely redundant (SACK fastpath hint stuff, fackets_out, highest_sack), though there's no ultimate guarantee that they'll remain the same whole the time (in all unearthly scenarios). Take advantage of this knowledge here and drop fastpath hint and use direct access to highest SACKed skb as a replacement. Effectively "special cased" fastpath is dropped. This change adds some complexity to introduce better coveraged "fastpath", though the added complexity should make TCP behave more cache friendly. The current ACK's SACK blocks are compared against each cached block individially and only ranges that are new are then scanned by the high constant walk. For other parts of write queue, even when in previously known part of the SACK blocks, a faster skip function is used (if necessary at all). In addition, whenever possible, TCP fast-forwards to highest_sack skb that was made available by an earlier patch. In typical case, no other things but this fast-forward and mandatory markings after that occur making the access pattern quite similar to the former fastpath "special case". DSACKs are special case that must always be walked. The local to recv_sack_cache copying could be more intelligent w.r.t DSACKs which are likely to be there only once but that is left to a separate patch. Signed-off-by: Ilpo Järvinen Signed-off-by: David S. Miller