From 375fe02c91792917aa26d68a87ab110d1937f44e Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Mon, 22 Jul 2013 16:20:45 -0700 Subject: tcp: consolidate SYNACK RTT sampling The first patch consolidates SYNACK and other RTT measurement to use a central function tcp_ack_update_rtt(). A (small) bonus is now SYNACK RTT measurement happens after PAWS check, potentially reducing the impact of RTO seeding on bad TCP timestamps values. Signed-off-by: Yuchung Cheng Signed-off-by: David S. Miller --- net/ipv6/tcp_ipv6.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'net/ipv6/tcp_ipv6.c') diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 6e1649d58533..80fe69ef2188 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1237,8 +1237,6 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, newtp->advmss = tcp_sk(sk)->rx_opt.user_mss; tcp_initialize_rcv_mss(newsk); - tcp_synack_rtt_meas(newsk, req); - newtp->total_retrans = req->num_retrans; newinet->inet_daddr = newinet->inet_saddr = LOOPBACK4_IPV6; newinet->inet_rcv_saddr = LOOPBACK4_IPV6; -- cgit v1.2.3 From c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 22 Jul 2013 20:27:07 -0700 Subject: tcp: TCP_NOTSENT_LOWAT socket option Idea of this patch is to add optional limitation of number of unsent bytes in TCP sockets, to reduce usage of kernel memory. TCP receiver might announce a big window, and TCP sender autotuning might allow a large amount of bytes in write queue, but this has little performance impact if a large part of this buffering is wasted : Write queue needs to be large only to deal with large BDP, not necessarily to cope with scheduling delays (incoming ACKS make room for the application to queue more bytes) For most workloads, using a value of 128 KB or less is OK to give applications enough time to react to POLLOUT events in time (or being awaken in a blocking sendmsg()) This patch adds two ways to set the limit : 1) Per socket option TCP_NOTSENT_LOWAT 2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets not using TCP_NOTSENT_LOWAT socket option (or setting a zero value) Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect. This changes poll()/select()/epoll() to report POLLOUT only if number of unsent bytes is below tp->nosent_lowat Note this might increase number of sendmsg()/sendfile() calls when using non blocking sockets, and increase number of context switches for blocking sockets. Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is defined as : Specify the minimum number of bytes in the buffer until the socket layer will pass the data to the protocol) Tested: netperf sessions, and watching /proc/net/protocols "memory" column for TCP With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458) lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols TCPv6 1880 2 45458 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y TCP 1696 508 45458 no 208 yes kernel y y y y y y y y y y y y y n y y y y y lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols TCPv6 1880 2 20567 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y TCP 1696 508 20567 no 208 yes kernel y y y y y y y y y y y y y n y y y y y Using 128KB has no bad effect on the throughput or cpu usage of a single flow, although there is an increase of context switches. A bonus is that we hold socket lock for a shorter amount of time and should improve latencies of ACK processing. lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf. Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand Size Size Size (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1651584 6291456 16384 20.00 17447.90 10^6bits/s 3.13 S -1.00 U 0.353 -1.000 usec/KB Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3': 412,514 context-switches 200.034645535 seconds time elapsed lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf. Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand Size Size Size (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1593240 6291456 16384 20.00 17321.16 10^6bits/s 3.35 S -1.00 U 0.381 -1.000 usec/KB Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3': 2,675,818 context-switches 200.029651391 seconds time elapsed Signed-off-by: Eric Dumazet Cc: Neal Cardwell Cc: Yuchung Cheng Acked-By: Yuchung Cheng Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 13 +++++++++++++ include/linux/tcp.h | 1 + include/net/sock.h | 19 +++++++++++++------ include/net/tcp.h | 14 ++++++++++++++ include/uapi/linux/tcp.h | 1 + net/ipv4/sysctl_net_ipv4.c | 7 +++++++ net/ipv4/tcp.c | 7 +++++++ net/ipv4/tcp_ipv4.c | 1 + net/ipv4/tcp_output.c | 3 +++ net/ipv6/tcp_ipv6.c | 1 + 10 files changed, 61 insertions(+), 6 deletions(-) (limited to 'net/ipv6/tcp_ipv6.c') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 10742902146f..53cea9bcb14c 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -516,6 +516,19 @@ tcp_wmem - vector of 3 INTEGERs: min, default, max this value is ignored. Default: between 64K and 4MB, depending on RAM size. +tcp_notsent_lowat - UNSIGNED INTEGER + A TCP socket can control the amount of unsent bytes in its write queue, + thanks to TCP_NOTSENT_LOWAT socket option. poll()/select()/epoll() + reports POLLOUT events if the amount of unsent bytes is below a per + socket value, and if the write queue is not full. sendmsg() will + also not add new buffers if the limit is hit. + + This global variable controls the amount of unsent data for + sockets not using TCP_NOTSENT_LOWAT. For these sockets, a change + to the global variable has immediate effect. + + Default: UINT_MAX (0xFFFFFFFF) + tcp_workaround_signed_windows - BOOLEAN If set, assume no receipt of a window scaling option means the remote TCP is broken and treats the window as a signed quantity. diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 472120b4fac5..9640803a17a7 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -238,6 +238,7 @@ struct tcp_sock { u32 rcv_wnd; /* Current receiver window */ u32 write_seq; /* Tail(+1) of data held in tcp send buffer */ + u32 notsent_lowat; /* TCP_NOTSENT_LOWAT */ u32 pushed_seq; /* Last pushed seq, required to talk to windows */ u32 lost_out; /* Lost packets */ u32 sacked_out; /* SACK'd packets */ diff --git a/include/net/sock.h b/include/net/sock.h index d0b5fdee50a2..b9f2b095b1ab 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -746,11 +746,6 @@ static inline int sk_stream_wspace(const struct sock *sk) extern void sk_stream_write_space(struct sock *sk); -static inline bool sk_stream_memory_free(const struct sock *sk) -{ - return sk->sk_wmem_queued < sk->sk_sndbuf; -} - /* OOB backlog add */ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb) { @@ -950,6 +945,7 @@ struct proto { unsigned int inuse_idx; #endif + bool (*stream_memory_free)(const struct sock *sk); /* Memory pressure */ void (*enter_memory_pressure)(struct sock *sk); atomic_long_t *memory_allocated; /* Current allocated memory. */ @@ -1088,11 +1084,22 @@ static inline struct cg_proto *parent_cg_proto(struct proto *proto, } #endif +static inline bool sk_stream_memory_free(const struct sock *sk) +{ + if (sk->sk_wmem_queued >= sk->sk_sndbuf) + return false; + + return sk->sk_prot->stream_memory_free ? + sk->sk_prot->stream_memory_free(sk) : true; +} + static inline bool sk_stream_is_writeable(const struct sock *sk) { - return sk_stream_wspace(sk) >= sk_stream_min_wspace(sk); + return sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) && + sk_stream_memory_free(sk); } + static inline bool sk_has_memory_pressure(const struct sock *sk) { return sk->sk_prot->memory_pressure != NULL; diff --git a/include/net/tcp.h b/include/net/tcp.h index c5868471abae..18fc999dae3c 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -284,6 +284,7 @@ extern int sysctl_tcp_thin_dupack; extern int sysctl_tcp_early_retrans; extern int sysctl_tcp_limit_output_bytes; extern int sysctl_tcp_challenge_ack_limit; +extern unsigned int sysctl_tcp_notsent_lowat; extern atomic_long_t tcp_memory_allocated; extern struct percpu_counter tcp_sockets_allocated; @@ -1539,6 +1540,19 @@ extern int tcp_gro_complete(struct sk_buff *skb); extern void __tcp_v4_send_check(struct sk_buff *skb, __be32 saddr, __be32 daddr); +static inline u32 tcp_notsent_lowat(const struct tcp_sock *tp) +{ + return tp->notsent_lowat ?: sysctl_tcp_notsent_lowat; +} + +static inline bool tcp_stream_memory_free(const struct sock *sk) +{ + const struct tcp_sock *tp = tcp_sk(sk); + u32 notsent_bytes = tp->write_seq - tp->snd_nxt; + + return notsent_bytes < tcp_notsent_lowat(tp); +} + #ifdef CONFIG_PROC_FS extern int tcp4_proc_init(void); extern void tcp4_proc_exit(void); diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 8d776ebc4829..377f1e59411d 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -111,6 +111,7 @@ enum { #define TCP_REPAIR_OPTIONS 22 #define TCP_FASTOPEN 23 /* Enable FastOpen on listeners */ #define TCP_TIMESTAMP 24 +#define TCP_NOTSENT_LOWAT 25 /* limit number of unsent bytes in write queue */ struct tcp_repair_opt { __u32 opt_code; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index b2c123c44d69..69ed203802da 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -554,6 +554,13 @@ static struct ctl_table ipv4_table[] = { .proc_handler = proc_dointvec_minmax, .extra1 = &one, }, + { + .procname = "tcp_notsent_lowat", + .data = &sysctl_tcp_notsent_lowat, + .maxlen = sizeof(sysctl_tcp_notsent_lowat), + .mode = 0644, + .proc_handler = proc_dointvec, + }, { .procname = "tcp_rmem", .data = &sysctl_tcp_rmem, diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5eca9060bb8e..c27e81392398 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2631,6 +2631,10 @@ static int do_tcp_setsockopt(struct sock *sk, int level, else tp->tsoffset = val - tcp_time_stamp; break; + case TCP_NOTSENT_LOWAT: + tp->notsent_lowat = val; + sk->sk_write_space(sk); + break; default: err = -ENOPROTOOPT; break; @@ -2847,6 +2851,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level, case TCP_TIMESTAMP: val = tcp_time_stamp + tp->tsoffset; break; + case TCP_NOTSENT_LOWAT: + val = tp->notsent_lowat; + break; default: return -ENOPROTOOPT; } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 2e3f129df0eb..2a5d5c469d17 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2800,6 +2800,7 @@ struct proto tcp_prot = { .unhash = inet_unhash, .get_port = inet_csk_get_port, .enter_memory_pressure = tcp_enter_memory_pressure, + .stream_memory_free = tcp_stream_memory_free, .sockets_allocated = &tcp_sockets_allocated, .orphan_count = &tcp_orphan_count, .memory_allocated = &tcp_memory_allocated, diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 92fde8d1aa82..884efff5b531 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -65,6 +65,9 @@ int sysctl_tcp_base_mss __read_mostly = TCP_BASE_MSS; /* By default, RFC2861 behavior. */ int sysctl_tcp_slow_start_after_idle __read_mostly = 1; +unsigned int sysctl_tcp_notsent_lowat __read_mostly = UINT_MAX; +EXPORT_SYMBOL(sysctl_tcp_notsent_lowat); + static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, int push_one, gfp_t gfp); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 80fe69ef2188..b792e870686b 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1924,6 +1924,7 @@ struct proto tcpv6_prot = { .unhash = inet_unhash, .get_port = inet_csk_get_port, .enter_memory_pressure = tcp_enter_memory_pressure, + .stream_memory_free = tcp_stream_memory_free, .sockets_allocated = &tcp_sockets_allocated, .memory_allocated = &tcp_memory_allocated, .memory_pressure = &tcp_memory_pressure, -- cgit v1.2.3 From 5ad37d5deee1ff7150a2d0602370101de158ad86 Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Fri, 26 Jul 2013 17:43:23 +0200 Subject: tcp: add tcp_syncookies mode to allow unconditionally generation of syncookies | If you want to test which effects syncookies have to your | network connections you can set this knob to 2 to enable | unconditionally generation of syncookies. Original idea and first implementation by Eric Dumazet. Cc: Florian Westphal Cc: David Miller Signed-off-by: Eric Dumazet Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 4 ++++ net/ipv4/tcp_ipv4.c | 5 +++-- net/ipv6/tcp_ipv6.c | 3 ++- 3 files changed, 9 insertions(+), 3 deletions(-) (limited to 'net/ipv6/tcp_ipv6.c') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 53cea9bcb14c..36be26b2ef7a 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -440,6 +440,10 @@ tcp_syncookies - BOOLEAN SYN flood warnings in logs not being really flooded, your server is seriously misconfigured. + If you want to test which effects syncookies have to your + network connections you can set this knob to 2 to enable + unconditionally generation of syncookies. + tcp_fastopen - INTEGER Enable TCP Fast Open feature (draft-ietf-tcpm-fastopen) to send data in the opening SYN packet. To use this feature, the client application diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 2a5d5c469d17..280efe5f19c1 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -890,7 +890,7 @@ bool tcp_syn_flood_action(struct sock *sk, NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPREQQFULLDROP); lopt = inet_csk(sk)->icsk_accept_queue.listen_opt; - if (!lopt->synflood_warned) { + if (!lopt->synflood_warned && sysctl_tcp_syncookies != 2) { lopt->synflood_warned = 1; pr_info("%s: Possible SYN flooding on port %d. %s. Check SNMP counters.\n", proto, ntohs(tcp_hdr(skb)->dest), msg); @@ -1462,7 +1462,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) * limitations, they conserve resources and peer is * evidently real one. */ - if (inet_csk_reqsk_queue_is_full(sk) && !isn) { + if ((sysctl_tcp_syncookies == 2 || + inet_csk_reqsk_queue_is_full(sk)) && !isn) { want_cookie = tcp_syn_flood_action(sk, skb, "TCP"); if (!want_cookie) goto drop; diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index b792e870686b..38c196ca6011 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -963,7 +963,8 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb) if (!ipv6_unicast_destination(skb)) goto drop; - if (inet_csk_reqsk_queue_is_full(sk) && !isn) { + if ((sysctl_tcp_syncookies == 2 || + inet_csk_reqsk_queue_is_full(sk)) && !isn) { want_cookie = tcp_syn_flood_action(sk, skb, "TCPv6"); if (!want_cookie) goto drop; -- cgit v1.2.3 From d14c5ab6bef6a46170b84c3589b27768e979f93d Mon Sep 17 00:00:00 2001 From: Francesco Fusco Date: Thu, 15 Aug 2013 13:42:14 +0200 Subject: net: proc_fs: trivial: print UIDs as unsigned int UIDs are printed in the proc_fs as signed int, whereas they are unsigned int. Signed-off-by: Francesco Fusco Signed-off-by: David S. Miller --- net/appletalk/atalk_proc.c | 2 +- net/ipv4/ping.c | 2 +- net/ipv4/raw.c | 2 +- net/ipv4/tcp_ipv4.c | 4 ++-- net/ipv4/udp.c | 2 +- net/ipv6/datagram.c | 2 +- net/ipv6/tcp_ipv6.c | 4 ++-- net/ipx/ipx_proc.c | 2 +- net/llc/llc_proc.c | 2 +- net/phonet/socket.c | 2 +- net/sctp/proc.c | 4 ++-- 11 files changed, 14 insertions(+), 14 deletions(-) (limited to 'net/ipv6/tcp_ipv6.c') diff --git a/net/appletalk/atalk_proc.c b/net/appletalk/atalk_proc.c index c30f3a0717fb..af46bc49e1e9 100644 --- a/net/appletalk/atalk_proc.c +++ b/net/appletalk/atalk_proc.c @@ -178,7 +178,7 @@ static int atalk_seq_socket_show(struct seq_file *seq, void *v) at = at_sk(s); seq_printf(seq, "%02X %04X:%02X:%02X %04X:%02X:%02X %08X:%08X " - "%02X %d\n", + "%02X %u\n", s->sk_type, ntohs(at->src_net), at->src_node, at->src_port, ntohs(at->dest_net), at->dest_node, at->dest_port, sk_wmem_alloc_get(s), diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 746427c9e719..d7d9882d4cae 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -1082,7 +1082,7 @@ static void ping_v4_format_sock(struct sock *sp, struct seq_file *f, __u16 srcp = ntohs(inet->inet_sport); seq_printf(f, "%5d: %08X:%04X %08X:%04X" - " %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d%n", + " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d%n", bucket, src, srcp, dest, destp, sp->sk_state, sk_wmem_alloc_get(sp), sk_rmem_alloc_get(sp), diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index dd44e0ab600c..41d84505a922 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -987,7 +987,7 @@ static void raw_sock_seq_show(struct seq_file *seq, struct sock *sp, int i) srcp = inet->inet_num; seq_printf(seq, "%4d: %08X:%04X %08X:%04X" - " %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d\n", + " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d\n", i, src, srcp, dest, destp, sp->sk_state, sk_wmem_alloc_get(sp), sk_rmem_alloc_get(sp), diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index ec2702882d8d..05a3d45d3102 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2608,7 +2608,7 @@ static void get_openreq4(const struct sock *sk, const struct request_sock *req, long delta = req->expires - jiffies; seq_printf(f, "%4d: %08X:%04X %08X:%04X" - " %02X %08X:%08X %02X:%08lX %08X %5d %8d %u %d %pK%n", + " %02X %08X:%08X %02X:%08lX %08X %5u %8d %u %d %pK%n", i, ireq->loc_addr, ntohs(inet_sk(sk)->inet_sport), @@ -2666,7 +2666,7 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i, int *len) rx_queue = max_t(int, tp->rcv_nxt - tp->copied_seq, 0); seq_printf(f, "%4d: %08X:%04X %08X:%04X %02X %08X:%08X %02X:%08lX " - "%08X %5d %8d %lu %d %pK %lu %lu %u %u %d%n", + "%08X %5u %8d %lu %d %pK %lu %lu %u %u %d%n", i, src, srcp, dest, destp, sk->sk_state, tp->write_seq - tp->snd_una, rx_queue, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 9e88af0e8ab0..0b24508bcdc4 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2159,7 +2159,7 @@ static void udp4_format_sock(struct sock *sp, struct seq_file *f, __u16 srcp = ntohs(inet->inet_sport); seq_printf(f, "%5d: %08X:%04X %08X:%04X" - " %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d%n", + " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d%n", bucket, src, srcp, dest, destp, sp->sk_state, sk_wmem_alloc_get(sp), sk_rmem_alloc_get(sp), diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c index 197e6f4a2b74..48b6bd2a9a14 100644 --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -890,7 +890,7 @@ void ip6_dgram_sock_seq_show(struct seq_file *seq, struct sock *sp, src = &np->rcv_saddr; seq_printf(seq, "%5d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X " - "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d\n", + "%02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d\n", bucket, src->s6_addr32[0], src->s6_addr32[1], src->s6_addr32[2], src->s6_addr32[3], srcp, diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 38c196ca6011..5bcfadf09e95 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1731,7 +1731,7 @@ static void get_openreq6(struct seq_file *seq, seq_printf(seq, "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X " - "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK\n", + "%02X %08X:%08X %02X:%08lX %08X %5u %8d %d %d %pK\n", i, src->s6_addr32[0], src->s6_addr32[1], src->s6_addr32[2], src->s6_addr32[3], @@ -1782,7 +1782,7 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i) seq_printf(seq, "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X " - "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %lu %lu %u %u %d\n", + "%02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %lu %lu %u %u %d\n", i, src->s6_addr32[0], src->s6_addr32[1], src->s6_addr32[2], src->s6_addr32[3], srcp, diff --git a/net/ipx/ipx_proc.c b/net/ipx/ipx_proc.c index 65e8833a2510..e15c16a517e7 100644 --- a/net/ipx/ipx_proc.c +++ b/net/ipx/ipx_proc.c @@ -213,7 +213,7 @@ static int ipx_seq_socket_show(struct seq_file *seq, void *v) ntohs(ipxs->dest_addr.sock)); } - seq_printf(seq, "%08X %08X %02X %03d\n", + seq_printf(seq, "%08X %08X %02X %03u\n", sk_wmem_alloc_get(s), sk_rmem_alloc_get(s), s->sk_state, diff --git a/net/llc/llc_proc.c b/net/llc/llc_proc.c index 7b4799cfbf8d..1a3c7e0f5d0d 100644 --- a/net/llc/llc_proc.c +++ b/net/llc/llc_proc.c @@ -147,7 +147,7 @@ static int llc_seq_socket_show(struct seq_file *seq, void *v) } seq_printf(seq, "@%02X ", llc->sap->laddr.lsap); llc_ui_format_mac(seq, llc->daddr.mac); - seq_printf(seq, "@%02X %8d %8d %2d %3d %4d\n", llc->daddr.lsap, + seq_printf(seq, "@%02X %8d %8d %2d %3u %4d\n", llc->daddr.lsap, sk_wmem_alloc_get(sk), sk_rmem_alloc_get(sk) - llc->copied_seq, sk->sk_state, diff --git a/net/phonet/socket.c b/net/phonet/socket.c index 1afd1381cdc7..77e38f733496 100644 --- a/net/phonet/socket.c +++ b/net/phonet/socket.c @@ -793,7 +793,7 @@ static int pn_res_seq_show(struct seq_file *seq, void *v) struct sock **psk = v; struct sock *sk = *psk; - seq_printf(seq, "%02X %5d %lu%n", + seq_printf(seq, "%02X %5u %lu%n", (int) (psk - pnres.sk), from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk)), sock_i_ino(sk), &len); diff --git a/net/sctp/proc.c b/net/sctp/proc.c index 82432bfb742f..0c0642156842 100644 --- a/net/sctp/proc.c +++ b/net/sctp/proc.c @@ -226,7 +226,7 @@ static int sctp_eps_seq_show(struct seq_file *seq, void *v) sk = epb->sk; if (!net_eq(sock_net(sk), seq_file_net(seq))) continue; - seq_printf(seq, "%8pK %8pK %-3d %-3d %-4d %-5d %5d %5lu ", ep, sk, + seq_printf(seq, "%8pK %8pK %-3d %-3d %-4d %-5d %5u %5lu ", ep, sk, sctp_sk(sk)->type, sk->sk_state, hash, epb->bind_addr.port, from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk)), @@ -336,7 +336,7 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v) continue; seq_printf(seq, "%8pK %8pK %-3d %-3d %-2d %-4d " - "%4d %8d %8d %7d %5lu %-5d %5d ", + "%4d %8d %8d %7u %5lu %-5d %5d ", assoc, sk, sctp_sk(sk)->type, sk->sk_state, assoc->state, hash, assoc->assoc_id, -- cgit v1.2.3 From c995ae2259ee36caf48bbfacf40111998dacd4af Mon Sep 17 00:00:00 2001 From: Vijay Subramanian Date: Tue, 3 Sep 2013 12:23:22 -0700 Subject: tcp: Change return value of tcp_rcv_established() tcp_rcv_established() returns only one value namely 0. We change the return value to void (as suggested by David Miller). After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in response to SYNs. We can remove the check and processing on the return value of tcp_rcv_established(). We also fix jtcp_rcv_established() in tcp_probe.c to match that of tcp_rcv_established(). Signed-off-by: Vijay Subramanian Signed-off-by: David S. Miller --- include/net/tcp.h | 4 ++-- net/ipv4/tcp_input.c | 13 ++++++------- net/ipv4/tcp_ipv4.c | 5 +---- net/ipv4/tcp_probe.c | 5 ++--- net/ipv6/tcp_ipv6.c | 3 +-- 5 files changed, 12 insertions(+), 18 deletions(-) (limited to 'net/ipv6/tcp_ipv6.c') diff --git a/include/net/tcp.h b/include/net/tcp.h index 6a6a88db462d..b1aa324c5e65 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -371,8 +371,8 @@ extern void tcp_delack_timer_handler(struct sock *sk); extern int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg); extern int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb, const struct tcphdr *th, unsigned int len); -extern int tcp_rcv_established(struct sock *sk, struct sk_buff *skb, - const struct tcphdr *th, unsigned int len); +extern void tcp_rcv_established(struct sock *sk, struct sk_buff *skb, + const struct tcphdr *th, unsigned int len); extern void tcp_rcv_space_adjust(struct sock *sk); extern void tcp_cleanup_rbuf(struct sock *sk, int copied); extern int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 1a84fffe6993..93d7e9de4143 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5049,8 +5049,8 @@ discard: * the rest is checked inline. Fast processing is turned on in * tcp_data_queue when everything is OK. */ -int tcp_rcv_established(struct sock *sk, struct sk_buff *skb, - const struct tcphdr *th, unsigned int len) +void tcp_rcv_established(struct sock *sk, struct sk_buff *skb, + const struct tcphdr *th, unsigned int len) { struct tcp_sock *tp = tcp_sk(sk); @@ -5127,7 +5127,7 @@ int tcp_rcv_established(struct sock *sk, struct sk_buff *skb, tcp_ack(sk, skb, 0); __kfree_skb(skb); tcp_data_snd_check(sk); - return 0; + return; } else { /* Header too small */ TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_INERRS); goto discard; @@ -5220,7 +5220,7 @@ no_ack: if (eaten) kfree_skb_partial(skb, fragstolen); sk->sk_data_ready(sk, 0); - return 0; + return; } } @@ -5236,7 +5236,7 @@ slow_path: */ if (!tcp_validate_incoming(sk, skb, th, 1)) - return 0; + return; step5: if (tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT) < 0) @@ -5252,7 +5252,7 @@ step5: tcp_data_snd_check(sk); tcp_ack_snd_check(sk); - return 0; + return; csum_error: TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_CSUMERRORS); @@ -5260,7 +5260,6 @@ csum_error: discard: __kfree_skb(skb); - return 0; } EXPORT_SYMBOL(tcp_rcv_established); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 09d45d718973..b14266bb91eb 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1799,10 +1799,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb) sk->sk_rx_dst = NULL; } } - if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len)) { - rsk = sk; - goto reset; - } + tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len); return 0; } diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c index 1f6aa543b64e..611beab38a00 100644 --- a/net/ipv4/tcp_probe.c +++ b/net/ipv4/tcp_probe.c @@ -122,8 +122,8 @@ static inline int tcp_probe_avail(void) * Hook inserted to be called before each receive packet. * Note: arguments must match tcp_rcv_established()! */ -static int jtcp_rcv_established(struct sock *sk, struct sk_buff *skb, - const struct tcphdr *th, unsigned int len) +static void jtcp_rcv_established(struct sock *sk, struct sk_buff *skb, + const struct tcphdr *th, unsigned int len) { const struct tcp_sock *tp = tcp_sk(sk); const struct inet_sock *inet = inet_sk(sk); @@ -172,7 +172,6 @@ static int jtcp_rcv_established(struct sock *sk, struct sk_buff *skb, } jprobe_return(); - return 0; } static struct jprobe tcp_jprobe = { diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 5bcfadf09e95..9acdcedf9a14 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1360,8 +1360,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb) } } - if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len)) - goto reset; + tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len); if (opt_skb) goto ipv6_pktoptions; return 0; -- cgit v1.2.3 From 3a1c756590633c0e86df606e5c618c190926a0df Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Tue, 3 Sep 2013 19:29:12 +0200 Subject: net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcv In tcp_v6_do_rcv() code, when processing pkt options, we soley work on our skb clone opt_skb that we've created earlier before entering tcp_rcv_established() on our way. However, only in condition ... if (np->rxopt.bits.rxtclass) np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb)); ... we work on skb itself. As we extract every other information out of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can already be released by tcp_rcv_established() earlier on. When we try to access it in ipv6_hdr(), we will dereference freed skb. [ Bug added by commit 4c507d2897bd9b ("net: implement IP_RECVTOS for IP_PKTOPTIONS") ] Signed-off-by: Daniel Borkmann Cc: Eric Dumazet Acked-by: Eric Dumazet Acked-by: Jiri Benc Signed-off-by: David S. Miller --- net/ipv6/tcp_ipv6.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/ipv6/tcp_ipv6.c') diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 6e1649d58533..eeb4cb0c57c1 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1427,7 +1427,7 @@ ipv6_pktoptions: if (np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) np->mcast_hops = ipv6_hdr(opt_skb)->hop_limit; if (np->rxopt.bits.rxtclass) - np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb)); + np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(opt_skb)); if (ipv6_opt_accepted(sk, opt_skb)) { skb_set_owner_r(opt_skb, sk); opt_skb = xchg(&np->pktoptions, opt_skb); -- cgit v1.2.3