From b595076a180a56d1bb170e6eceda6eb9d76f4cd3 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 1 Nov 2010 15:38:34 -0400 Subject: tree-wide: fix comment/printk typos MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit "gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König Signed-off-by: Jiri Kosina --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/ipv4') diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 05b1ecf36763..e96152207164 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1336,7 +1336,7 @@ static inline unsigned int tcp_cwnd_test(struct tcp_sock *tp, return 0; } -/* Intialize TSO state of a skb. +/* Initialize TSO state of a skb. * This must be invoked the first time we consider transmitting * SKB onto the wire. */ -- cgit v1.2.3 From c996d8b9a8f37bd1b4dd7823abc42780b20998f8 Mon Sep 17 00:00:00 2001 From: Michael Witten Date: Mon, 15 Nov 2010 19:55:34 +0000 Subject: Docs/Kconfig: Update: osdl.org -> linuxfoundation.org Some of the documentation refers to web pages under the domain `osdl.org'. However, `osdl.org' now redirects to `linuxfoundation.org'. Rather than rely on redirections, this patch updates the addresses appropriately; for the most part, only documentation that is meant to be current has been updated. The patch should be pretty quick to scan and check; each new web-page url was gotten by trying out the original URL in a browser and then simply copying the the redirected URL (formatting as necessary). There is some conflict as to which one of these domain names is preferred: linuxfoundation.org linux-foundation.org So, I wrote: info@linuxfoundation.org and got this reply: Message-ID: <4CE17EE6.9040807@linuxfoundation.org> Date: Mon, 15 Nov 2010 10:41:42 -0800 From: David Ames ... linuxfoundation.org is preferred. The canonical name for our web site is www.linuxfoundation.org. Our list site is actually lists.linux-foundation.org. Regarding email linuxfoundation.org is preferred there are a few people who choose to use linux-foundation.org for their own reasons. Consequently, I used `linuxfoundation.org' for web pages and `lists.linux-foundation.org' for mailing-list web pages and email addresses; the only personal email address I updated from `@osdl.org' was that of Andrew Morton, who prefers `linux-foundation.org' according `git log'. Signed-off-by: Michael Witten Signed-off-by: Jiri Kosina --- Documentation/ko_KR/HOWTO | 4 ++-- Documentation/lguest/lguest.txt | 7 +++++-- Documentation/networking/bridge.txt | 4 ++-- Documentation/networking/dccp.txt | 4 ++-- Documentation/networking/generic_netlink.txt | 2 +- Documentation/zh_CN/HOWTO | 4 ++-- Documentation/zh_CN/SubmittingDrivers | 2 +- MAINTAINERS | 10 +++++----- net/Kconfig | 4 +++- net/dccp/Kconfig | 4 +++- net/ipv4/Kconfig | 4 +++- net/sched/Kconfig | 2 +- 12 files changed, 30 insertions(+), 21 deletions(-) (limited to 'net/ipv4') diff --git a/Documentation/ko_KR/HOWTO b/Documentation/ko_KR/HOWTO index e3a55b6091e9..ab5189ae3428 100644 --- a/Documentation/ko_KR/HOWTO +++ b/Documentation/ko_KR/HOWTO @@ -391,8 +391,8 @@ bugme-new 메일링 리스트나(새로운 버그 리포트들만이 이곳에 bugme-janitor 메일링 리스트(bugzilla에 모든 변화들이 여기서 메일로 전해진다) 에 등록하면 된다. - http://lists.osdl.org/mailman/listinfo/bugme-new - http://lists.osdl.org/mailman/listinfo/bugme-janitors + https://lists.linux-foundation.org/mailman/listinfo/bugme-new + https://lists.linux-foundation.org/mailman/listinfo/bugme-janitors diff --git a/Documentation/lguest/lguest.txt b/Documentation/lguest/lguest.txt index efb3a6a045a2..6ccaf8e1a00e 100644 --- a/Documentation/lguest/lguest.txt +++ b/Documentation/lguest/lguest.txt @@ -111,8 +111,11 @@ Running Lguest: Then use --tunnet=bridge:lg0 when launching the guest. - See http://linux-net.osdl.org/index.php/Bridge for general information - on how to get bridging working. + See: + + http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge + + for general information on how to get bridging to work. There is a helpful mailing list at http://ozlabs.org/mailman/listinfo/lguest diff --git a/Documentation/networking/bridge.txt b/Documentation/networking/bridge.txt index bec69a8a1697..a7ba5e4e2c91 100644 --- a/Documentation/networking/bridge.txt +++ b/Documentation/networking/bridge.txt @@ -1,8 +1,8 @@ In order to use the Ethernet bridging functionality, you'll need the userspace tools. These programs and documentation are available -at http://www.linux-foundation.org/en/Net:Bridge. The download page is +at http://www.linuxfoundation.org/en/Net:Bridge. The download page is http://prdownloads.sourceforge.net/bridge. If you still have questions, don't hesitate to post to the mailing list -(more info http://lists.osdl.org/mailman/listinfo/bridge). +(more info https://lists.linux-foundation.org/mailman/listinfo/bridge). diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index 271d524a4c8d..1355fa413cb7 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -38,11 +38,11 @@ The Linux DCCP implementation does not currently support all the features that a specified in RFCs 4340...42. The known bugs are at: - http://linux-net.osdl.org/index.php/TODO#DCCP + http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP For more up-to-date versions of the DCCP implementation, please consider using the experimental DCCP test tree; instructions for checking this out are on: -http://linux-net.osdl.org/index.php/DCCP_Testing#Experimental_DCCP_source_tree +http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp_testing#Experimental_DCCP_source_tree Socket options diff --git a/Documentation/networking/generic_netlink.txt b/Documentation/networking/generic_netlink.txt index d4f8b8b9b53c..3e071115ca90 100644 --- a/Documentation/networking/generic_netlink.txt +++ b/Documentation/networking/generic_netlink.txt @@ -1,3 +1,3 @@ A wiki document on how to use Generic Netlink can be found here: - * http://linux-net.osdl.org/index.php/Generic_Netlink_HOWTO + * http://www.linuxfoundation.org/collaborate/workgroups/networking/generic_netlink_howto diff --git a/Documentation/zh_CN/HOWTO b/Documentation/zh_CN/HOWTO index 69160779e432..faf976c0c731 100644 --- a/Documentation/zh_CN/HOWTO +++ b/Documentation/zh_CN/HOWTO @@ -347,8 +347,8 @@ bugzilla.kernel.org是Linux内核开发者们用来跟踪内核Bug的网站。 最新bug的通知,可以订阅bugme-new邮件列表(只有新的bug报告会被寄到这里) 或者订阅bugme-janitor邮件列表(所有bugzilla的变动都会被寄到这里)。 - http://lists.osdl.org/mailman/listinfo/bugme-new - http://lists.osdl.org/mailman/listinfo/bugme-janitors + https://lists.linux-foundation.org/mailman/listinfo/bugme-new + https://lists.linux-foundation.org/mailman/listinfo/bugme-janitors 邮件列表 diff --git a/Documentation/zh_CN/SubmittingDrivers b/Documentation/zh_CN/SubmittingDrivers index c27b0f6cdd39..5889f8df6312 100644 --- a/Documentation/zh_CN/SubmittingDrivers +++ b/Documentation/zh_CN/SubmittingDrivers @@ -61,7 +61,7 @@ Linux 2.4: Linux 2.6: 除了遵循和 2.4 版内核同样的规则外,你还需要在 linux-kernel 邮件 列表上跟踪最新的 API 变化。向 Linux 2.6 内核提交驱动的顶级联系人 - 是 Andrew Morton 。 + 是 Andrew Morton 。 决定设备驱动能否被接受的条件 ---------------------------- diff --git a/MAINTAINERS b/MAINTAINERS index cb8b58020352..4d204e2a9754 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1894,7 +1894,7 @@ F: drivers/scsi/dc395x.* DCCP PROTOCOL M: Arnaldo Carvalho de Melo L: dccp@vger.kernel.org -W: http://linux-net.osdl.org/index.php/DCCP +W: http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp S: Maintained F: include/linux/dccp.h F: include/linux/tfrc.h @@ -2306,7 +2306,7 @@ ETHERNET BRIDGE M: Stephen Hemminger L: bridge@lists.linux-foundation.org L: netdev@vger.kernel.org -W: http://www.linux-foundation.org/en/Net:Bridge +W: http://www.linuxfoundation.org/en/Net:Bridge S: Maintained F: include/linux/netfilter_bridge/ F: net/bridge/ @@ -4478,7 +4478,7 @@ M: Jeremy Fitzhardinge M: Chris Wright M: Alok Kataria M: Rusty Russell -L: virtualization@lists.osdl.org +L: virtualization@lists.linux-foundation.org S: Supported F: Documentation/ia64/paravirt_ops.txt F: arch/*/kernel/paravirt* @@ -6352,7 +6352,7 @@ F: include/linux/virtio_console.h VIRTIO HOST (VHOST) M: "Michael S. Tsirkin" L: kvm@vger.kernel.org -L: virtualization@lists.osdl.org +L: virtualization@lists.linux-foundation.org L: netdev@vger.kernel.org S: Maintained F: drivers/vhost/ @@ -6613,7 +6613,7 @@ XEN HYPERVISOR INTERFACE M: Jeremy Fitzhardinge M: Konrad Rzeszutek Wilk L: xen-devel@lists.xen.org -L: virtualization@lists.osdl.org +L: virtualization@lists.linux-foundation.org S: Supported F: arch/x86/xen/ F: drivers/*/xen-*front.c diff --git a/net/Kconfig b/net/Kconfig index 55fd82e9ffd9..5e203b558703 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -247,7 +247,9 @@ config NET_TCPPROBE what was just said, you don't need it: say N. Documentation on how to use TCP connection probing can be found - at http://linux-net.osdl.org/index.php/TcpProbe + at: + + http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe To compile this code as a module, choose M here: the module will be called tcp_probe. diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig index ad6dffd9070e..b75968a04017 100644 --- a/net/dccp/Kconfig +++ b/net/dccp/Kconfig @@ -49,7 +49,9 @@ config NET_DCCPPROBE what was just said, you don't need it: say N. Documentation on how to use DCCP connection probing can be found - at http://linux-net.osdl.org/index.php/DccpProbe + at: + + http://www.linuxfoundation.org/collaborate/workgroups/networking/dccpprobe To compile this code as a module, choose M here: the module will be called dccp_probe. diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index 9e95d7fb6d5a..a5a1050595d1 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -432,7 +432,9 @@ config INET_DIAG ---help--- Support for INET (TCP, DCCP, etc) socket monitoring interface used by native Linux tools such as ss. ss is included in iproute2, currently - downloadable at . + downloadable at: + + http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2 If unsure, say Y. diff --git a/net/sched/Kconfig b/net/sched/Kconfig index a36270a994d7..f04d4a484d53 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -24,7 +24,7 @@ menuconfig NET_SCHED To administer these schedulers, you'll need the user-level utilities from the package iproute2+tc at . That package also contains some documentation; for more, check out - . + . This Quality of Service (QoS) support will enable you to use Differentiated Services (diffserv) and Resource Reservation Protocol -- cgit v1.2.3 From 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf Mon Sep 17 00:00:00 2001 From: Jan Engelhardt Date: Fri, 7 Jan 2011 03:15:05 +0000 Subject: netlink: test for all flags of the NLM_F_DUMP composite Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH, when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL, non-dump requests with NLM_F_EXCL set are mistaken as dump requests. Substitute the condition to test for _all_ bits being set. Signed-off-by: Jan Engelhardt Acked-by: Pablo Neira Ayuso Signed-off-by: David S. Miller --- net/core/rtnetlink.c | 2 +- net/ipv4/inet_diag.c | 2 +- net/netfilter/nf_conntrack_netlink.c | 4 ++-- net/netlink/genetlink.c | 2 +- net/xfrm/xfrm_user.c | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) (limited to 'net/ipv4') diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 750db57f3bb3..a5f7535aab5b 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1820,7 +1820,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN)) return -EPERM; - if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) { + if (kind == 2 && (nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { struct sock *rtnl; rtnl_dumpit_func dumpit; diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 2ada17129fce..2746c1fa6417 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -858,7 +858,7 @@ static int inet_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) nlmsg_len(nlh) < hdrlen) return -EINVAL; - if (nlh->nlmsg_flags & NLM_F_DUMP) { + if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { if (nlmsg_attrlen(nlh, hdrlen)) { struct nlattr *attr; diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 0cdba50c0d69..746140264b2d 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -928,7 +928,7 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb, u16 zone; int err; - if (nlh->nlmsg_flags & NLM_F_DUMP) + if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) return netlink_dump_start(ctnl, skb, nlh, ctnetlink_dump_table, ctnetlink_done); @@ -1790,7 +1790,7 @@ ctnetlink_get_expect(struct sock *ctnl, struct sk_buff *skb, u16 zone; int err; - if (nlh->nlmsg_flags & NLM_F_DUMP) { + if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { return netlink_dump_start(ctnl, skb, nlh, ctnetlink_exp_dump_table, ctnetlink_exp_done); diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index 1781d99145e2..f83cb370292b 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -519,7 +519,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) security_netlink_recv(skb, CAP_NET_ADMIN)) return -EPERM; - if (nlh->nlmsg_flags & NLM_F_DUMP) { + if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { if (ops->dumpit == NULL) return -EOPNOTSUPP; diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 8eb889510916..6a8da81ff66f 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2187,7 +2187,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) || type == (XFRM_MSG_GETPOLICY - XFRM_MSG_BASE)) && - (nlh->nlmsg_flags & NLM_F_DUMP)) { + (nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { if (link->dump == NULL) return -EINVAL; -- cgit v1.2.3 From 83723d60717f8da0f53f91cf42a845ed56c09662 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 10 Jan 2011 20:11:38 +0100 Subject: netfilter: x_tables: dont block BH while reading counters Using "iptables -L" with a lot of rules have a too big BH latency. Jesper mentioned ~6 ms and worried of frame drops. Switch to a per_cpu seqlock scheme, so that taking a snapshot of counters doesnt need to block BH (for this cpu, but also other cpus). This adds two increments on seqlock sequence per ipt_do_table() call, its a reasonable cost for allowing "iptables -L" not block BH processing. Reported-by: Jesper Dangaard Brouer Signed-off-by: Eric Dumazet CC: Patrick McHardy Acked-by: Stephen Hemminger Acked-by: Jesper Dangaard Brouer Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter/x_tables.h | 10 ++++----- net/ipv4/netfilter/arp_tables.c | 45 ++++++++++++-------------------------- net/ipv4/netfilter/ip_tables.c | 45 ++++++++++++-------------------------- net/ipv6/netfilter/ip6_tables.c | 45 ++++++++++++-------------------------- net/netfilter/x_tables.c | 3 ++- 5 files changed, 49 insertions(+), 99 deletions(-) (limited to 'net/ipv4') diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index 742bec051440..6712e713b299 100644 --- a/include/linux/netfilter/x_tables.h +++ b/include/linux/netfilter/x_tables.h @@ -472,7 +472,7 @@ extern void xt_free_table_info(struct xt_table_info *info); * necessary for reading the counters. */ struct xt_info_lock { - spinlock_t lock; + seqlock_t lock; unsigned char readers; }; DECLARE_PER_CPU(struct xt_info_lock, xt_info_locks); @@ -497,7 +497,7 @@ static inline void xt_info_rdlock_bh(void) local_bh_disable(); lock = &__get_cpu_var(xt_info_locks); if (likely(!lock->readers++)) - spin_lock(&lock->lock); + write_seqlock(&lock->lock); } static inline void xt_info_rdunlock_bh(void) @@ -505,7 +505,7 @@ static inline void xt_info_rdunlock_bh(void) struct xt_info_lock *lock = &__get_cpu_var(xt_info_locks); if (likely(!--lock->readers)) - spin_unlock(&lock->lock); + write_sequnlock(&lock->lock); local_bh_enable(); } @@ -516,12 +516,12 @@ static inline void xt_info_rdunlock_bh(void) */ static inline void xt_info_wrlock(unsigned int cpu) { - spin_lock(&per_cpu(xt_info_locks, cpu).lock); + write_seqlock(&per_cpu(xt_info_locks, cpu).lock); } static inline void xt_info_wrunlock(unsigned int cpu) { - spin_unlock(&per_cpu(xt_info_locks, cpu).lock); + write_sequnlock(&per_cpu(xt_info_locks, cpu).lock); } /* diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index 3fac340a28d5..e855fffaed95 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -710,42 +710,25 @@ static void get_counters(const struct xt_table_info *t, struct arpt_entry *iter; unsigned int cpu; unsigned int i; - unsigned int curcpu = get_cpu(); - - /* Instead of clearing (by a previous call to memset()) - * the counters and using adds, we set the counters - * with data used by 'current' CPU - * - * Bottom half has to be disabled to prevent deadlock - * if new softirq were to run and call ipt_do_table - */ - local_bh_disable(); - i = 0; - xt_entry_foreach(iter, t->entries[curcpu], t->size) { - SET_COUNTER(counters[i], iter->counters.bcnt, - iter->counters.pcnt); - ++i; - } - local_bh_enable(); - /* Processing counters from other cpus, we can let bottom half enabled, - * (preemption is disabled) - */ for_each_possible_cpu(cpu) { - if (cpu == curcpu) - continue; + seqlock_t *lock = &per_cpu(xt_info_locks, cpu).lock; + i = 0; - local_bh_disable(); - xt_info_wrlock(cpu); xt_entry_foreach(iter, t->entries[cpu], t->size) { - ADD_COUNTER(counters[i], iter->counters.bcnt, - iter->counters.pcnt); + u64 bcnt, pcnt; + unsigned int start; + + do { + start = read_seqbegin(lock); + bcnt = iter->counters.bcnt; + pcnt = iter->counters.pcnt; + } while (read_seqretry(lock, start)); + + ADD_COUNTER(counters[i], bcnt, pcnt); ++i; } - xt_info_wrunlock(cpu); - local_bh_enable(); } - put_cpu(); } static struct xt_counters *alloc_counters(const struct xt_table *table) @@ -759,7 +742,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table) * about). */ countersize = sizeof(struct xt_counters) * private->number; - counters = vmalloc(countersize); + counters = vzalloc(countersize); if (counters == NULL) return ERR_PTR(-ENOMEM); @@ -1007,7 +990,7 @@ static int __do_replace(struct net *net, const char *name, struct arpt_entry *iter; ret = 0; - counters = vmalloc(num_counters * sizeof(struct xt_counters)); + counters = vzalloc(num_counters * sizeof(struct xt_counters)); if (!counters) { ret = -ENOMEM; goto out; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index a846d633b3b6..652efea013dc 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -884,42 +884,25 @@ get_counters(const struct xt_table_info *t, struct ipt_entry *iter; unsigned int cpu; unsigned int i; - unsigned int curcpu = get_cpu(); - - /* Instead of clearing (by a previous call to memset()) - * the counters and using adds, we set the counters - * with data used by 'current' CPU. - * - * Bottom half has to be disabled to prevent deadlock - * if new softirq were to run and call ipt_do_table - */ - local_bh_disable(); - i = 0; - xt_entry_foreach(iter, t->entries[curcpu], t->size) { - SET_COUNTER(counters[i], iter->counters.bcnt, - iter->counters.pcnt); - ++i; - } - local_bh_enable(); - /* Processing counters from other cpus, we can let bottom half enabled, - * (preemption is disabled) - */ for_each_possible_cpu(cpu) { - if (cpu == curcpu) - continue; + seqlock_t *lock = &per_cpu(xt_info_locks, cpu).lock; + i = 0; - local_bh_disable(); - xt_info_wrlock(cpu); xt_entry_foreach(iter, t->entries[cpu], t->size) { - ADD_COUNTER(counters[i], iter->counters.bcnt, - iter->counters.pcnt); + u64 bcnt, pcnt; + unsigned int start; + + do { + start = read_seqbegin(lock); + bcnt = iter->counters.bcnt; + pcnt = iter->counters.pcnt; + } while (read_seqretry(lock, start)); + + ADD_COUNTER(counters[i], bcnt, pcnt); ++i; /* macro does multi eval of i */ } - xt_info_wrunlock(cpu); - local_bh_enable(); } - put_cpu(); } static struct xt_counters *alloc_counters(const struct xt_table *table) @@ -932,7 +915,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table) (other than comefrom, which userspace doesn't care about). */ countersize = sizeof(struct xt_counters) * private->number; - counters = vmalloc(countersize); + counters = vzalloc(countersize); if (counters == NULL) return ERR_PTR(-ENOMEM); @@ -1203,7 +1186,7 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks, struct ipt_entry *iter; ret = 0; - counters = vmalloc(num_counters * sizeof(struct xt_counters)); + counters = vzalloc(num_counters * sizeof(struct xt_counters)); if (!counters) { ret = -ENOMEM; goto out; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 455582384ece..7d227c644f72 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -897,42 +897,25 @@ get_counters(const struct xt_table_info *t, struct ip6t_entry *iter; unsigned int cpu; unsigned int i; - unsigned int curcpu = get_cpu(); - - /* Instead of clearing (by a previous call to memset()) - * the counters and using adds, we set the counters - * with data used by 'current' CPU - * - * Bottom half has to be disabled to prevent deadlock - * if new softirq were to run and call ipt_do_table - */ - local_bh_disable(); - i = 0; - xt_entry_foreach(iter, t->entries[curcpu], t->size) { - SET_COUNTER(counters[i], iter->counters.bcnt, - iter->counters.pcnt); - ++i; - } - local_bh_enable(); - /* Processing counters from other cpus, we can let bottom half enabled, - * (preemption is disabled) - */ for_each_possible_cpu(cpu) { - if (cpu == curcpu) - continue; + seqlock_t *lock = &per_cpu(xt_info_locks, cpu).lock; + i = 0; - local_bh_disable(); - xt_info_wrlock(cpu); xt_entry_foreach(iter, t->entries[cpu], t->size) { - ADD_COUNTER(counters[i], iter->counters.bcnt, - iter->counters.pcnt); + u64 bcnt, pcnt; + unsigned int start; + + do { + start = read_seqbegin(lock); + bcnt = iter->counters.bcnt; + pcnt = iter->counters.pcnt; + } while (read_seqretry(lock, start)); + + ADD_COUNTER(counters[i], bcnt, pcnt); ++i; } - xt_info_wrunlock(cpu); - local_bh_enable(); } - put_cpu(); } static struct xt_counters *alloc_counters(const struct xt_table *table) @@ -945,7 +928,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table) (other than comefrom, which userspace doesn't care about). */ countersize = sizeof(struct xt_counters) * private->number; - counters = vmalloc(countersize); + counters = vzalloc(countersize); if (counters == NULL) return ERR_PTR(-ENOMEM); @@ -1216,7 +1199,7 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks, struct ip6t_entry *iter; ret = 0; - counters = vmalloc(num_counters * sizeof(struct xt_counters)); + counters = vzalloc(num_counters * sizeof(struct xt_counters)); if (!counters) { ret = -ENOMEM; goto out; diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 80463507420e..c94237631077 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -1325,7 +1325,8 @@ static int __init xt_init(void) for_each_possible_cpu(i) { struct xt_info_lock *lock = &per_cpu(xt_info_locks, i); - spin_lock_init(&lock->lock); + + seqlock_init(&lock->lock); lock->readers = 0; } -- cgit v1.2.3 From 545ecdc3b3a2fe0b54a3053bf8bf85321bbca7da Mon Sep 17 00:00:00 2001 From: Maxim Levitsky Date: Sat, 8 Jan 2011 13:57:12 +0000 Subject: arp: allow to invalidate specific ARP entries IPv4 over firewire needs to be able to remove ARP entries from the ARP cache that belong to nodes that are removed, because IPv4 over firewire uses ARP packets for private information about nodes. This information becomes invalid as soon as node drops off the bus and when it reconnects, its only possible to start talking to it after it responded to an ARP packet. But ARP cache prevents such packets from being sent. Signed-off-by: Maxim Levitsky Signed-off-by: David S. Miller --- include/net/arp.h | 1 + net/ipv4/arp.c | 29 ++++++++++++++++++----------- 2 files changed, 19 insertions(+), 11 deletions(-) (limited to 'net/ipv4') diff --git a/include/net/arp.h b/include/net/arp.h index f4cf6ce66586..91f0568a04ef 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -25,5 +25,6 @@ extern struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, const unsigned char *src_hw, const unsigned char *target_hw); extern void arp_xmit(struct sk_buff *skb); +int arp_invalidate(struct net_device *dev, __be32 ip); #endif /* _ARP_H */ diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index a2fc7b961dbc..04c8b69fd426 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -1143,6 +1143,23 @@ static int arp_req_get(struct arpreq *r, struct net_device *dev) return err; } +int arp_invalidate(struct net_device *dev, __be32 ip) +{ + struct neighbour *neigh = neigh_lookup(&arp_tbl, &ip, dev); + int err = -ENXIO; + + if (neigh) { + if (neigh->nud_state & ~NUD_NOARP) + err = neigh_update(neigh, NULL, NUD_FAILED, + NEIGH_UPDATE_F_OVERRIDE| + NEIGH_UPDATE_F_ADMIN); + neigh_release(neigh); + } + + return err; +} +EXPORT_SYMBOL(arp_invalidate); + static int arp_req_delete_public(struct net *net, struct arpreq *r, struct net_device *dev) { @@ -1163,7 +1180,6 @@ static int arp_req_delete(struct net *net, struct arpreq *r, { int err; __be32 ip; - struct neighbour *neigh; if (r->arp_flags & ATF_PUBL) return arp_req_delete_public(net, r, dev); @@ -1181,16 +1197,7 @@ static int arp_req_delete(struct net *net, struct arpreq *r, if (!dev) return -EINVAL; } - err = -ENXIO; - neigh = neigh_lookup(&arp_tbl, &ip, dev); - if (neigh) { - if (neigh->nud_state & ~NUD_NOARP) - err = neigh_update(neigh, NULL, NUD_FAILED, - NEIGH_UPDATE_F_OVERRIDE| - NEIGH_UPDATE_F_ADMIN); - neigh_release(neigh); - } - return err; + return arp_invalidate(dev, ip); } /* -- cgit v1.2.3 From c191a836a908d1dd6b40c503741f91b914de3348 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 11 Jan 2011 01:14:22 +0000 Subject: tcp: disallow bind() to reuse addr/port inet_csk_bind_conflict() logic currently disallows a bind() if it finds a friend socket (a socket bound on same address/port) satisfying a set of conditions : 1) Current (to be bound) socket doesnt have sk_reuse set OR 2) other socket doesnt have sk_reuse set OR 3) other socket is in LISTEN state We should add the CLOSE state in the 3) condition, in order to avoid two REUSEADDR sockets in CLOSE state with same local address/port, since this can deny further operations. Note : a prior patch tried to address the problem in a different (and buggy) way. (commit fda48a0d7a8412ced tcp: bind() fix when many ports are bound). Reported-by: Gaspar Chilingarov Reported-by: Daniel Baluta Tested-by: Daniel Baluta Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv4/inet_connection_sock.c | 5 +++-- net/ipv6/inet6_connection_sock.c | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) (limited to 'net/ipv4') diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 25e318153f14..97e5fb765265 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -73,7 +73,7 @@ int inet_csk_bind_conflict(const struct sock *sk, !sk2->sk_bound_dev_if || sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) { if (!reuse || !sk2->sk_reuse || - sk2->sk_state == TCP_LISTEN) { + ((1 << sk2->sk_state) & (TCPF_LISTEN | TCPF_CLOSE))) { const __be32 sk2_rcv_saddr = sk_rcv_saddr(sk2); if (!sk2_rcv_saddr || !sk_rcv_saddr(sk) || sk2_rcv_saddr == sk_rcv_saddr(sk)) @@ -122,7 +122,8 @@ again: (tb->num_owners < smallest_size || smallest_size == -1)) { smallest_size = tb->num_owners; smallest_rover = rover; - if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) { + if (atomic_read(&hashinfo->bsockets) > (high - low) + 1 && + !inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) { spin_unlock(&head->lock); snum = smallest_rover; goto have_snum; diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index e46305d1815a..d144e629d2b4 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -44,7 +44,7 @@ int inet6_csk_bind_conflict(const struct sock *sk, !sk2->sk_bound_dev_if || sk->sk_bound_dev_if == sk2->sk_bound_dev_if) && (!sk->sk_reuse || !sk2->sk_reuse || - sk2->sk_state == TCP_LISTEN) && + ((1 << sk2->sk_state) & (TCPF_LISTEN | TCPF_CLOSE))) && ipv6_rcv_saddr_equal(sk, sk2)) break; } -- cgit v1.2.3 From 4b0ef1f223be4e092632b4152ceec5627ac10f59 Mon Sep 17 00:00:00 2001 From: Dang Hongwu Date: Tue, 11 Jan 2011 07:13:33 +0000 Subject: ah: reload pointers to skb data after calling skb_cow_data() skb_cow_data() may allocate a new data buffer, so pointers on skb should be set after this function. Bug was introduced by commit dff3bb06 ("ah4: convert to ahash") and 8631e9bd ("ah6: convert to ahash"). Signed-off-by: Wang Xuefu Acked-by: Krzysztof Witek Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller --- net/ipv4/ah4.c | 7 ++++--- net/ipv6/ah6.c | 8 +++++--- 2 files changed, 9 insertions(+), 6 deletions(-) (limited to 'net/ipv4') diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index 880a5ec6dce0..86961bec70ab 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -314,14 +314,15 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb) skb->ip_summed = CHECKSUM_NONE; - ah = (struct ip_auth_hdr *)skb->data; - iph = ip_hdr(skb); - ihl = ip_hdrlen(skb); if ((err = skb_cow_data(skb, 0, &trailer)) < 0) goto out; nfrags = err; + ah = (struct ip_auth_hdr *)skb->data; + iph = ip_hdr(skb); + ihl = ip_hdrlen(skb); + work_iph = ah_alloc_tmp(ahash, nfrags, ihl + ahp->icv_trunc_len); if (!work_iph) goto out; diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c index ee82d4ef26ce..1aba54ae53c4 100644 --- a/net/ipv6/ah6.c +++ b/net/ipv6/ah6.c @@ -538,14 +538,16 @@ static int ah6_input(struct xfrm_state *x, struct sk_buff *skb) if (!pskb_may_pull(skb, ah_hlen)) goto out; - ip6h = ipv6_hdr(skb); - - skb_push(skb, hdr_len); if ((err = skb_cow_data(skb, 0, &trailer)) < 0) goto out; nfrags = err; + ah = (struct ip_auth_hdr *)skb->data; + ip6h = ipv6_hdr(skb); + + skb_push(skb, hdr_len); + work_iph = ah_alloc_tmp(ahash, nfrags, hdr_len + ahp->icv_trunc_len); if (!work_iph) goto out; -- cgit v1.2.3 From b8f3ab4290f1e720166e888ea2a1d1d44c4d15dd Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Tue, 18 Jan 2011 12:40:38 -0800 Subject: Revert "netlink: test for all flags of the NLM_F_DUMP composite" This reverts commit 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf. It breaks several things including the avahi daemon. Signed-off-by: David S. Miller --- net/core/rtnetlink.c | 2 +- net/ipv4/inet_diag.c | 2 +- net/netfilter/nf_conntrack_netlink.c | 4 ++-- net/netlink/genetlink.c | 2 +- net/xfrm/xfrm_user.c | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) (limited to 'net/ipv4') diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index a5f7535aab5b..750db57f3bb3 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1820,7 +1820,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN)) return -EPERM; - if (kind == 2 && (nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { + if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) { struct sock *rtnl; rtnl_dumpit_func dumpit; diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 2746c1fa6417..2ada17129fce 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -858,7 +858,7 @@ static int inet_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) nlmsg_len(nlh) < hdrlen) return -EINVAL; - if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { + if (nlh->nlmsg_flags & NLM_F_DUMP) { if (nlmsg_attrlen(nlh, hdrlen)) { struct nlattr *attr; diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 2b7eef37875c..93297aaceb2b 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -924,7 +924,7 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb, u16 zone; int err; - if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) + if (nlh->nlmsg_flags & NLM_F_DUMP) return netlink_dump_start(ctnl, skb, nlh, ctnetlink_dump_table, ctnetlink_done); @@ -1787,7 +1787,7 @@ ctnetlink_get_expect(struct sock *ctnl, struct sk_buff *skb, u16 zone; int err; - if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { + if (nlh->nlmsg_flags & NLM_F_DUMP) { return netlink_dump_start(ctnl, skb, nlh, ctnetlink_exp_dump_table, ctnetlink_exp_done); diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index f83cb370292b..1781d99145e2 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -519,7 +519,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) security_netlink_recv(skb, CAP_NET_ADMIN)) return -EPERM; - if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { + if (nlh->nlmsg_flags & NLM_F_DUMP) { if (ops->dumpit == NULL) return -EOPNOTSUPP; diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index d5e1e0b08890..61291965c5f6 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2189,7 +2189,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) || type == (XFRM_MSG_GETPOLICY - XFRM_MSG_BASE)) && - (nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) { + (nlh->nlmsg_flags & NLM_F_DUMP)) { if (link->dump == NULL) return -EINVAL; -- cgit v1.2.3 From c506653d35249bb4738bb139c24362e1ae724bc1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 24 Jan 2011 13:16:16 -0800 Subject: net: arp_ioctl() must hold RTNL Commit 941666c2e3e0 "net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()" introduced a regression, reported by Jamie Heilman. "arp -Ds 192.168.2.41 eth0 pub" triggered the ASSERT_RTNL() assert in pneigh_lookup() Removing RTNL requirement from arp_ioctl() was a mistake, just revert that part. Reported-by: Jamie Heilman Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/core/dev.c | 3 ++- net/ipv4/arp.c | 11 +++++------ 2 files changed, 7 insertions(+), 7 deletions(-) (limited to 'net/ipv4') diff --git a/net/core/dev.c b/net/core/dev.c index 8393ec408cd4..1e5077d2cca6 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -749,7 +749,8 @@ EXPORT_SYMBOL(dev_get_by_index); * @ha: hardware address * * Search for an interface by MAC address. Returns NULL if the device - * is not found or a pointer to the device. The caller must hold RCU + * is not found or a pointer to the device. + * The caller must hold RCU or RTNL. * The returned device has not had its ref count increased * and the caller must therefore be careful about locking * diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 04c8b69fd426..7927589813b5 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -1017,14 +1017,13 @@ static int arp_req_set_proxy(struct net *net, struct net_device *dev, int on) IPV4_DEVCONF_ALL(net, PROXY_ARP) = on; return 0; } - if (__in_dev_get_rcu(dev)) { - IN_DEV_CONF_SET(__in_dev_get_rcu(dev), PROXY_ARP, on); + if (__in_dev_get_rtnl(dev)) { + IN_DEV_CONF_SET(__in_dev_get_rtnl(dev), PROXY_ARP, on); return 0; } return -ENXIO; } -/* must be called with rcu_read_lock() */ static int arp_req_set_public(struct net *net, struct arpreq *r, struct net_device *dev) { @@ -1233,10 +1232,10 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg) if (!(r.arp_flags & ATF_NETMASK)) ((struct sockaddr_in *)&r.arp_netmask)->sin_addr.s_addr = htonl(0xFFFFFFFFUL); - rcu_read_lock(); + rtnl_lock(); if (r.arp_dev[0]) { err = -ENODEV; - dev = dev_get_by_name_rcu(net, r.arp_dev); + dev = __dev_get_by_name(net, r.arp_dev); if (dev == NULL) goto out; @@ -1263,7 +1262,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg) break; } out: - rcu_read_unlock(); + rtnl_unlock(); if (cmd == SIOCGARP && !err && copy_to_user(arg, &r, sizeof(r))) err = -EFAULT; return err; -- cgit v1.2.3 From 3408404a4c2a4eead9d73b0bbbfe3f225b65f492 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Mon, 24 Jan 2011 14:37:46 -0800 Subject: inetpeer: Use correct AVL tree base pointer in inet_getpeer(). Family was hard-coded to AF_INET but should be daddr->family. This fixes crashes when unlinking ipv6 peer entries, since the unlink code was looking up the base pointer properly. Reported-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv4/inetpeer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/ipv4') diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c index d9bc85751c74..a96e65674ac3 100644 --- a/net/ipv4/inetpeer.c +++ b/net/ipv4/inetpeer.c @@ -475,7 +475,7 @@ static int cleanup_once(unsigned long ttl) struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create) { struct inet_peer __rcu **stack[PEER_MAXDEPTH], ***stackptr; - struct inet_peer_base *base = family_to_base(AF_INET); + struct inet_peer_base *base = family_to_base(daddr->family); struct inet_peer *p; /* Look up for the address quickly, lockless. -- cgit v1.2.3 From fd0273c5033630b8673554cd39660435d1ab2ac4 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 24 Jan 2011 14:41:20 -0800 Subject: tcp: fix bug in listening_get_next() commit a8b690f98baf9fb19 (tcp: Fix slowness in read /proc/net/tcp) introduced a bug in handling of SYN_RECV sockets. st->offset represents number of sockets found since beginning of listening_hash[st->bucket]. We should not reset st->offset when iterating through syn_table[st->sbucket], or else if more than ~25 sockets (if PAGE_SIZE=4096) are in SYN_RECV state, we exit from listening_get_next() with a too small st->offset Next time we enter tcp_seek_last_pos(), we are not able to seek past already found sockets. Reported-by: PK CC: Tom Herbert Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv4/tcp_ipv4.c | 1 - 1 file changed, 1 deletion(-) (limited to 'net/ipv4') diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 856f68466d49..02f583b3744a 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1994,7 +1994,6 @@ static void *listening_get_next(struct seq_file *seq, void *cur) } req = req->dl_next; } - st->offset = 0; if (++st->sbucket >= icsk->icsk_accept_queue.listen_opt->nr_table_entries) break; get_req: -- cgit v1.2.3 From 44f5324b5d13ef2187729d949eca442689627f39 Mon Sep 17 00:00:00 2001 From: Jerry Chu Date: Tue, 25 Jan 2011 13:46:30 -0800 Subject: TCP: fix a bug that triggers large number of TCP RST by mistake This patch fixes a bug that causes TCP RST packets to be generated on otherwise correctly behaved applications, e.g., no unread data on close,..., etc. To trigger the bug, at least two conditions must be met: 1. The FIN flag is set on the last data packet, i.e., it's not on a separate, FIN only packet. 2. The size of the last data chunk on the receive side matches exactly with the size of buffer posted by the receiver, and the receiver closes the socket without any further read attempt. This bug was first noticed on our netperf based testbed for our IW10 proposal to IETF where a large number of RST packets were observed. netperf's read side code meets the condition 2 above 100%. Before the fix, tcp_data_queue() will queue the last skb that meets condition 1 to sk_receive_queue even though it has fully copied out (skb_copy_datagram_iovec()) the data. Then if condition 2 is also met, tcp_recvmsg() often returns all the copied out data successfully without actually consuming the skb, due to a check "if ((chunk = len - tp->ucopy.len) != 0) {" and "len -= chunk;" after tcp_prequeue_process() that causes "len" to become 0 and an early exit from the big while loop. I don't see any reason not to free the skb whose data have been fully consumed in tcp_data_queue(), regardless of the FIN flag. We won't get there if MSG_PEEK is on. Am I missing some arcane cases related to urgent data? Signed-off-by: H.K. Jerry Chu Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/ipv4') diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2549b29b062d..eb7f82ebf4a3 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4399,7 +4399,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) if (!skb_copy_datagram_iovec(skb, 0, tp->ucopy.iov, chunk)) { tp->ucopy.len -= chunk; tp->copied_seq += chunk; - eaten = (chunk == skb->len && !th->fin); + eaten = (chunk == skb->len); tcp_rcv_space_adjust(sk); } local_bh_disable(); -- cgit v1.2.3 From 709b46e8d90badda1898caea50483c12af178e96 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Sat, 29 Jan 2011 16:15:56 +0000 Subject: net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1, which unfortunately means the existing infrastructure for compat networking ioctls is insufficient. A trivial compact ioctl implementation would conflict with: SIOCAX25ADDUID SIOCAIPXPRISLT SIOCGETSGCNT_IN6 SIOCGETSGCNT SIOCRSSCAUSE SIOCX25SSUBSCRIP SIOCX25SDTEFACILITIES To make this work I have updated the compat_ioctl decode path to mirror the the normal ioctl decode path. I have added an ipv4 inet_compat_ioctl function so that I can have ipv4 specific compat ioctls. I have added a compat_ioctl function into struct proto so I can break out ioctls by which kind of ip socket I am using. I have added a compat_raw_ioctl function because SIOCGETSGCNT only works on raw sockets. I have added a ipmr_compat_ioctl that mirrors the normal ipmr_ioctl. This was necessary because unfortunately the struct layout for the SIOCGETSGCNT has unsigned longs in it so changes between 32bit and 64bit kernels. This change was sufficient to run a 32bit ip multicast routing daemon on a 64bit kernel. Reported-by: Bill Fenner Signed-off-by: Eric W. Biederman Signed-off-by: David S. Miller --- include/linux/mroute.h | 1 + include/net/sock.h | 2 ++ net/ipv4/af_inet.c | 16 ++++++++++++++++ net/ipv4/ipmr.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ net/ipv4/raw.c | 19 +++++++++++++++++++ 5 files changed, 84 insertions(+) (limited to 'net/ipv4') diff --git a/include/linux/mroute.h b/include/linux/mroute.h index 0fa7a3a874c8..b21d567692b2 100644 --- a/include/linux/mroute.h +++ b/include/linux/mroute.h @@ -150,6 +150,7 @@ static inline int ip_mroute_opt(int opt) extern int ip_mroute_setsockopt(struct sock *, int, char __user *, unsigned int); extern int ip_mroute_getsockopt(struct sock *, int, char __user *, int __user *); extern int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg); +extern int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); extern int ip_mr_init(void); #else static inline diff --git a/include/net/sock.h b/include/net/sock.h index d884d268c704..bc1cf7d88ccb 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -753,6 +753,8 @@ struct proto { int level, int optname, char __user *optval, int __user *option); + int (*compat_ioctl)(struct sock *sk, + unsigned int cmd, unsigned long arg); #endif int (*sendmsg)(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index f2b61107df6c..45b89d7bda5a 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -880,6 +880,19 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) } EXPORT_SYMBOL(inet_ioctl); +#ifdef CONFIG_COMPAT +int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) +{ + struct sock *sk = sock->sk; + int err = -ENOIOCTLCMD; + + if (sk->sk_prot->compat_ioctl) + err = sk->sk_prot->compat_ioctl(sk, cmd, arg); + + return err; +} +#endif + const struct proto_ops inet_stream_ops = { .family = PF_INET, .owner = THIS_MODULE, @@ -903,6 +916,7 @@ const struct proto_ops inet_stream_ops = { #ifdef CONFIG_COMPAT .compat_setsockopt = compat_sock_common_setsockopt, .compat_getsockopt = compat_sock_common_getsockopt, + .compat_ioctl = inet_compat_ioctl, #endif }; EXPORT_SYMBOL(inet_stream_ops); @@ -929,6 +943,7 @@ const struct proto_ops inet_dgram_ops = { #ifdef CONFIG_COMPAT .compat_setsockopt = compat_sock_common_setsockopt, .compat_getsockopt = compat_sock_common_getsockopt, + .compat_ioctl = inet_compat_ioctl, #endif }; EXPORT_SYMBOL(inet_dgram_ops); @@ -959,6 +974,7 @@ static const struct proto_ops inet_sockraw_ops = { #ifdef CONFIG_COMPAT .compat_setsockopt = compat_sock_common_setsockopt, .compat_getsockopt = compat_sock_common_getsockopt, + .compat_ioctl = inet_compat_ioctl, #endif }; diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index 3f3a9afd73e0..7e41ac0b9260 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -60,6 +60,7 @@ #include #include #include +#include #include #include #include @@ -1434,6 +1435,51 @@ int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg) } } +#ifdef CONFIG_COMPAT +struct compat_sioc_sg_req { + struct in_addr src; + struct in_addr grp; + compat_ulong_t pktcnt; + compat_ulong_t bytecnt; + compat_ulong_t wrong_if; +}; + +int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg) +{ + struct sioc_sg_req sr; + struct mfc_cache *c; + struct net *net = sock_net(sk); + struct mr_table *mrt; + + mrt = ipmr_get_table(net, raw_sk(sk)->ipmr_table ? : RT_TABLE_DEFAULT); + if (mrt == NULL) + return -ENOENT; + + switch (cmd) { + case SIOCGETSGCNT: + if (copy_from_user(&sr, arg, sizeof(sr))) + return -EFAULT; + + rcu_read_lock(); + c = ipmr_cache_find(mrt, sr.src.s_addr, sr.grp.s_addr); + if (c) { + sr.pktcnt = c->mfc_un.res.pkt; + sr.bytecnt = c->mfc_un.res.bytes; + sr.wrong_if = c->mfc_un.res.wrong_if; + rcu_read_unlock(); + + if (copy_to_user(arg, &sr, sizeof(sr))) + return -EFAULT; + return 0; + } + rcu_read_unlock(); + return -EADDRNOTAVAIL; + default: + return -ENOIOCTLCMD; + } +} +#endif + static int ipmr_device_event(struct notifier_block *this, unsigned long event, void *ptr) { diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index a3d5ab786e81..6390ba299b3d 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -76,6 +76,7 @@ #include #include #include +#include static struct raw_hashinfo raw_v4_hashinfo = { .lock = __RW_LOCK_UNLOCKED(raw_v4_hashinfo.lock), @@ -838,6 +839,23 @@ static int raw_ioctl(struct sock *sk, int cmd, unsigned long arg) } } +#ifdef CONFIG_COMPAT +static int compat_raw_ioctl(struct sock *sk, unsigned int cmd, unsigned long arg) +{ + switch (cmd) { + case SIOCOUTQ: + case SIOCINQ: + return -ENOIOCTLCMD; + default: +#ifdef CONFIG_IP_MROUTE + return ipmr_compat_ioctl(sk, cmd, compat_ptr(arg)); +#else + return -ENOIOCTLCMD; +#endif + } +} +#endif + struct proto raw_prot = { .name = "RAW", .owner = THIS_MODULE, @@ -860,6 +878,7 @@ struct proto raw_prot = { #ifdef CONFIG_COMPAT .compat_setsockopt = compat_raw_setsockopt, .compat_getsockopt = compat_raw_getsockopt, + .compat_ioctl = compat_raw_ioctl, #endif }; -- cgit v1.2.3 From ec831ea72ee5d7d473899e27a86bd659482c4d0d Mon Sep 17 00:00:00 2001 From: Roland Dreier Date: Mon, 31 Jan 2011 13:16:00 -0800 Subject: net: Add default_mtu() methods to blackhole dst_ops When an IPSEC SA is still being set up, __xfrm_lookup() will return -EREMOTE and so ip_route_output_flow() will return a blackhole route. This can happen in a sndmsg call, and after d33e455337ea ("net: Abstract default MTU metric calculation behind an accessor.") this leads to a crash in ip_append_data() because the blackhole dst_ops have no default_mtu() method and so dst_mtu() calls a NULL pointer. Fix this by adding default_mtu() methods (that simply return 0, matching the old behavior) to the blackhole dst_ops. The IPv4 part of this patch fixes a crash that I saw when using an IPSEC VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it looks to be needed as well. Signed-off-by: Roland Dreier Signed-off-by: David S. Miller --- net/ipv4/route.c | 6 ++++++ net/ipv6/route.c | 6 ++++++ 2 files changed, 12 insertions(+) (limited to 'net/ipv4') diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 351dc4e85242..788a3e74834e 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2707,6 +2707,11 @@ static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst, u32 coo return NULL; } +static unsigned int ipv4_blackhole_default_mtu(const struct dst_entry *dst) +{ + return 0; +} + static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, u32 mtu) { } @@ -2716,6 +2721,7 @@ static struct dst_ops ipv4_dst_blackhole_ops = { .protocol = cpu_to_be16(ETH_P_IP), .destroy = ipv4_dst_destroy, .check = ipv4_blackhole_dst_check, + .default_mtu = ipv4_blackhole_default_mtu, .update_pmtu = ipv4_rt_blackhole_update_pmtu, }; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 28a85fc63cb8..1c29f95695de 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -113,6 +113,11 @@ static struct dst_ops ip6_dst_ops_template = { .local_out = __ip6_local_out, }; +static unsigned int ip6_blackhole_default_mtu(const struct dst_entry *dst) +{ + return 0; +} + static void ip6_rt_blackhole_update_pmtu(struct dst_entry *dst, u32 mtu) { } @@ -122,6 +127,7 @@ static struct dst_ops ip6_dst_blackhole_ops = { .protocol = cpu_to_be16(ETH_P_IPV6), .destroy = ip6_dst_destroy, .check = ip6_dst_check, + .default_mtu = ip6_blackhole_default_mtu, .update_pmtu = ip6_rt_blackhole_update_pmtu, }; -- cgit v1.2.3 From 9d0db8b6b1da9e3d4c696ef29449700c58d589db Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 1 Feb 2011 16:03:46 +0100 Subject: netfilter: arpt_mangle: fix return values of checkentry In 135367b "netfilter: xtables: change xt_target.checkentry return type", the type returned by checkentry was changed from boolean to int, but the return values where not adjusted. arptables: Input/output error This broke arptables with the mangle target since it returns true under success, which is interpreted by xtables as >0, thus returning EIO. Signed-off-by: Pablo Neira Ayuso Signed-off-by: Patrick McHardy --- net/ipv4/netfilter/arpt_mangle.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'net/ipv4') diff --git a/net/ipv4/netfilter/arpt_mangle.c b/net/ipv4/netfilter/arpt_mangle.c index b8ddcc480ed9..a5e52a9f0a12 100644 --- a/net/ipv4/netfilter/arpt_mangle.c +++ b/net/ipv4/netfilter/arpt_mangle.c @@ -60,12 +60,12 @@ static int checkentry(const struct xt_tgchk_param *par) if (mangle->flags & ~ARPT_MANGLE_MASK || !(mangle->flags & ARPT_MANGLE_MASK)) - return false; + return -EINVAL; if (mangle->target != NF_DROP && mangle->target != NF_ACCEPT && mangle->target != XT_CONTINUE) - return false; - return true; + return -EINVAL; + return 0; } static struct xt_target arpt_mangle_reg __read_mostly = { -- cgit v1.2.3 From 0033d5ad27a6db33a55ff39951d3ec61a8c13b89 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Thu, 3 Feb 2011 17:21:31 -0800 Subject: net: Fix bug in compat SIOCGETSGCNT handling. Commit 709b46e8d90badda1898caea50483c12af178e96 ("net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT") added the correct plumbing to handle SIOCGETSGCNT properly. However, whilst definiting a proper "struct compat_sioc_sg_req" it isn't actually used in ipmr_compat_ioctl(). Correct this oversight. Signed-off-by: David S. Miller --- net/ipv4/ipmr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/ipv4') diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index 7e41ac0b9260..fce10a080bd6 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -1446,7 +1446,7 @@ struct compat_sioc_sg_req { int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg) { - struct sioc_sg_req sr; + struct compat_sioc_sg_req sr; struct mfc_cache *c; struct net *net = sock_net(sk); struct mr_table *mrt; -- cgit v1.2.3 From ca6b8bb097c8e0ab6bce4fa04584074dee17c0d9 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Thu, 3 Feb 2011 17:24:28 -0800 Subject: net: Support compat SIOCGETVIFCNT ioctl in ipv4. Signed-off-by: David S. Miller --- net/ipv4/ipmr.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) (limited to 'net/ipv4') diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index fce10a080bd6..8b65a12654e7 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -1444,9 +1444,19 @@ struct compat_sioc_sg_req { compat_ulong_t wrong_if; }; +struct compat_sioc_vif_req { + vifi_t vifi; /* Which iface */ + compat_ulong_t icount; + compat_ulong_t ocount; + compat_ulong_t ibytes; + compat_ulong_t obytes; +}; + int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg) { struct compat_sioc_sg_req sr; + struct compat_sioc_vif_req vr; + struct vif_device *vif; struct mfc_cache *c; struct net *net = sock_net(sk); struct mr_table *mrt; @@ -1456,6 +1466,26 @@ int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg) return -ENOENT; switch (cmd) { + case SIOCGETVIFCNT: + if (copy_from_user(&vr, arg, sizeof(vr))) + return -EFAULT; + if (vr.vifi >= mrt->maxvif) + return -EINVAL; + read_lock(&mrt_lock); + vif = &mrt->vif_table[vr.vifi]; + if (VIF_EXISTS(mrt, vr.vifi)) { + vr.icount = vif->pkt_in; + vr.ocount = vif->pkt_out; + vr.ibytes = vif->bytes_in; + vr.obytes = vif->bytes_out; + read_unlock(&mrt_lock); + + if (copy_to_user(arg, &vr, sizeof(vr))) + return -EFAULT; + return 0; + } + read_unlock(&mrt_lock); + return -EADDRNOTAVAIL; case SIOCGETSGCNT: if (copy_from_user(&sr, arg, sizeof(sr))) return -EFAULT; -- cgit v1.2.3 From 946bf5ee3c46f73b5cbd52aab594697b1a132d1f Mon Sep 17 00:00:00 2001 From: Steffen Klassert Date: Fri, 11 Feb 2011 11:21:57 -0800 Subject: ip_gre: Add IPPROTO_GRE to flowi in ipgre_tunnel_xmit Commit 5811662b15db018c740c57d037523683fd3e6123 ("net: use the macros defined for the members of flowi") accidentally removed the setting of IPPROTO_GRE from the struct flowi in ipgre_tunnel_xmit. This patch restores it. Signed-off-by: Steffen Klassert Acked-by: Changli Gao Signed-off-by: David S. Miller --- net/ipv4/ip_gre.c | 1 + 1 file changed, 1 insertion(+) (limited to 'net/ipv4') diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index eb68a0e34e49..6613edfac28c 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -775,6 +775,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev .fl4_dst = dst, .fl4_src = tiph->saddr, .fl4_tos = RT_TOS(tos), + .proto = IPPROTO_GRE, .fl_gre_key = tunnel->parms.o_key }; if (ip_route_output_key(dev_net(dev), &rt, &fl)) { -- cgit v1.2.3 From d11327ad6695db8117c78d70611e71102ceec2ac Mon Sep 17 00:00:00 2001 From: Ian Campbell Date: Fri, 11 Feb 2011 07:44:16 +0000 Subject: arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS. NETDEV_NOTIFY_PEER is an explicit request by the driver to send a link notification while NETDEV_UP/NETDEV_CHANGEADDR generate link notifications as a sort of side effect. In the later cases the sysctl option is present because link notification events can have undesired effects e.g. if the link is flapping. I don't think this applies in the case of an explicit request from a driver. This patch makes NETDEV_NOTIFY_PEER unconditional, if preferred we could add a new sysctl for this case which defaults to on. This change causes Xen post-migration ARP notifications (which cause switches to relearn their MAC tables etc) to be sent by default. Signed-off-by: Ian Campbell Signed-off-by: David S. Miller --- net/ipv4/devinet.c | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) (limited to 'net/ipv4') diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 748cb5b337bd..df4616fce929 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1030,6 +1030,21 @@ static inline bool inetdev_valid_mtu(unsigned mtu) return mtu >= 68; } +static void inetdev_send_gratuitous_arp(struct net_device *dev, + struct in_device *in_dev) + +{ + struct in_ifaddr *ifa = in_dev->ifa_list; + + if (!ifa) + return; + + arp_send(ARPOP_REQUEST, ETH_P_ARP, + ifa->ifa_address, dev, + ifa->ifa_address, NULL, + dev->dev_addr, NULL); +} + /* Called only under RTNL semaphore */ static int inetdev_event(struct notifier_block *this, unsigned long event, @@ -1082,18 +1097,13 @@ static int inetdev_event(struct notifier_block *this, unsigned long event, } ip_mc_up(in_dev); /* fall through */ - case NETDEV_NOTIFY_PEERS: case NETDEV_CHANGEADDR: + if (!IN_DEV_ARP_NOTIFY(in_dev)) + break; + /* fall through */ + case NETDEV_NOTIFY_PEERS: /* Send gratuitous ARP to notify of link change */ - if (IN_DEV_ARP_NOTIFY(in_dev)) { - struct in_ifaddr *ifa = in_dev->ifa_list; - - if (ifa) - arp_send(ARPOP_REQUEST, ETH_P_ARP, - ifa->ifa_address, dev, - ifa->ifa_address, NULL, - dev->dev_addr, NULL); - } + inetdev_send_gratuitous_arp(dev, in_dev); break; case NETDEV_DOWN: ip_mc_down(in_dev); -- cgit v1.2.3 From 214f45c91bbda8321d9676f1197238e4663edcbb Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 18 Feb 2011 11:39:01 -0800 Subject: net: provide default_advmss() methods to blackhole dst_ops Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an accessor.) introduced a possible crash in tcp_connect_init(), when dst->default_advmss() is called from dst_metric_advmss() Reported-by: George Spelvin Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv4/route.c | 1 + net/ipv6/route.c | 1 + 2 files changed, 2 insertions(+) (limited to 'net/ipv4') diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 788a3e74834e..6ed6603c2f6d 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2722,6 +2722,7 @@ static struct dst_ops ipv4_dst_blackhole_ops = { .destroy = ipv4_dst_destroy, .check = ipv4_blackhole_dst_check, .default_mtu = ipv4_blackhole_default_mtu, + .default_advmss = ipv4_default_advmss, .update_pmtu = ipv4_rt_blackhole_update_pmtu, }; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 1c29f95695de..a998db6e7895 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -128,6 +128,7 @@ static struct dst_ops ip6_dst_blackhole_ops = { .destroy = ip6_dst_destroy, .check = ip6_dst_check, .default_mtu = ip6_blackhole_default_mtu, + .default_advmss = ip6_default_advmss, .update_pmtu = ip6_rt_blackhole_update_pmtu, }; -- cgit v1.2.3 From 91035f0b7d89291af728b6f3e370c3be58fcbe1b Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 18 Feb 2011 22:35:56 +0000 Subject: tcp: fix inet_twsk_deschedule() Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule() This is caused by inet_twsk_purge(), run from process context, and commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.) removed the BH disabling that was necessary. Add the BH disabling but fine grained, right before calling inet_twsk_deschedule(), instead of whole function. With help from Linus Torvalds and Eric W. Biederman Reported-by: Eric W. Biederman Signed-off-by: Eric Dumazet CC: Daniel Lezcano CC: Pavel Emelyanov CC: Arnaldo Carvalho de Melo CC: stable (# 2.6.33+) Signed-off-by: David S. Miller --- net/ipv4/inet_timewait_sock.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net/ipv4') diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index c5af909cf701..3c8dfa16614d 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -505,7 +505,9 @@ restart: } rcu_read_unlock(); + local_bh_disable(); inet_twsk_deschedule(tw, twdr); + local_bh_enable(); inet_twsk_put(tw); goto restart_rcu; } -- cgit v1.2.3 From c24f691b56107feeba076616982093ee2d3c8fb5 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Mon, 7 Feb 2011 12:57:04 +0000 Subject: tcp: undo_retrans counter fixes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix a bug that undo_retrans is incorrectly decremented when undo_marker is not set or undo_retrans is already 0. This happens when sender receives more DSACK ACKs than packets retransmitted during the current undo phase. This may also happen when sender receives DSACK after the undo operation is completed or cancelled. Fix another bug that undo_retrans is incorrectly incremented when sender retransmits an skb and tcp_skb_pcount(skb) > 1 (TSO). This case is rare but not impossible. Signed-off-by: Yuchung Cheng Acked-by: Ilpo Järvinen Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 5 +++-- net/ipv4/tcp_output.c | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) (limited to 'net/ipv4') diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index eb7f82ebf4a3..65f6c0406245 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1222,7 +1222,7 @@ static int tcp_check_dsack(struct sock *sk, struct sk_buff *ack_skb, } /* D-SACK for already forgotten data... Do dumb counting. */ - if (dup_sack && + if (dup_sack && tp->undo_marker && tp->undo_retrans && !after(end_seq_0, prior_snd_una) && after(end_seq_0, tp->undo_marker)) tp->undo_retrans--; @@ -1299,7 +1299,8 @@ static u8 tcp_sacktag_one(struct sk_buff *skb, struct sock *sk, /* Account D-SACK for retransmitted packet. */ if (dup_sack && (sacked & TCPCB_RETRANS)) { - if (after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker)) + if (tp->undo_marker && tp->undo_retrans && + after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker)) tp->undo_retrans--; if (sacked & TCPCB_SACKED_ACKED) state->reord = min(fack_count, state->reord); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 406f320336e6..dfa5beb0c1c8 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2162,7 +2162,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb) if (!tp->retrans_stamp) tp->retrans_stamp = TCP_SKB_CB(skb)->when; - tp->undo_retrans++; + tp->undo_retrans += tcp_skb_pcount(skb); /* snd_nxt is stored to detect loss of retransmitted segment, * see tcp_input.c tcp_sacktag_write_queue(). -- cgit v1.2.3