summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2014-01-16neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions.Jiri Pirko1-2/+2
When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is not initialized yet. This might cause confusion for the people who use NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for usage in ndo_neigh_setup. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTEThomas Haller1-75/+109
Refactor the deletion/update of prefix routes when removing an address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address present with this flag, to not cleanup the route. Instead, assume that userspace is taking care of this route. Also perform the same cleanup, when userspace changes an existing address to add NOPREFIXROUTE (to an address that didn't have this flag). This is done because when the address was added, a prefix route was created for it. Since the user now wants to handle this route by himself, we cleanup this route. This cleanup of the route is not totally robust. There is no guarantee, that the route we are about to delete was really the one added by the kernel. This behavior does not change by the patch, and in practice it should work just fine. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routesThomas Haller1-6/+13
When adding/modifying an IPv6 address, the userspace application needs a way to suppress adding a prefix route. This is for example relevant together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf generated addresses, but depending on on-link, no route for the prefix should be added. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16Revert "batman-adv: drop dependency against CRC16"David S. Miller1-0/+1
This reverts commit 12afc36e38b3b6a0ec9bda71632c2285e7fdbab2. The dependency is actually still necessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16sctp: create helper function to enable|disable sackdelaywangweidong1-18/+19
add sctp_spp_sackdelay_{enable|disable} helper function for avoiding code duplication. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helperLi RongQing4-6/+3
Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h, and use this macro as possible. And define ip6_tclass helper to return tclass Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16net: move 6lowpan compression code to separate moduleDmitry Eremin-Solenikov4-2/+11
IEEE 802.15.4 and Bluetooth networking stacks share 6lowpan compression code. Instead of introducing Makefile/Kconfig hacks, build this code as a separate module referenced from both ieee802154 and bluetooth modules. This fixes the following build error observed in some kernel configurations: net/built-in.o: In function `header_create': 6lowpan.c:(.text+0x166149): undefined reference to `lowpan_header_compress' net/built-in.o: In function `bt_6lowpan_recv': (.text+0x166b3c): undefined reference to `lowpan_process_data' Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dmitry Eremin-Solenikov <dmitry_eremin@mentor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16net: rename sysfs symlinks on device name changeVeaceslav Falico1-0/+22
Currently, we don't rename the upper/lower_ifc symlinks in /sys/class/net/*/ , which might result stale/duplicate links/names. Fix this by adding netdev_adjacent_rename_links(dev, oldname) which renames all the upper/lower interface's links to dev from the upper/lower_oldname to the new name. We don't need a rollback because only we control these symlinks and if we fail to rename them - sysfs will anyway complain. Reported-by: Ding Tianhong <dingtianhong@huawei.com> CC: Ding Tianhong <dingtianhong@huawei.com> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16net: add sysfs helpers for netdev_adjacent logicVeaceslav Falico1-27/+30
They clean up the code a bit and can be used further. CC: Ding Tianhong <dingtianhong@huawei.com> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16neigh: split lines for NEIGH_VAR_SET so they are not too longJiri Pirko1-3/+6
introduced by: commit 1f9248e5606afc6485255e38ad57bdac08fa7711 "neigh: convert parms to an array" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15tcp: do not export tcp_gso_segment() and tcp_gro_receive()Eric Dumazet1-2/+0
tcp_gso_segment() and tcp_gro_receive() no longer need to be exported. IPv4 and IPv6 offloads are statically linked. Note that tcp_gro_complete() is still used by bnx2x, unfortunately. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15net: nl80211: __dev_get_by_index instead of dev_get_by_index to find interfaceYing Xue1-65/+37
As __cfg80211_rdev_from_attrs(), nl80211_dump_wiphy_parse() and nl80211_set_wiphy() are all under rtnl_lock protection, __dev_get_by_index() instead of dev_get_by_index() should be used to find interface handler in them allowing us to avoid to change interface reference counter. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15can: use __dev_get_by_index instead of dev_get_by_index to find interfaceYing Xue1-10/+5
As cgw_create_job() is always under rtnl_lock protection, __dev_get_by_index() instead of dev_get_by_index() should be used to find interface handler in it having us avoid to change interface reference counter. Cc: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15caif: __dev_get_by_index instead of dev_get_by_index to find interfaceYing Xue1-2/+1
The following call chains indicate that chnl_net_open() is under rtnl_lock protection as __dev_open() is protected by rtnl_lock. So if __dev_get_by_index() instead of dev_get_by_index() is used to find interface handler in it, this would help us avoid to change interface reference counter. __dev_open() chnl_net_open() Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15batman-adv: use __dev_get_by_index instead of dev_get_by_index to find interfaceYing Xue1-3/+1
The following call chains indicate that batadv_is_on_batman_iface() is always under rtnl_lock protection as call_netdevice_notifier() is protected by rtnl_lock. So if __dev_get_by_index() rather than dev_get_by_index() is used to find interface handler in it, this would help us avoid to change interface reference counter. call_netdevice_notifier() batadv_hard_if_event() batadv_hardif_add_interface() batadv_is_valid_iface() batadv_is_on_batman_iface() Cc: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15decnet: use __dev_get_by_index instead of dev_get_by_index to find interfaceYing Xue1-8/+2
The following call chain we can identify that dn_cache_getroute() is protected under rtnl_lock. So if we use __dev_get_by_index() instead of dev_get_by_index() to find interface handlers in it, this would help us avoid to change interface reference counter. rtnetlink_rcv() rtnl_lock() netlink_rcv_skb() dn_cache_getroute() rtnl_unlock() Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15dcb: use __dev_get_by_name instead of dev_get_by_name to find interfaceYing Xue1-10/+5
The following call chain indicates that dcb_doit() is protected under rtnl_lock. So if we use __dev_get_by_name() instead of dev_get_by_name() to find interface handlers in it, this would help us avoid to change interface reference counter. rtnetlink_rcv() rtnl_lock() netlink_rcv_skb() dcb_doit() rtnl_unlock() Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6FX Le Bail2-3/+3
This change move anycast_src_echo_reply sysctl with other ipv6 sysctls. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15sctp: remove a redundant NULL checkDan Carpenter1-1/+1
It confuses Smatch when we check "sinit" for NULL and then non-NULL and that causes a false positive warning later. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15tipc: spelling fixesstephen hemminger3-3/+3
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15ipv6: addrconf spelling fixesstephen hemminger1-5/+5
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15net: Spelling s/transmition/transmission/Geert Uytterhoeven1-1/+1
Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15net: replace macros net_random and net_srandom with direct calls to prandomAruna-Hewapathirane27-46/+48
This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15ipv6: copy traffic class from ping request to replyHannes Frederic Sowa1-1/+4
Suggested-by: Simon Schneider <simon-schneider@gmx.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15ipv4: register igmp_notifier even when !CONFIG_PROC_FSWANG Cong2-4/+8
We still need this notifier even when we don't config PROC_FS. It should be rare to have a kernel without PROC_FS, so just for completeness. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15net: Add trace events for all receive entry points, exposing more skb fieldsBen Hutchings1-39/+61
The existing net/netif_rx and net/netif_receive_skb trace events provide little information about the skb, nor do they indicate how it entered the stack. Add trace events at entry of each of the exported functions, including most fields that are likely to be interesting for debugging driver datapath behaviour. Split netif_rx() and netif_receive_skb() so that internal calls are not traced. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15net: Add net_dev_start_xmit trace event, exposing more skb fieldsBen Hutchings1-0/+2
The existing net/net_dev_xmit trace event provides little information about the skb that has been passed to the driver, and it is not simple to add more since the skb may already have been freed at the point the event is emitted. Add a separate trace event before the skb is passed to the driver, including most fields that are likely to be interesting for debugging driver datapath behaviour. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15net: Fix indentation in dev_hard_start_xmit()Ben Hutchings1-2/+1
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller13-39/+87
2014-01-15net: add skb_checksum_setupPaul Durrant1-0/+273
This patch adds a function to set up the partial checksum offset for IP packets (and optionally re-calculate the pseudo-header checksum) into the core network code. The implementation was previously private and duplicated between xen-netback and xen-netfront, however it is not xen-specific and is potentially useful to any network driver. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14bridge: move br_net_exit() to br.cWANG Cong3-19/+18
And it can become static. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14Merge branch 'master' of ↵David S. Miller6-48/+122
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Conflicts: net/xfrm/xfrm_policy.c Steffen Klassert says: ==================== This pull request has a merge conflict between commits be7928d20bab ("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and da7c224b1baa ("net: xfrm: xfrm_policy: silence compiler warning") from the net-next tree and commit 2f3ea9a95c58 ("xfrm: checkpatch erros with inline keyword position") from the ipsec-next tree. The version from net-next can be used, like it is done in linux-next. 1) Checkpatch cleanups, from Weilong Chen. 2) Fix lockdep complaints when pktgen is used with IPsec, from Fan Du. 3) Update pktgen to allow any combination of IPsec transport/tunnel mode and AH/ESP/IPcomp type, from Fan Du. 4) Make pktgen_dst_metrics static, Fengguang Wu. 5) Compile fix for pktgen when CONFIG_XFRM is not set, from Fan Du. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait socketsNeal Cardwell1-1/+4
Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock (not just TIME_WAIT), and for such sockets the tw_substate field holds the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2. This brings the inet_diag state-matching code in line with the field it uses to populate idiag_state. This is also analogous to the info exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state and get_timewait4_sock() exports tw->tw_substate. Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi state time-wait" would also return sockets in state TCP_FIN_WAIT2. This is an old bug that predates 05dbc7b ("tcp/dccp: remove twchain"). Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller42-763/+1648
Included changes: - drop dependency against CRC16 - move to new release version - add size check at compile time for packet structs - update copyright years in every file - implement new bonding/interface alternation feature Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14net: make dev_set_mtu() honor notification return codeVeaceslav Falico1-9/+25
Currently, after changing the MTU for a device, dev_set_mtu() calls NETDEV_CHANGEMTU notification, however doesn't verify it's return code - which can be NOTIFY_BAD - i.e. some of the net notifier blocks refused this change, and continues nevertheless. To fix this, verify the return code, and if it's an error - then revert the MTU to the original one, notify again and pass the error code. CC: Jiri Pirko <jiri@resnulli.us> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14sctp: make sctp_addto_chunk_fixed localstephen hemminger1-2/+4
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14l2tp: make local functions staticstephen hemminger2-6/+2
Avoid needless export of local functions Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13gre_offload: simplify GRE header length calculation in gre_gso_segment()Neal Cardwell1-10/+6
Simplify the GRE header length calculation in gre_gso_segment(). Switch to an approach that is simpler, faster, and more general. The new approach will continue to be correct even if we add support for the optional variable-length routing info that may be present in a GRE header. Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13net_sched: act: remove struct tcf_act_hdrWANG Cong1-8/+8
It is not necessary at all. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13net_sched: avoid casting void pointerWANG Cong5-26/+23
tp->root is a void* pointer, no need to cast it. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13net_sched: optimize tcf_match_indev()WANG Cong2-14/+24
tcf_match_indev() is called in fast path, it is not wise to search for a netdev by ifindex and then compare by its name, just compare the ifindex. Also, dev->name could be changed by user-space, therefore the match would be always fail, but dev->ifindex could be consistent. BTW, this will also save some bytes from the core struct of u32. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13net_sched: add struct net pointer to tcf_proto_ops->dumpWANG Cong10-14/+15
It will be needed by the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13net_sched: act: clean up notification functionsWANG Cong1-55/+40
Refactor tcf_add_notify() and factor out tcf_del_notify(). Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13net_sched: act: move idx_gen into struct tcf_hashinfoWANG Cong10-23/+14
There is no need to store the index separatedly since tcf_hashinfo is allocated statically too. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13net: gro: change GRO overflow strategyEric Dumazet1-2/+15
GRO layer has a limit of 8 flows being held in GRO list, for performance reason. When a packet comes for a flow not yet in the list, and list is full, we immediately give it to upper stacks, lowering aggregation performance. With TSO auto sizing and FQ packet scheduler, this situation happens more often. This patch changes strategy to simply evict the oldest flow of the list. This works better because of the nature of packet trains for which GRO is efficient. This also has the effect of lowering the GRO latency if many flows are competing. Tested : Used a 40Gbps NIC, with 4 RX queues, and 200 concurrent TCP_STREAM netperf. Before patch, aggregate rate is 11Gbps (while a single flow can reach 30Gbps) After patch, line rate is reached. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13gre_offload: fix sparse non static symbol warningWei Yongjun1-1/+1
Fixes the following sparse warning: net/ipv4/gre_offload.c:253:5: warning: symbol 'gre_gro_complete' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13ipv4: introduce hardened ip_no_pmtu_disc modeHannes Frederic Sowa4-4/+27
This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors to be honored by protocols which do more stringent validation on the ICMP's packet payload. This knob is useful for people who e.g. want to run an unmodified DNS server in a namespace where they need to use pmtu for TCP connections (as they are used for zone transfers or fallback for requests) but don't want to use possibly spoofed UDP pmtu information. Currently the whitelisted protocols are TCP, SCTP and DCCP as they check if the returned packet is in the window or if the association is valid. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Suggested-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13ipv6: introduce ip6_dst_mtu_forward and protect forwarding path with itHannes Frederic Sowa1-1/+22
In the IPv6 forwarding path we are only concerend about the outgoing interface MTU, but also respect locked MTUs on routes. Tunnel provider or IPSEC already have to recheck and if needed send PtB notifications to the sending host in case the data does not fit into the packet with added headers (we only know the final header sizes there, while also using path MTU information). The reason for this change is, that path MTU information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv6 forwarding path to wrongfully emit Packet-too-Big errors and drop IPv6 packets. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against ↵Hannes Frederic Sowa4-8/+17
pmtu spoofing While forwarding we should not use the protocol path mtu to calculate the mtu for a forwarded packet but instead use the interface mtu. We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was introduced for multicast forwarding. But as it does not conflict with our usage in unicast code path it is perfect for reuse. I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular dependencies because of IPSKB_FORWARDED. Because someone might have written a software which does probe destinations manually and expects the kernel to honour those path mtus I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone can disable this new behaviour. We also still use mtus which are locked on a route for forwarding. The reason for this change is, that path mtus information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv4 forwarding path to wrongfully emit fragmentation needed notifications or start to fragment packets along a path. Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED won't be set and further fragmentation logic will use the path mtu to determine the fragmentation size. They also recheck packet size with help of path mtu discovery and report appropriate errors. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13HHF qdisc: fix jiffies-time conversion.Terry Lam1-6/+6
This is to be compatible with the use of "get_time" (i.e. default time unit in us) in iproute2 patch for HHF as requested by Stephen. Signed-off-by: Terry Lam <vtlam@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>