summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2015-09-22tcp: usec resolution SYN/ACK RTTYuchung Cheng1-0/+1
Currently SYN/ACK RTT is measured in jiffies. For LAN the SYN/ACK RTT is often measured as 0ms or sometimes 1ms, which would affect RTT estimation and min RTT samping used by some congestion control. This patch improves SYN/ACK RTT to be usec resolution if platform supports it. While the timestamping of SYN/ACK is done in request sock, the RTT measurement is carefully arranged to avoid storing another u64 timestamp in tcp_sock. For regular handshake w/o SYNACK retransmission, the RTT is sampled right after the child socket is created and right before the request sock is released (tcp_check_req() in tcp_minisocks.c) For Fast Open the child socket is already created when SYN/ACK was sent, the RTT is sampled in tcp_rcv_state_process() after processing the final ACK an right before the request socket is released. If the SYN/ACK was retransmistted or SYN-cookie was used, we rely on TCP timestamps to measure the RTT. The sample is taken at the same place in tcp_rcv_state_process() after the timestamp values are validated in tcp_validate_incoming(). Note that we do not store TS echo value in request_sock for SYN-cookies, because the value is already stored in tp->rx_opt used by tcp_ack_update_rtt(). One side benefit is that the RTT measurement now happens before initializing congestion control (of the passive side). Therefore the congestion control can use the SYN/ACK RTT. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-22s390/iucv: do not use arrays as argumentUrsula Braun1-10/+10
The iucv code uses arrays as arguments. Even though this does not really cause a problem, it could be misleading, since the compiler turns array arguments into just a pointer argument. To be more precise this patch changes the array arguments into pointers. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-22Merge branch 'for-upstream' of ↵David S. Miller5-120/+53
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-09-18 Here's the first bluetooth-next pull request for the 4.4 kernel: - ieee802154 cleanups & fixes - debugfs support for the at86rf230 driver - Support for quirky (seemingly counterfeit) CSR Bluetooth controllers - Power management and device config improvements for Intel controllers - Fix for devices with incorrect advertising data length - Fix for closing HCI user channel socket Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-18Bluetooth: Add BT_ERR_RATELIMITEDSzymon Janc1-0/+4
This patch adds ratelimited version of the BT_ERR macro. Signed-off-by: Szymon Janc <ext.szymon.janc@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-09-18bpf: add bpf_redirect() helperAlexei Starovoitov1-0/+1
Existing bpf_clone_redirect() helper clones skb before redirecting it to RX or TX of destination netdev. Introduce bpf_redirect() helper that does that without cloning. Benchmarked with two hosts using 10G ixgbe NICs. One host is doing line rate pktgen. Another host is configured as: $ tc qdisc add dev $dev ingress $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \ action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop so it receives the packet on $dev and immediately xmits it on $dev + 1 The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program that does bpf_clone_redirect() and performance is 2.0 Mpps $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \ action bpf run object-file tcbpf1_kern.o section redirect_xmit drop which is using bpf_redirect() - 2.4 Mpps and using cls_bpf with integrated actions as: $ tc filter add dev $dev root pref 10 \ bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1 performance is 2.5 Mpps To summarize: u32+act_bpf using clone_redirect - 2.0 Mpps u32+act_bpf using redirect - 2.4 Mpps cls_bpf using redirect - 2.5 Mpps For comparison linux bridge in this setup is doing 2.1 Mpps and ixgbe rx + drop in ip_rcv - 7.8 Mpps Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-18cls_bpf: introduce integrated actionsDaniel Borkmann1-1/+1
Often cls_bpf classifier is used with single action drop attached. Optimize this use case and let cls_bpf return both classid and action. For backwards compatibility reasons enable this feature under TCA_BPF_FLAG_ACT_DIRECT flag. Then more interesting programs like the following are easier to write: int cls_bpf_prog(struct __sk_buff *skb) { /* classify arp, ip, ipv6 into different traffic classes * and drop all other packets */ switch (skb->protocol) { case htons(ETH_P_ARP): skb->tc_classid = 1; break; case htons(ETH_P_IP): skb->tc_classid = 2; break; case htons(ETH_P_IPV6): skb->tc_classid = 3; break; default: return TC_ACT_SHOT; } return TC_ACT_OK; } Joint work with Daniel Borkmann. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-18tcp: provide skb->hash to synack packetsEric Dumazet1-4/+8
In commit b73c3d0e4f0e ("net: Save TX flow hash in sock and set in skbuf on xmit"), Tom provided a l4 hash to most outgoing TCP packets. We'd like to provide one as well for SYNACK packets, so that all packets of a given flow share same txhash, to later enable bonding driver to also use skb->hash to perform slave selection. Note that a SYNACK retransmit shuffles the tx hash, as Tom did in commit 265f94ff54d62 ("net: Recompute sk_txhash on negative routing advice") for established sockets. This has nice effect making TCP flows resilient to some kind of black holes, even at connection establish phase. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Mahesh Bandewar <maheshb@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-18netfilter: Pass net into okfnEric W. Biederman4-5/+9
This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-18net: Merge dst_output and dst_output_skEric W. Biederman1-5/+1
Add a sock paramter to dst_output making dst_output_sk superfluous. Add a skb->sk parameter to all of the callers of dst_output Have the callers of dst_output_sk call dst_output. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-18xfrm: Remove unused afinfo method init_dstEric W. Biederman1-2/+0
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17ieee802154: af_ieee802154: fix typo in comment.Stefan Schmidt1-1/+1
Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-09-17ieee802154: 6lowpan: remove tx full-size calc workaroundAlexander Aring1-113/+0
This patch removes a workaround for datagram_size calculation while doing fragmentation on transmit. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-09-17ieee802154: 6lowpan: check on valid 802.15.4 frameAlexander Aring1-0/+15
This patch adds frame control checks to check if the received frame is something which could contain a 6LoWPAN packet. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-09-17ieee820154: 6lowpan: dispatch evaluation reworkAlexander Aring1-6/+25
This patch complete reworks the evaluation of 6lowpan dispatch value by introducing a receive handler mechanism for each dispatch value. A list of changes: - Doing uncompression on-the-fly when FRAG1 is received, this require some special handling for 802.15.4 lltype in generic 6lowpan branch for setting the payload length correct. - Fix dispatch mask for fragmentation. - Add IPv6 dispatch evaluation for FRAG1. - Add skb_unshare for dispatch which might manipulate the skb data buffer. Cc: Jukka Rissanen <jukka.rissanen@linux.intel.com> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-09-17Bluetooth: close HCI device when user channel socket gets closedSimon Fels1-0/+1
With 9380f9eacfbbee701daa416edd6625efcd3e29e1 the order of unsetting the HCI_USER_CHANNEL flag of the HCI device was reverted to ensure the device is first closed before making it available again. Due to hci_dev_close checking for HCI_USER_CHANNEL being set on the device it was never really closed and was kept opened. We're now calling hci_dev_do_close directly to make sure the device is correctly closed and we keep the correct order to unset the flag on our device object. Signed-off-by: Simon Fels <simon.fels@canonical.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-09-17Bluetooth: Add bt_dev logging macrosLoic Poulain1-0/+7
Add specific bluetooth device logging macros since hci device name is repeatedly referred in bluetooth subsystem logs. Signed-off-by: Loic Poulain <loic.poulain@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-09-16rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 statsSowmini Varadhan1-1/+2
Many commonly used functions like getifaddrs() invoke RTM_GETLINK to dump the interface information, and do not need the the AF_INET6 statististics that are always returned by default from rtnl_fill_ifinfo(). Computing the statistics can be an expensive operation that impacts scaling, so it is desirable to avoid this if the information is not needed. This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that can be passed with netlink_request() to avoid statistics computation for the ifinfo path. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-15net: Add FIB table id to rtableDavid Ahern1-0/+2
Add the FIB table id to rtable to make the information available for IPv4 as it is for IPv6. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds5-5/+6
Pull networking fixes from David Miller: 1) Fix out-of-bounds array access in netfilter ipset, from Jozsef Kadlecsik. 2) Use correct free operation on netfilter conntrack templates, from Daniel Borkmann. 3) Fix route leak in SCTP, from Marcelo Ricardo Leitner. 4) Fix sizeof(pointer) in mac80211, from Thierry Reding. 5) Fix cache pointer comparison in ip6mr leading to missed unlock of mrt_lock. From Richard Laing. 6) rds_conn_lookup() needs to consider network namespace in key comparison, from Sowmini Varadhan. 7) Fix deadlock in TIPC code wrt broadcast link wakeups, from Kolmakov Dmitriy. 8) Fix fd leaks in bpf syscall, from Daniel Borkmann. 9) Fix error recovery when installing ipv6 multipath routes, we would delete the old route before we would know if we could fully commit to the new set of nexthops. Fix from Roopa Prabhu. 10) Fix run-time suspend problems in r8152, from Hayes Wang. 11) In fec, don't program the MAC address into the chip when the clocks are gated off. From Fugang Duan. 12) Fix poll behavior for netlink sockets when using rx ring mmap, from Daniel Borkmann. 13) Don't allocate memory with GFP_KERNEL from get_stats64 in r8169 driver, from Corinna Vinschen. 14) In TCP Cubic congestion control, handle idle periods better where we are application limited, in order to keep cwnd from growing out of control. From Eric Dumzet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) tcp_cubic: better follow cubic curve after idle period tcp: generate CA_EVENT_TX_START on data frames xen-netfront: respect user provided max_queues xen-netback: respect user provided max_queues r8169: Fix sleeping function called during get_stats64, v2 ether: add IEEE 1722 ethertype - TSN netlink, mmap: fix edge-case leakages in nf queue zero-copy netlink, mmap: don't walk rx ring on poll if receive queue non-empty cxgb4: changes for new firmware 1.14.4.0 net: fec: add netif status check before set mac address r8152: fix the runtime suspend issues r8152: split DRIVER_VERSION ipv6: fix ifnullfree.cocci warnings add microchip LAN88xx phy driver stmmac: fix check for phydev being open net: qlcnic: delete redundant memsets net: mv643xx_eth: use kzalloc net: jme: use kzalloc() instead of kmalloc+memset net: cavium: liquidio: use kzalloc in setup_glist() net: ipv6: use common fib_default_rule_pref ...
2015-09-10net: ipv6: use common fib_default_rule_prefPhil Sutter1-2/+0
This switches IPv6 policy routing to use the shared fib_default_rule_pref() function of IPv4 and DECnet. It is also used in multicast routing for IPv4 as well as IPv6. The motivation for this patch is a complaint about iproute2 behaving inconsistent between IPv4 and IPv6 when adding policy rules: Formerly, IPv6 rules were assigned a fixed priority of 0x3FFF whereas for IPv4 the assigned priority value was decreased with each rule added. Since then all users of the default_pref field have been converted to assign the generic function fib_default_rule_pref(), fib_nl_newrule() may just use it directly instead. Therefore get rid of the function pointer altogether and make fib_default_rule_pref() static, as it's not used outside fib_rules.c anymore. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-09Merge tag 'for-linus' of ↵Linus Torvalds2-0/+38
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...
2015-09-09memcg: move memcg_proto_active from sock.hMichal Hocko1-5/+0
The only user is sock_update_memcg which is living in memcontrol.c so it doesn't make much sense to pollute sock.h by this inline helper. Move it to memcontrol.c and open code it into its only caller. Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-09memcg: export struct mem_cgroupMichal Hocko1-28/+0
mem_cgroup structure is defined in mm/memcontrol.c currently which means that the code outside of this file has to use external API even for trivial access stuff. This patch exports mm_struct with its dependencies and makes some of the exported functions inlines. This even helps to reduce the code size a bit (make defconfig + CONFIG_MEMCG=y) text data bss dec hex filename 12355346 1823792 1089536 15268674 e8fb42 vmlinux.before 12354970 1823792 1089536 15268298 e8f9ca vmlinux.after This is not much (370B) but better than nothing. We also save a function call in some hot paths like callers of mem_cgroup_count_vm_event which is used for accounting. The patch doesn't introduce any functional changes. [vdavykov@parallels.com: inline memcg_kmem_is_active] [vdavykov@parallels.com: do not expose type outside of CONFIG_MEMCG] [akpm@linux-foundation.org: memcontrol.h needs eventfd.h for eventfd_ctx] [akpm@linux-foundation.org: export mem_cgroup_from_task() to modules] Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-07Merge tag 'mac80211-for-davem-2015-09-04' of ↵David S. Miller1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== For the first round of fixes, we have this: * fix for the sizeof() pointer type issue * a fix for regulatory getting into a restore loop * a fix for rfkill global 'all' state, it needs to be stored everywhere to apply correctly to new rfkill instances * properly refuse CQM RSSI when it cannot actually be used * protect HT TDLS traffic properly in non-HT networks * don't incorrectly advertise 80 MHz support when not allowed ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller3-2/+3
Conflicts: include/net/netfilter/nf_conntrack.h The conflict was an overlap between changing the type of the zone argument to nf_ct_tmpl_alloc() whilst exporting nf_ct_tmpl_free. Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net, they are: 1) Oneliner to restore maps in nf_tables since we support addressing registers at 32 bits level. 2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n, oneliner from Bernhard Thaler. 3) Out of bound access in ipset hash:net* set types, reported by Dave Jones' KASan utility, patch from Jozsef Kadlecsik. 4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of unnamed unions, patch from Elad Raz. 5) Add a workaround to address inconsistent endianess in the res_id field of nfnetlink batch messages, reported by Florian Westphal. 6) Fix error paths of CT/synproxy since the conntrack template was moved to use kmalloc, patch from Daniel Borkmann. All of them look good to me to reach 4.2, I can route this to -stable myself too, just let me know what you prefer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-04mac80211: protect non-HT BSS when HT TDLS traffic existsAvri Altman1-1/+3
HT TDLS traffic should be protected in a non-HT BSS to avoid collisions. Therefore, when TDLS peers join/leave, check if protection is (now) needed and set the ht_operation_mode of the virtual interface according to the HT capabilities of the TDLS peer(s). This works because a non-HT BSS connection never sets (or otherwise uses) the ht_operation_mode; it just means that drivers must be aware that this field applies to all HT traffic for this virtual interface, not just the traffic within the BSS. Document that. Signed-off-by: Avri Altman <avri.altman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-03netfilter: nf_conntrack: make nf_ct_zone_dflt built-inDaniel Borkmann1-18/+1
Fengguang reported, that some randconfig generated the following linker issue with nf_ct_zone_dflt object involved: [...] CC init/version.o LD init/built-in.o net/built-in.o: In function `ipv4_conntrack_defrag': nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt' net/built-in.o: In function `ipv6_defrag': nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt' make: *** [vmlinux] Error 1 Given that configurations exist where we have a built-in part, which is accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user() and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in area when netfilter is configured in general. Therefore, split the more generic parts into a common header under include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in section that already holds parts related to CONFIG_NF_CONNTRACK in the netfilter core. This fixes the issue on my side. Fixes: 308ac9143ee2 ("netfilter: nf_conntrack: push zone object into functions") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02flow_dissector: Use 'const' where possible.David S. Miller1-4/+4
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02flow_dissector: Don't use bit fields.David S. Miller1-3/+5
Just have a flags member instead. In file included from include/linux/linkage.h:4:0, from include/linux/kernel.h:6, from net/core/flow_dissector.c:1: In function 'flow_keys_hash_start', inlined from 'flow_hash_from_keys' at net/core/flow_dissector.c:553:34: >> include/linux/compiler.h:447:38: error: call to '__compiletime_assert_459' declared with attribute error: BUILD_BUG_ON failed: FLOW_KEYS_HASH_OFFSET % sizeof(u32) Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02flow_dissector: Add control/reporting of encapsulationTom Herbert1-0/+2
Add an input flag to flow dissector on rather dissection should stop when encapsulation is detected (IP/IP or GRE). Also, add a key_control flag that indicates encapsulation was encountered during the dissection. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02flow_dissector: Add flag to stop parsing when an IPv6 flow label is seenTom Herbert1-0/+1
Add an input flag to flow dissector on rather dissection should be stopped when a flow label is encountered. Presumably, the flow label is derived from a sufficient hash of an inner transport packet so further dissection is not needed (that is ports are not included in the flow hash). Using the flow label instead of ports has the additional benefit that packet fragments should hash to same value as non-fragments for a flow (assuming that the same flow label is used). We set this flag by default in for skb_get_hash. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02flow_dissector: Add flag to stop parsing at L3Tom Herbert1-0/+1
Add an input flag to flow dissector on rather dissection should be stopped when an L3 packet is encountered. This would be useful if a caller just wanted to get IP addresses of the outermost header (e.g. to do an L3 hash). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02flow_dissector: Add control/reporting of fragmentationTom Herbert1-0/+4
Add an input flag to flow dissector on rather dissection should be attempted on a first fragment. Also add key_control flags to indicate that a packet is a fragment or first fragment. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02flowi: Abstract out functions to get flow hash based on flowiTom Herbert2-0/+21
Create __get_hash_from_flowi6 and __get_hash_from_flowi4 to get the flow keys and hash based on flowi structures. These are called by __skb_get_hash_flowi6 and __skb_get_hash_flowi4. Also, created get_hash_from_flowi6 and get_hash_from_flowi4 which can be called when just the hash value for a flowi is needed. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02skbuff: Make __skb_set_sw_hash a general functionTom Herbert1-0/+5
Move __skb_set_sw_hash to skbuff.h and add __skb_set_hash which is a common method (between __skb_set_sw_hash and skb_set_hash) to set the hash in an skbuff. Also, move skb_clear_hash to be closer to __skb_set_hash. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02flow_dissector: Move skb related functions to skbuff.hTom Herbert1-50/+0
Move the flow dissector functions that are specific to skbuffs into skbuff.h out of flow_dissector.h. This makes flow_dissector.h have no dependencies on skbuff.h. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02net: Make table id type u32David Ahern2-13/+13
A number of VRF patches used 'int' for table id. It should be u32 to be consistent with the rest of the stack. Fixes: 4e3c89920cd3a ("net: Introduce VRF related flags and helpers") 15be405eb2ea9 ("net: Add inet_addr lookup by table") 30bbaa1950055 ("net: Fix up inet_addr_type checks") 021dd3b8a142d ("net: Add routes to the table associated with the device") dc028da54ed35 ("inet: Move VRF table lookup to inlined function") f6d3c19274c74 ("net: FIB tracepoints") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error pathsDaniel Borkmann1-0/+1
Commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates") migrated templates to the new allocator api, but forgot to update error paths for them in CT and synproxy to use nf_ct_tmpl_free() instead of nf_conntrack_free(). Due to that, memory is being freed into the wrong kmemcache, but also we drop the per net reference count of ct objects causing an imbalance. In Brad's case, this leads to a wrap-around of net->ct.count and thus lets __nf_conntrack_alloc() refuse to create a new ct object: [ 10.340913] xt_addrtype: ipv6 does not support BROADCAST matching [ 10.810168] nf_conntrack: table full, dropping packet [ 11.917416] r8169 0000:07:00.0 eth0: link up [ 11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 12.815902] nf_conntrack: table full, dropping packet [ 15.688561] nf_conntrack: table full, dropping packet [ 15.689365] nf_conntrack: table full, dropping packet [ 15.690169] nf_conntrack: table full, dropping packet [ 15.690967] nf_conntrack: table full, dropping packet [...] With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs. nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus, to fix the problem, export and use nf_ct_tmpl_free() instead. Fixes: 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates") Reported-by: Brad Jackson <bjackson0971@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-01tun_dst: Remove opts_sizePravin B Shelar1-1/+0
opts_size is only written and never read. Following patch removes this unused variable. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01gro_cells: remove spinlock protecting receive queuesEric Dumazet1-13/+5
As David pointed out, spinlock are no longer needed to protect the per cpu queues used in gro cells infrastructure. Also use new napi_complete_done() API so that gro_flush_timeout tweaks have an effect. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31tcp: use dctcp if enabled on the route to the initiatorDaniel Borkmann2-1/+7
Currently, the following case doesn't use DCTCP, even if it should: A responder has f.e. Cubic as system wide default, but for a specific route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The initiating host then uses DCTCP as congestion control, but since the initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok, and we have to fall back to Reno after 3WHS completes. We were thinking on how to solve this in a minimal, non-intrusive way without bloating tcp_ecn_create_request() needlessly: lets cache the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0) is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows to only do a single metric feature lookup inside tcp_ecn_create_request(). Joint work with Florian Westphal. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31ip-tunnel: Use API to access tunnel metadata options.Pravin B Shelar2-53/+45
Currently tun-info options pointer is used in few cases to pass options around. But tunnel options can be accessed using ip_tunnel_info_opts() API without using the pointer. Following patch removes the redundant pointer and consistently make use of API. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31net: Introduce helper functions to get the per cpu dataRaghavendra K T1-0/+10
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31net/bonding: Export bond_option_active_slave_get_rcuMatan Barak1-0/+7
Some consumers of the netdev events API would like to know who is the active slave when a NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER events occur. For example, when managing RoCE GIDs, GIDs based on the bond's ips should only be set on the port which corresponds to active slave netdevice. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-31net/ipv6: Export addrconf_ifid_eui48Matan Barak1-0/+31
For loopback purposes, RoCE devices should have a default GID in the port GID table, even when the interface is down. In order to do so, we use the IPv6 link local address which would have been genenrated for the related Ethernet netdevice when it goes up as a default GID. addrconf_ifid_eui48 is used to gernerate this address, export it. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-29vxlan: do not receive IPv4 packets on IPv6 socketJiri Benc1-1/+2
By default (subject to the sysctl settings), IPv6 sockets listen also for IPv4 traffic. Vxlan is not prepared for that and expects IPv6 header in packets received through an IPv6 socket. In addition, it's currently not possible to have both IPv4 and IPv6 vxlan tunnel on the same port (unless bindv6only sysctl is enabled), as it's not possible to create and bind both IPv4 and IPv6 vxlan interfaces and there's no way to specify both IPv4 and IPv6 remote/group IP addresses. Set IPV6_V6ONLY on vxlan sockets to fix both of these issues. This is not done globally in udp_tunnel, as l2tp and tipc seems to work okay when receiving IPv4 packets on IPv6 socket and people may rely on this behavior. The other tunnels (geneve and fou) do not support IPv6. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29ip_tunnels: record IP version in tunnel infoJiri Benc2-0/+11
There's currently nothing preventing directing packets with IPv6 encapsulation data to IPv4 tunnels (and vice versa). If this happens, IPv6 addresses are incorrectly interpreted as IPv4 ones. Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this in ip_tunnel_info. Reject packets at appropriate places if they are supposed to be encapsulated into an incompatible protocol. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29ip_tunnels: convert the mode field of ip_tunnel_info to flagsJiri Benc2-8/+2
The mode field holds a single bit of information only (whether the ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags. This allows more mode flags to be added. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-8/+15
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. In sum, patches to address fallout from the previous round plus updates from the IPVS folks via Simon Horman, they are: 1) Add a new scheduler to IPVS: The weighted overflow scheduling algorithm directs network connections to the server with the highest weight that is currently available and overflows to the next when active connections exceed the node's weight. From Raducu Deaconu. 2) Fix locking ordering in IPVS, always take rtnl_lock in first place. Patch from Julian Anastasov. 3) Allow to indicate the MTU to the IPVS in-kernel state sync daemon. From Julian Anastasov. 4) Enhance multicast configuration for the IPVS state sync daemon. Also from Julian. 5) Resolve sparse warnings in the nf_dup modules. 6) Fix a linking problem when CONFIG_NF_DUP_IPV6 is not set. 7) Add ICMP codes 5 and 6 to IPv6 REJECT target, they are more informative subsets of code 1. From Andreas Herz. 8) Revert the jumpstack size calculation from mark_source_chains due to chain depth miscalculations, from Florian Westphal. 9) Calm down more sparse warning around the Netfilter tree, again from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28net: Add support for VRFs to inetpeer cacheDavid Ahern1-5/+12
inetpeer caches based on address only, so duplicate IP addresses within a namespace return the same cached entry. Enhance the ipv4 address key to contain both the IPv4 address and VRF device index. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>