summaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)AuthorFilesLines
2018-03-09net: do not create fallback tunnels for non-default namespacesEric Dumazet1-8/+12
fallback tunnels (like tunl0, gre0, gretap0, erspan0, sit0, ip6tnl0, ip6gre0) are automatically created when the corresponding module is loaded. These tunnels are also automatically created when a new network namespace is created, at a great cost. In many cases, netns are used for isolation purposes, and these extra network devices are a waste of resources. We are using thousands of netns per host, and hit the netns creation/delete bottleneck a lot. (Many thanks to Kirill for recent work on this) Add a new sysctl so that we can opt-out from this automatic creation. Note that these tunnels are still created for the initial namespace, to be the least intrusive for typical setups. Tested: lpk43:~# cat add_del_unshare.sh for i in `seq 1 40` do (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) & done wait lpk43:~# echo 0 >/proc/sys/net/core/fb_tunnels_only_for_init_net lpk43:~# time ./add_del_unshare.sh real 0m37.521s user 0m0.886s sys 7m7.084s lpk43:~# echo 1 >/proc/sys/net/core/fb_tunnels_only_for_init_net lpk43:~# time ./add_del_unshare.sh real 0m4.761s user 0m0.851s sys 1m8.343s lpk43:~# Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net: Convert ipv4_net_opsKirill Tkhai1-0/+1
These pernet_operations register and unregister bunch of nf_conntrack_l4proto. Exit method unregisters related sysctl, init method calls init_net and get_net_proto. The whole builtin_l4proto4 array has pretty simple init_net and get_net_proto methods. The first one register sysctl table, the second one is just RO memory dereference. So, these pernet_operations are safe to be marked as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net: Convert iptable_security_net_opsKirill Tkhai1-0/+1
These pernet_operations unregister net::ipv4::iptable_security table. Another net/pernet_operations do not send ipv4 packets to foreign net namespaces. So, we mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net: Convert iptable_raw_net_opsKirill Tkhai1-0/+1
These pernet_operations unregister net::ipv4::iptable_raw table. Another net/pernet_operations do not send ipv4 packets to foreign net namespaces. So, we mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net: Convert iptable_nat_net_opsKirill Tkhai1-0/+1
These pernet_operations unregister net::ipv4::nat_table table. Another net/pernet_operations do not send ipv4 packets to foreign net namespaces. So, we mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net: Convert iptable_mangle_net_opsKirill Tkhai1-0/+1
These pernet_operations unregister net::ipv4::iptable_mangle table. Another net/pernet_operations do not send ipv4 packets to foreign net namespaces. So, we mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net: Convert arptable_filter_net_opsKirill Tkhai1-0/+1
These pernet_operations unregister net::ipv4::arptable_filter. Another net/pernet_operations do not send arp packets to foreign net namespaces. So, we mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller10-44/+43
All of the conflicts were cases of overlapping changes. In net/core/devlink.c, we have to make care that the resouce size_params have become a struct member rather than a pointer to such an object. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05net: Convert fou_net_opsKirill Tkhai1-0/+1
These pernet_operations initialize and destroy pernet net_generic(net, fou_net_id) list. The rest of net_generic(net, fou_net_id) accesses may happen after netlink message, and in-tree pernet_operations do not send FOU_GENL_NAME messages. So, these pernet_operations are safe to be marked as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05net: Convert arp_tables_net_ops and ip6_tables_net_opsKirill Tkhai1-0/+1
These pernet_operations call xt_proto_init() and xt_proto_fini(), which just register and unregister /proc entries. They are safe to be marked as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05net: Convert log pernet_operationsKirill Tkhai2-0/+2
These pernet_operations use nf_log_set() and nf_log_unset() in their methods: nf_log_bridge_net_ops nf_log_arp_net_ops nf_log_ipv4_net_ops nf_log_ipv6_net_ops nf_log_netdev_net_ops Nobody can send such a packet to a net before it's became registered, nobody can send a packet after all netdevices are unregistered. So, these pernet_operations are able to be marked as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05tcp: add ca_state stat in SCM_TIMESTAMPING_OPT_STATSPriyaranjan Jha1-1/+2
This patch adds TCP_NLA_CA_STATE stat into SCM_TIMESTAMPING_OPT_STATS. It reports ca_state of socket, when timestamp is generated. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05tcp: add send queue size stat in SCM_TIMESTAMPING_OPT_STATSPriyaranjan Jha1-1/+3
This patch adds TCP_NLA_SENDQ_SIZE stat into SCM_TIMESTAMPING_OPT_STATS. It reports no. of bytes present in send queue, when timestamp is generated. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05gre: add sequence number for collect md mode.William Tu1-2/+5
Currently GRE sequence number can only be used in native tunnel mode. This patch adds sequence number support for gre collect metadata mode. RFC2890 defines GRE sequence number to be specific to the traffic flow identified by the key. However, this patch does not implement per-key seqno. The sequence number is shared in the same tunnel device. That is, different tunnel keys using the same collect_md tunnel share single sequence number. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05net: xfrm: use skb_gso_validate_network_len() to check gso sizesDaniel Axtens1-1/+2
Replace skb_gso_network_seglen() with skb_gso_validate_network_len(), as it considers the GSO_BY_FRAGS case. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05net: rename skb_gso_validate_mtu -> skb_gso_validate_network_lenDaniel Axtens3-3/+3
If you take a GSO skb, and split it into packets, will the network length (L3 headers + L4 headers + payload) of those packets be small enough to fit within a given MTU? skb_gso_validate_mtu gives you the answer to that question. However, we recently added to add a way to validate the MAC length of a split GSO skb (L2+L3+L4+payload), and the names get confusing, so rename skb_gso_validate_mtu to skb_gso_validate_network_len Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04net: Rename NETEVENT_MULTIPATH_HASH_UPDATEDavid Ahern1-1/+1
Rename NETEVENT_MULTIPATH_HASH_UPDATE to NETEVENT_IPV4_MPATH_HASH_UPDATE to denote it relates to a change in the IPv4 hash policy. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04net/ipv4: Simplify fib_multipath_hash with optional flow keysDavid Ahern1-14/+10
As of commit e37b1e978bec5 ("ipv6: route: dissect flow in input path if fib rules need it") fib_multipath_hash takes an optional flow keys. If non-NULL it means the skb has already been dissected. If not set, then fib_multipath_hash needs to call skb_flow_dissect_flow_keys. Simplify the logic by setting flkeys to the local stack variable keys. Simplifies fib_multipath_hash by only have 1 set of instructions setting hash_keys. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04net: Align ip_multipath_l3_keys and ip6_multipath_l3_keysDavid Ahern1-9/+11
Symmetry is good and allows easy comparison that ipv4 and ipv6 are doing the same thing. To that end, change ip_multipath_l3_keys to set addresses at the end after the icmp compares, and move the initialization of ipv6 flow keys to rt6_multipath_hash. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04net/ipv4: Pass net to fib_multipath_hash instead of fib_infoDavid Ahern2-4/+3
fib_multipath_hash only needs net struct to check a sysctl. Make it clear by passing net instead of fib_info. In the end this allows alignment between the ipv4 and ipv6 versions. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2-5/+11
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Put back reference on CLUSTERIP configuration structure from the error path, patch from Florian Westphal. 2) Put reference on CLUSTERIP configuration instead of freeing it, another cpu may still be walking over it, also from Florian. 3) Refetch pointer to IPv6 header from nf_nat_ipv6_manip_pkt() given packet manipulation may reallocation the skbuff header, from Florian. 4) Missing match size sanity checks in ebt_among, from Florian. 5) Convert BUG_ON to WARN_ON in ebtables, from Florian. 6) Sanity check userspace offsets from ebtables kernel, from Florian. 7) Missing checksum replace call in flowtable IPv4 DNAT, from Felix Fietkau. 8) Bump the right stats on checksum error from bridge netfilter, from Taehee Yoo. 9) Unset interface flag in IPv6 fib lookups otherwise we get misleading routing lookup results, from Florian. 10) Missing sk_to_full_sk() in ip6_route_me_harder() from Eric Dumazet. 11) Don't allow devices to be part of multiple flowtables at the same time, this may break setups. 12) Missing netlink attribute validation in flowtable deletion. 13) Wrong array index in nf_unregister_net_hook() call from error path in flowtable addition path. 14) Fix FTP IPVS helper when NAT mangling is in place, patch from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-02tcp_bbr: remove bbr->tso_segs_goalEric Dumazet1-8/+4
Its value is computed then immediately used, there is no need to store it. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-02tcp_bbr: better deal with suboptimal GSO (II)Eric Dumazet2-17/+21
This is second part of dealing with suboptimal device gso parameters. In first patch (350c9f484bde "tcp_bbr: better deal with suboptimal GSO") we dealt with devices having low gso_max_segs Some devices lower gso_max_size from 64KB to 16 KB (r8152 is an example) In order to probe an optimal cwnd, we want BBR being not sensitive to whatever GSO constraint a device can have. This patch removes tso_segs_goal() CC callback in favor of min_tso_segs() for CC wanting to override sysctl_tcp_min_tso_segs Next patch will remove bbr->tso_segs_goal since it does not have to be persistent. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-02vrf: check forwarding on the original netdevice when generating ICMP dest ↵Stephen Suryaputra1-1/+10
unreachable When ip_error() is called the device is the l3mdev master instead of the original device. So the forwarding check should be on the original one. Changes from v2: - Handle the original device disappearing (per David Ahern) - Minimize the change in code order Changes from v1: - Only need to reset the device on which __in_dev_get_rcu() is done (per David Ahern). Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net: ipv4: avoid unused variable warning for sysctlArnd Bergmann1-2/+1
The newly introudced ip_min_valid_pmtu variable is only used when CONFIG_SYSCTL is set: net/ipv4/route.c:135:12: error: 'ip_min_valid_pmtu' defined but not used [-Werror=unused-variable] This moves it to the other variables like it, to avoid the harmless warning. Fixes: c7272c2f1229 ("net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite dumproute flowsYuval Mintz2-122/+162
The various MFC entries are being held in the same kind of mr_tables for both ipmr and ip6mr, and their traversal logic is identical. Also, with the exception of the addresses [and other small tidbits] the major bulk of the nla setting is identical. Unite as much of the dumping as possible between the two. Notice this requires creating an mr_table iterator for each, as the for-each preprocessor macro can't be used by the common logic. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite vif seq functionsYuval Mintz2-44/+38
Same as previously done with the mfc seq, the logic for the vif seq is refactored to be shared between ipmr and ip6mr. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite mfc seq logicYuval Mintz2-88/+67
With the exception of the final dump, ipmr and ip6mr have the exact same seq logic for traversing a given mr_table. Refactor that code and make it common. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite logic for searching in MFC cacheYuval Mintz2-57/+68
ipmr and ip6mr utilize the exact same methods for searching the hashed resolved connections, difference being only in the construction of the hash comparison key. In order to unite the flow, introduce an mr_table operation set that would contain the protocol specific information required for common flows, in this case - the hash parameters and a comparison key representing a (*,*) route. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Make mfc_cache a common structureYuval Mintz1-108/+125
mfc_cache and mfc6_cache are almost identical - the main difference is in the origin/group addresses and comparison-key. Make a common structure encapsulating most of the multicast routing logic - mr_mfc and convert both ipmr and ip6mr into using it. For easy conversion [casting, in this case] mr_mfc has to be the first field inside every multicast routing abstraction utilizing it. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr, ip6mr: Unite creation of new mr_tableYuval Mintz2-17/+37
Now that both ipmr and ip6mr are using the same mr_table structure, we can have a common function to allocate & initialize a new instance. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01mroute*: Make mr_table a common structYuval Mintz1-2/+0
Following previous changes to ip6mr, mr_table and mr6_table are basically the same [up to mr6_table having additional '6' suffixes to its variable names]. Move the common structure definition into a common header; This requires renaming all references in ip6mr to variables that had the distinct suffix. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipmr,ipmr6: Define a uniform vif_deviceYuval Mintz4-17/+49
The two implementations have almost identical structures - vif_device and mif_device. As a step toward uniforming the mr_tables, eliminate the mif_device and relocate the vif_device definition into a new common header file. Also, introduce a common initializing function for setting most of the vif_device fields in a new common source file. This requires modifying the ipv{4,6] Kconfig and ipv4 makefile as we're introducing a new common config option - CONFIG_IP_MROUTE_COMMON. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipv6: route: dissect flow in input path if fib rules need itRoopa Prabhu3-15/+38
Dissect flow in fwd path if fib rules require it. Controlled by a flag to avoid penatly for the common case. Flag is set when fib rules with sport, dport and proto match that require flow dissect are installed. Also passes the dissected hash keys to the multipath hash function when applicable to avoid dissecting the flow again. icmp packets will continue to use inner header for hash calculations (Thanks to Nikolay Aleksandrov for some review here). Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01ipv4: fib_rules: support match on sport, dport and ip protoRoopa Prabhu1-0/+11
support to match on src port, dst port and ip protocol. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net/tcp/illinois: replace broken algorithm reference linkJoey Pabalinas1-1/+1
The link to the pdf containing the algorithm description is now a dead link; it seems http://www.ifp.illinois.edu/~srikant/ has been moved to https://sites.google.com/a/illinois.edu/srikant/ and none of the original papers can be found there... I have replaced it with the only working copy I was able to find. n.b. there is also a copy available at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.6350&rep=rep1&type=pdf However, this seems to only be a *cached* version, so I am unsure exactly how reliable that link can be expected to remain over time and have decided against using that one. Signed-off-by: Joey Pabalinas <joeypabalinas@gmail.com> 1 file changed, 1 insertion(+), 1 deletion(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28inet: whitespace cleanupStephen Hemminger4-4/+2
Ran simple script to find/remove trailing whitespace and blank lines at EOF because that kind of stuff git whines about and editors leave behind. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28tcp: purge write queue upon RSTSoheil Hassas Yeganeh1-0/+1
When the connection is reset, there is no point in keeping the packets on the write queue until the connection is closed. RFC 793 (page 70) and RFC 793-bis (page 64) both suggest purging the write queue upon RST: https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07 Moreover, this is essential for a correct MSG_ZEROCOPY implementation, because userspace cannot call close(fd) before receiving zerocopy signals even when the connection is reset. Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28tcp: revert F-RTO extension to detect more spurious timeoutsYuchung Cheng1-18/+12
This reverts commit 89fe18e44f7ee5ab1c90d0dff5835acee7751427. While the patch could detect more spurious timeouts, it could cause poor TCP performance on broken middle-boxes that modifies TCP packets (e.g. receive window, SACK options). Since the performance gain is much smaller compared to the potential loss. The best solution is to fully revert the change. Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts") Reported-by: Teodor Milkov <tm@del.bg> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28tcp: revert F-RTO middle-box workaroundYuchung Cheng1-10/+7
This reverts commit cc663f4d4c97b7297fb45135ab23cfd508b35a77. While fixing some broken middle-boxes that modifies receive window fields, it does not address middle-boxes that strip off SACK options. The best solution is to fully revert this patch and the root F-RTO enhancement. Fixes: cc663f4d4c97 ("tcp: restrict F-RTO to work-around broken middle-boxes") Reported-by: Teodor Milkov <tm@del.bg> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27ip_tunnel: Rename & publish init_tunnel_flowPetr Machata1-28/+12
Initializing struct flowi4 is useful for drivers that need to emulate routing decisions made by a tunnel interface. Publish the function (appropriately renamed) so that the drivers in question don't need to cut'n'paste it around. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27net: GRE: Add is_gretap_dev, is_ip6gretap_devPetr Machata1-0/+6
Determining whether a device is a GRE device is easily done by inspecting struct net_device.type. However, for the tap variants, the type is just ARPHRD_ETHER. Therefore introduce two predicate functions that use netdev_ops to tell the tap devices. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27ip_gre: fix IFLA_MTU ignored on NEWLINKXin Long1-5/+0
It's safe to remove the setting of dev's needed_headroom and mtu in __gre_tunnel_init, as discussed in [1], ip_tunnel_newlink can do it properly. Now Eric noticed that it could cover the mtu value set in do_setlink when creating a ip_gre dev. It makes IFLA_MTU param not take effect. So this patch is to remove them to make IFLA_MTU work, as in other ipv4 tunnels. [1]: https://patchwork.ozlabs.org/patch/823504/ Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Reported-by: Eric Garver <e@erig.me> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68Sabrina Dubroca1-2/+6
According to RFC 1191 sections 3 and 4, ICMP frag-needed messages indicating an MTU below 68 should be rejected: A host MUST never reduce its estimate of the Path MTU below 68 octets. and (talking about ICMP frag-needed's Next-Hop MTU field): This field will never contain a value less than 68, since every router "must be able to forward a datagram of 68 octets without fragmentation". Furthermore, by letting net.ipv4.route.min_pmtu be set to negative values, we can end up with a very large PMTU when (-1) is cast into u32. Let's also make ip_rt_min_pmtu a u32, since it's only ever compared to unsigned ints. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27net: Convert defrag4_net_opsKirill Tkhai1-0/+1
These pernet_operations only unregister nf hooks. So, they are able to be marked as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27net: Convert clusterip_net_opsKirill Tkhai1-0/+1
These pernet_operations register and unregister nf hooks, and populate and destroy /proc entry. So, they are able to be marked as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops ↵Kirill Tkhai3-0/+5
and ipip_net_ops These pernet_operations are similar to bond_net_ops. Exit methods unregisters all net ipgre/ipgre_tap/erspan/vti/ipip devices, and it looks like another pernet_operations are not interested in foreign net ipgre/ipgre_tap/erspan/vti/ipip list. Init method also does not intersect with something pernet-specific. So, it's possible to mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26ip_tunnel: Do not use mark in skb by defaultThomas Winter1-10/+3
This reverts commit 5c38bd1b82e1f76f9fa96c1e61c9897cabf1ce45. skb->mark contains the mark the encapsulated traffic which can result in incorrect routing decisions being made such as routing loops if the route chosen is via tunnel itself. The correct method should be to use tunnel->fwmark. Signed-off-by: Thomas Winter <thomas.winter@alliedtelesis.co.nz> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26net: make kmem caches as __ro_after_initAlexey Dobriyan3-4/+7
All kmem caches aren't reallocated once set up. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-25netfilter: nf_flow_table: fix checksum when handling DNATFelix Fietkau1-0/+1
Add a missing call to csum_replace4 like on SNAT. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>