summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2018-08-07net/sched: allow flower to match tunnel optionsPieter Jansen van Vuuren1-1/+243
Allow matching on options in Geneve tunnel headers. This makes use of existing tunnel metadata support. The options can be described in the form CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is represented as a 16bit hexadecimal value, TYPE as an 8bit hexadecimal value and DATA as a variable length hexadecimal value. e.g. # ip link add name geneve0 type geneve dstport 0 external # tc qdisc add dev geneve0 ingress # tc filter add dev geneve0 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \ ip_proto udp \ action mirred egress redirect dev eth1 This patch adds support for matching Geneve options in the order supplied by the user. This leads to an efficient implementation in the software datapath (and in our opinion hardware datapaths that offload this feature). It is also compatible with Geneve options matching provided by the Open vSwitch kernel datapath which is relevant here as the Flower classifier may be used as a mechanism to program flows into hardware as a form of Open vSwitch datapath offload (sometimes referred to as OVS-TC). The netlink Kernel/Userspace API may be extended, for example by adding a flag, if other matching options are desired, for example matching given options in any order. This would require an implementation in the TC software datapath. And be done in a way that drivers that facilitate offload of the Flower classifier can reject or accept such flows based on hardware datapath capabilities. This approach was discussed and agreed on at Netconf 2017 in Seoul. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07flow_dissector: allow dissection of tunnel options from metadataSimon Horman1-1/+18
Allow the existing 'dissection' of tunnel metadata to 'dissect' options already present in tunnel metadata. This dissection is controlled by a new dissector key, FLOW_DISSECTOR_KEY_ENC_OPTS. This dissection only occurs when skb_flow_dissect_tunnel_info() is called, currently only the Flower classifier makes that call. So there should be no impact on other users of the flow dissector. This is in preparation for allowing the flower classifier to match on Geneve options. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller4-52/+152
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-08-07 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add cgroup local storage for BPF programs, which provides a fast accessible memory for storing various per-cgroup data like number of transmitted packets, etc, from Roman. 2) Support bpf_get_socket_cookie() BPF helper in several more program types that have a full socket available, from Andrey. 3) Significantly improve the performance of perf events which are reported from BPF offload. Also convert a couple of BPF AF_XDP samples overto use libbpf, both from Jakub. 4) seg6local LWT provides the End.DT6 action, which allows to decapsulate an outer IPv6 header containing a Segment Routing Header. Adds this action now to the seg6local BPF interface, from Mathieu. 5) Do not mark dst register as unbounded in MOV64 instruction when both src and dst register are the same, from Arthur. 6) Define u_smp_rmb() and u_smp_wmb() to their respective barrier instructions on arm64 for the AF_XDP sample code, from Brian. 7) Convert the tcp_client.py and tcp_server.py BPF selftest scripts over from Python 2 to Python 3, from Jeremy. 8) Enable BTF build flags to the BPF sample code Makefile, from Taeung. 9) Remove an unnecessary rcu_read_lock() in run_lwt_bpf(), from Taehee. 10) Several improvements to the README.rst from the BPF documentation to make it more consistent with RST format, from Tobin. 11) Replace all occurrences of strerror() by calls to strerror_r() in libbpf and fix a FORTIFY_SOURCE build error along with it, from Thomas. 12) Fix a bug in bpftool's get_btf() function to correctly propagate an error via PTR_ERR(), from Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07netfilter: nft_ct: enable conntrack for helpersPablo Neira Ayuso1-0/+14
Enable conntrack if the user defines a helper to be used from the ruleset policy. Fixes: 1a64edf54f55 ("netfilter: nft_ct: add helper set support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: nft_ct: add ct timeout supportHarsha Sharma1-1/+203
This patch allows to add, list and delete connection tracking timeout policies via nft objref infrastructure and assigning these timeout via nft rule. %./libnftnl/examples/nft-ct-timeout-add ip raw cttime tcp Ruleset: table ip raw { ct timeout cttime { protocol tcp; policy = {established: 111, close: 13 } } chain output { type filter hook output priority -300; policy accept; ct timeout set "cttime" } } %./libnftnl/examples/nft-rule-ct-timeout-add ip raw output cttime %conntrack -E [NEW] tcp 6 111 ESTABLISHED src=172.16.19.128 dst=172.16.19.1 sport=22 dport=41360 [UNREPLIED] src=172.16.19.1 dst=172.16.19.128 sport=41360 dport=22 %nft delete rule ip raw output handle <handle> %./libnftnl/examples/nft-ct-timeout-del ip raw cttime Joint work with Pablo Neira. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout objectPablo Neira Ayuso3-21/+26
The timeout policy is currently embedded into the nfnetlink_cttimeout object, move the policy into an independent object. This allows us to reuse part of the existing conntrack timeout extension from nf_tables without adding dependencies with the nfnetlink_cttimeout object layout. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: cttimeout: move ctnl_untimeout to nf_conntrackHarsha Sharma2-18/+19
As, ctnl_untimeout is required by nft_ct, so move ctnl_timeout from nfnetlink_cttimeout to nf_conntrack_timeout and rename as nf_ct_timeout. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: nft_osf: use NFT_OSF_MAXGENRELEN instead of IFNAMSIZFernando Fernandez Mancera1-5/+3
As no "genre" on pf.os exceed 16 bytes of length, we reduce NFT_OSF_MAXGENRELEN parameter to 16 bytes and use it instead of IFNAMSIZ. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-06packet: refine ring v3 block size test to hold one frameWillem de Bruijn1-4/+6
TPACKET_V3 stores variable length frames in fixed length blocks. Blocks must be able to store a block header, optional private space and at least one minimum sized frame. Frames, even for a zero snaplen packet, store metadata headers and optional reserved space. In the block size bounds check, ensure that the frame of the chosen configuration fits. This includes sockaddr_ll and optional tp_reserve. Syzbot was able to construct a ring with insuffient room for the sockaddr_ll in the header of a zero-length frame, triggering an out-of-bounds write in dev_parse_header. Convert the comparison to less than, as zero is a valid snap len. This matches the test for minimum tp_frame_size immediately below. Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Fixes: eb73190f4fbe ("net/packet: refine check for priv area size") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06Merge branch 'ieee802154-for-davem-2018-08-06' of ↵David S. Miller4-9/+49
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2018-08-06 An update from ieee802154 for *net-next* Romuald added a socket option to get the LQI value of the received datagram. Alexander added a new hardware simulation driver modelled after hwsim of the wireless people. It allows runtime configuration for new nodes and edges over a netlink interface (a config utlity is making its way into wpan-tools). We also have three fixes in here. One from Colin which is more of a cleanup and two from Alex fixing tailroom and single frame space problems. I would normally put the last two into my fixes tree, but given we are already in -rc8 I simply put them here and added a cc: stable to them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06ipv4: frags: precedence bug in ip_expire()Dan Carpenter1-1/+1
We accidentally removed the parentheses here, but they are required because '!' has higher precedence than '&'. Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06net: avoid unnecessary sock_flag() check when enable timestampYafang Shao2-8/+6
The sock_flag() check is alreay inside sock_enable_timestamp(), so it is unnecessary checking it in the caller. void sock_enable_timestamp(struct sock *sk, int flag) { if (!sock_flag(sk, flag)) { ... } } Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06net/bridge/br_multicast: remove redundant variable "err"zhong jiang1-7/+5
The err is not modified after initalization, So remove it and make it to be void function. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06Bluetooth: remove redundant variables 'adv_set' and 'cp'YueHaibing2-6/+0
Variables 'adv_set' and 'cp' are being assigned but are never used hence they are redundant and can be removed. Cleans up clang warnings: net/bluetooth/hci_event.c:1135:29: warning: variable 'adv_set' set but not used [-Wunused-but-set-variable] net/bluetooth/mgmt.c:3359:39: warning: variable 'cp' set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2018-08-06net: ieee802154: 6lowpan: remove redundant pointers 'fq' and 'net'Colin Ian King1-5/+0
Pointers fq and net are being assigned but are never used hence they are redundant and can be removed. Cleans up clang warnings: warning: variable 'fq' set but not used [-Wunused-but-set-variable] warning: variable 'net' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2018-08-06net: mac802154: tx: expand tailroom if necessaryAlexander Aring1-1/+14
This patch is necessary if case of AF_PACKET or other socket interface which I am aware of it and didn't allocated the necessary room. Reported-by: David Palma <david.palma@ntnu.no> Reported-by: Rabi Narayan Sahoo <rabinarayans0828@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2018-08-06net: 6lowpan: fix reserved space for single framesAlexander Aring1-3/+18
This patch fixes patch add handling to take care tail and headroom for single 6lowpan frames. We need to be sure we have a skb with the right head and tailroom for single frames. This patch do it by using skb_copy_expand() if head and tailroom is not enough allocated by upper layer. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195059 Reported-by: David Palma <david.palma@ntnu.no> Reported-by: Rabi Narayan Sahoo <rabinarayans0828@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2018-08-06Merge remote-tracking branch 'net-next/master'Stefan Schmidt364-5755/+12291
2018-08-06ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmitXin Long1-6/+2
According to RFC791, 68 bytes is the minimum size of IPv4 datagram every device must be able to forward without further fragmentation while 576 bytes is the minimum size of IPv4 datagram every device has to be able to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right value for the ipv4 min mtu check in ip6_tnl_xmit. While at it, change to use max() instead of if statement. Fixes: c9fefa08190f ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06Merge branch 'for-upstream' of ↵David S. Miller11-218/+1690
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2018-08-05 Here's the main bluetooth-next pull request for the 4.19 kernel. - Added support for Bluetooth Advertising Extensions - Added vendor driver support to hci_h5 HCI driver - Added serdev support to hci_h5 driver - Added support for Qualcomm wcn3990 controller - Added support for RTL8723BS and RTL8723DS controllers - btusb: Added new ID for Realtek 8723DE - Several other smaller fixes & cleanups Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06net: sched: cls_flower: Fix an error code in fl_tmplt_create()Dan Carpenter1-1/+3
We forgot to set the error code on this path, so we return NULL instead of an error pointer. In the current code kzalloc() won't fail for small allocations so this doesn't really affect runtime. Fixes: b95ec7eb3b4d ("net: sched: cls_flower: implement chain templates") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06net: check extack._msg before printLi RongQing1-1/+2
dev_set_mtu_ext is able to fail with a valid mtu value, at that condition, extack._msg is not set and random since it is in stack, then kernel will crash when print it. Fixes: 7a4c53bee3324a ("net: report invalid mtu value via netlink extack") Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06ipv6: fix double refcount of fib6_metricsCong Wang1-4/+0
All the callers of ip6_rt_copy_init()/rt6_set_from() hold refcnt of the "from" fib6_info, so there is no need to hold fib6_metrics refcnt again, because fib6_metrics refcnt is only released when fib6_info is gone, that is, they have the same life time, so the whole fib6_metrics refcnt can be removed actually. This fixes a kmemleak warning reported by Sabrina. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06ipv6: defrag: drop non-last frags smaller than min mtuFlorian Westphal2-0/+8
don't bother with pathological cases, they only waste cycles. IPv6 requires a minimum MTU of 1280 so we should never see fragments smaller than this (except last frag). v3: don't use awkward "-offset + len" v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68). There were concerns that there could be even smaller frags generated by intermediate nodes, e.g. on radio networks. Cc: Peter Oskolkov <posk@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06ip: use rb trees for IP frag queue.Peter Oskolkov4-87/+113
Similar to TCP OOO RX queue, it makes sense to use rb trees to store IP fragments, so that OOO fragments are inserted faster. Tested: - a follow-up patch contains a rather comprehensive ip defrag self-test (functional) - ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`: netstat --statistics Ip: 282078937 total packets received 0 forwarded 0 incoming packets discarded 946760 incoming packets delivered 18743456 requests sent out 101 fragments dropped after timeout 282077129 reassemblies required 944952 packets reassembled ok 262734239 packet reassembles failed (The numbers/stats above are somewhat better re: reassemblies vs a kernel without this patchset. More comprehensive performance testing TBD). Reported-by: Jann Horn <jannh@google.com> Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06net: modify skb_rbtree_purge to return the truesize of all purged skbs.Peter Oskolkov1-1/+5
Tested: see the next patch is the series. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06ip: discard IPv4 datagrams with overlapping segments.Peter Oskolkov2-56/+20
This behavior is required in IPv6, and there is little need to tolerate overlapping fragments in IPv4. This change simplifies the code and eliminates potential DDoS attack vectors. Tested: ran ip_defrag selftest (not yet available uptream). Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06net/tls: Mark the end in scatterlist tableVakul Garg1-0/+3
Function zerocopy_from_iter() unmarks the 'end' in input sgtable while adding new entries in it. The last entry in sgtable remained unmarked. This results in KASAN error report on using apis like sg_nents(). Before returning, the function needs to mark the 'end' in the last entry it adds. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06ipv6: icmp: Updating pmtu for link local routeGeorg Kohmann1-1/+1
When a ICMPV6_PKT_TOOBIG is received from a link local address the pmtu will be updated on a route with an arbitrary interface index. Subsequent packets sent back to the same link local address may therefore end up not considering the updated pmtu. Current behavior breaks TAHI v6LC4.1.4 Reduce PMTU On-link. Referring to RFC 1981: Section 3: "Note that Path MTU Discovery must be performed even in cases where a node "thinks" a destination is attached to the same link as itself. In a situation such as when a neighboring router acts as proxy [ND] for some destination, the destination can to appear to be directly connected but is in fact more than one hop away." Using the interface index from the incoming ICMPV6_PKT_TOOBIG when updating the pmtu. Signed-off-by: Georg Kohmann <geokohma@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller23-230/+1270
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Support for transparent proxying for nf_tables, from Mate Eckl. 2) Patchset to add OS passive fingerprint recognition for nf_tables, from Fernando Fernandez. This takes common code from xt_osf and place it into the new nfnetlink_osf module for codebase sharing. 3) Lightweight tunneling support for nf_tables. 4) meta and lookup are likely going to be used in rulesets, make them direct calls. From Florian Westphal. A bunch of incremental updates: 5) use PTR_ERR_OR_ZERO() from nft_numgen, from YueHaibing. 6) Use kvmalloc_array() to allocate hashtables, from Li RongQing. 7) Explicit dependencies between nfnetlink_cttimeout and conntrack timeout extensions, from Harsha Sharma. 8) Simplify NLM_F_CREATE handling in nf_tables. 9) Removed unused variable in the get element command, from YueHaibing. 10) Expose bridge hook priorities through uapi, from Mate Eckl. And a few fixes for previous Netfilter batch for net-next: 11) Use per-netns mutex from flowtable event, from Florian Westphal. 12) Remove explicit dependency on iptables CT target from conntrack zones, from Florian. 13) Fix use-after-free in rmmod nf_conntrack path, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-10/+14
Lots of overlapping changes, mostly trivial in nature. The mlxsw conflict was resolving using the example resolution at: https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05netlink: Don't shift on 64 for ngroupsDmitry Safonov1-2/+2
It's legal to have 64 groups for netlink_sock. As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe only to first 32 groups. The check for correctness of .bind() userspace supplied parameter is done by applying mask made from ngroups shift. Which broke Android as they have 64 groups and the shift for mask resulted in an overflow. Fixes: 61f4b23769f0 ("netlink: Don't shift with UB on nlk->ngroups") Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Reported-and-Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-3/+1
Daniel Borkmann says: ==================== pull-request: bpf 2018-08-05 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix bpftool percpu_array dump by using correct roundup to next multiple of 8 for the value size, from Yonghong. 2) Fix in AF_XDP's __xsk_rcv_zc() to not returning frames back to allocator since driver will recycle frame anyway in case of an error, from Jakub. 3) Fix up BPF test_lwt_seg6local test cases to final iproute2 syntax, from Mathieu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04net/smc: no cursor update send in state SMC_INITUrsula Braun1-1/+2
If a writer blocked condition is received without data, the current consumer cursor is immediately sent. Servers could already receive this condition in state SMC_INIT without finished tx-setup. This patch avoids sending a consumer cursor update in this case. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04net: Remove some unneeded semicolonzhong jiang4-5/+5
These semicolons are not needed. Just remove them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04tcp: remove unneeded variable 'err'YueHaibing1-2/+1
variable 'err' is unmodified after initalization, so simply cleans up it and returns 0. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04af_unix: ensure POLLOUT on remote close() for connected dgram socketJason Baron1-1/+6
Applications use -ECONNREFUSED as returned from write() in order to determine that a socket should be closed. However, when using connected dgram unix sockets in a poll/write loop, a final POLLOUT event can be missed when the remote end closes. Thus, the poll is stuck forever: thread 1 (client) thread 2 (server) connect() to server write() returns -EAGAIN unix_dgram_poll() -> unix_recvq_full() is true close() ->unix_release_sock() ->wake_up_interruptible_all() unix_dgram_poll() (due to the wake_up_interruptible_all) -> unix_recvq_full() still is true ->free all skbs Now thread 1 is stuck and will not receive anymore wakeups. In this case, when thread 1 gets the -EAGAIN, it has not queued any skbs otherwise the 'free all skbs' step would in fact cause a wakeup and a POLLOUT return. So the race here is probably fairly rare because it means there are no skbs that thread 1 queued and that thread 1 schedules before the 'free all skbs' step. This issue was reported as a hang when /dev/log is closed. The fix is to signal POLLOUT if the socket is marked as SOCK_DEAD, which means a subsequent write() will get -ECONNREFUSED. Reported-by: Ian Lance Taylor <iant@golang.org> Cc: David Rientjes <rientjes@google.com> Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04netfilter: nft_tunnel: fix sparse errorsPablo Neira Ayuso1-5/+3
[...] net/netfilter/nft_tunnel.c:117:25: expected unsigned int [unsigned] [usertype] flags net/netfilter/nft_tunnel.c:117:25: got restricted __be16 [usertype] <noident> [...] net/netfilter/nft_tunnel.c:246:33: expected restricted __be16 [addressable] [assigned] [usertype] tp_dst net/netfilter/nft_tunnel.c:246:33: got int Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03rxrpc: Push iov_iter up from rxrpc_kernel_recv_data() to callerDavid Howells1-22/+11
Push iov_iter up from rxrpc_kernel_recv_data() to its caller to allow non-contiguous iovs to be passed down, thereby permitting file reading to be simplified in the AFS filesystem in a future patch. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-03l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()Guillaume Nault1-4/+9
If 'session' is not NULL and is not a PPP pseudo-wire, then we fail to drop the reference taken by l2tp_session_get(). Fixes: ecd012e45ab5 ("l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-03netfilter: conntrack: avoid use-after free on rmmodFlorian Westphal1-4/+3
When the conntrack module is removed, we call nf_ct_iterate_destroy via nf_ct_l4proto_unregister(). Problem is that nf_conntrack_proto_fini() gets called after the conntrack hash table has already been freed. Just remove the l4proto unregister call, its unecessary as the nf_ct_protos[] array gets free'd right after anyway. v2: add comment wrt. missing unreg call. Fixes: a0ae2562c6c4b2 ("netfilter: conntrack: remove l3proto abstraction") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: kconfig: remove ct zone/label dependenciesFlorian Westphal1-3/+3
connection tracking zones currently depend on the xtables CT target. The reasoning was that it makes no sense to support zones if they can't be configured (which needed CT target). Nowadays zones can also be used by OVS and configured via nftables, so remove the dependency. connection tracking labels are handled via hidden dependency that gets auto-selected by the connlabel match. Make it a visible knob, as labels can be attached via ctnetlink or via nftables rules (nft_ct expression) too. This allows to use conntrack labels and zones with nftables-only build. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nf_tables: simplify NLM_F_CREATE handlingPablo Neira Ayuso1-19/+10
* From nf_tables_newchain(), codepath provides context that allows us to infer if we are updating a chain (in that case, no module autoload is required) or adding a new one (then, module autoload is indeed needed). * We only need it in one single spot in nf_tables_newrule(). * Not needed for nf_tables_newset() at all. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: bridge: Expose nf_tables bridge hook priorities through uapiMáté Eckl3-0/+3
Netfilter exposes standard hook priorities in case of ipv4, ipv6 and arp but not in case of bridge. This patch exposes the hook priority values of the bridge family (which are different from the formerly mentioned) via uapi so that they can be used by user-space applications just like the others. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nf_tables: match on tunnel metadataPablo Neira Ayuso1-1/+111
This patch allows us to match on the tunnel metadata that is available of the packet. We can use this to validate if the packet comes from/goes to tunnel and the corresponding tunnel ID. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nf_tables: add tunnel supportPablo Neira Ayuso4-0/+466
This patch implements the tunnel object type that can be used to configure tunnels via metadata template through the existing lightweight API from the ingress path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nft_tproxy: Add missing config checkMáté Eckl1-0/+2
A config check was missing form the code when using nf_defrag_ipv6_enable with NFT_TPROXY != n and NF_DEFRAG_IPV6 = n and this caused the following error: ../net/netfilter/nft_tproxy.c: In function 'nft_tproxy_init': ../net/netfilter/nft_tproxy.c:237:3: error: implicit declaration of function +'nf_defrag_ipv6_enable' [-Werror=implicit-function-declaration] err = nf_defrag_ipv6_enable(ctx->net); This patch adds a check for NF_TABLES_IPV6 when NF_DEFRAG_IPV6 is selected by Kconfig. Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Signed-off-by: Máté Eckl <ecklm94@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03l2tp: ignore L2TP_ATTR_MTUGuillaume Nault5-23/+9
This attribute's handling is broken. It can only be used when creating Ethernet pseudo-wires, in which case its value can be used as the initial MTU for the l2tpeth device. However, when handling update requests, L2TP_ATTR_MTU only modifies session->mtu. This value is never propagated to the l2tpeth device. Dump requests also return the value of session->mtu, which is not synchronised anymore with the device MTU. The same problem occurs if the device MTU is properly updated using the generic IFLA_MTU attribute. In this case, session->mtu is not updated, and L2TP_ATTR_MTU will report an invalid value again when dumping the session. It does not seem worthwhile to complexify l2tp_eth.c to synchronise session->mtu with the device MTU. Even the ip-l2tp manpage advises to use 'ip link' to initialise the MTU of l2tpeth devices (iproute2 does not handle L2TP_ATTR_MTU at all anyway). So let's just ignore it entirely. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-03l2tp: simplify MTU handling in l2tp_pppGuillaume Nault1-49/+18
The value of the session's .mtu field, as defined by pppol2tp_connect() or pppol2tp_session_create(), is later overwritten by pppol2tp_session_init() (unless getting the tunnel's socket PMTU fails). This field is then only used when setting the PPP channel's MTU in pppol2tp_connect(). Furthermore, the SIOC[GS]IFMTU ioctls only act on the session's .mtu without propagating this value to the PPP channel, making them useless. This patch initialises the PPP channel's MTU directly and ignores the session's .mtu entirely. MTU is still computed by subtracting the PPPOL2TP_HEADER_OVERHEAD constant. It is not optimal, but that doesn't really matter: po->chan.mtu is only used when the channel is part of a multilink PPP bundle. Running multilink PPP over packet switched networks is certainly not going to be efficient, so not picking the best MTU does not harm (in the worst case, packets will just be fragmented by the underlay). The SIOC[GS]IFMTU ioctls are removed entirely (as opposed to simply ignored), because these ioctls commands are part of the requests that should be handled generically by the socket layer. PX_PROTO_OL2TP was the only socket type abusing these ioctls. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-03l2tp: define l2tp_tunnel_dst_mtu()Guillaume Nault3-21/+26
Consolidate retrieval of tunnel's socket mtu in order to simplify l2tp_eth and l2tp_ppp a bit. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>