summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2020-01-25net: sched: sch_tbf: Don't overwrite backlog before dumpingPetr Machata1-1/+0
In 2011, in commit b0460e4484f9 ("sch_tbf: report backlog information"), TBF started copying backlog depth from the child Qdisc before dumping, with the motivation that the backlog was otherwise not visible in "tc -s qdisc show". Later, in 2016, in commit 8d5958f424b6 ("sch_tbf: update backlog as well"), TBF got a full-blown backlog tracking. However it kept copying the child's backlog over before dumping. That line is now unnecessary, so remove it. As shown in the following example, backlog is still reported correctly: # tc -s qdisc show dev veth0 invisible qdisc tbf 1: root refcnt 2 rate 1Mbit burst 128Kb lat 82.8s Sent 505475370 bytes 406985 pkt (dropped 0, overlimits 812544 requeues 0) backlog 81972b 66p requeues 0 qdisc bfifo 0: parent 1:1 limit 10Mb Sent 505475370 bytes 406985 pkt (dropped 0, overlimits 0 requeues 0) backlog 81972b 66p requeues 0 Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25devlink: Add health recover notifications on devlink flowsMoshe Shemesh1-87/+89
Devlink health recover notifications were added only on driver direct updates of health_state through devlink_health_reporter_state_update(). Add notifications on updates of health_state by devlink flows of report and recover. Moved functions devlink_nl_health_reporter_fill() and devlink_recover_notify() to avoid forward declaration. Fixes: 97ff3bd37fac ("devlink: add devink notification when reporter update health state") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25net: atm: use %*ph to print small bufferAndy Shevchenko2-70/+28
Use %*ph format to print small buffer as hex string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mptcp: Fix code formattingMat Martineau2-3/+3
checkpatch.pl had a few complaints in the last set of MPTCP patches: ERROR: code indent should use tabs where possible +^I subflow, sk->sk_family, icsk->icsk_af_ops, target, mapped);$ CHECK: Comparison to NULL could be written "!new_ctx" + if (new_ctx == NULL) { ERROR: "foo * bar" should be "foo *bar" +static const struct proto_ops * tcp_proto_ops(struct sock *sk) Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-25mptcp: do not inherit inet proto opsFlorian Westphal1-19/+51
We need to initialise the struct ourselves, else we expose tcp-specific callbacks such as tcp_splice_read which will then trigger splat because the socket is an mptcp one: BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57 Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478 CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3 Call Trace: tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57 tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612 tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674 tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791 do_splice+0x1259/0x1560 fs/splice.c:1205 To prevent build error with ipv6, add the recv/sendmsg function declaration to ipv6.h. The functions are already accessible "thanks" to retpoline related work, but they are currently only made visible by socket.c specific INDIRECT_CALLABLE macros. Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: cope with later TCP fallbackPaolo Abeni2-17/+103
With MPTCP v1, passive connections can fallback to TCP after the subflow becomes established: syn + MP_CAPABLE -> <- syn, ack + MP_CAPABLE ack, seq = 3 -> // OoO packet is accepted because in-sequence // passive socket is created, is in ESTABLISHED // status and tentatively as MP_CAPABLE ack, seq = 2 -> // no MP_CAPABLE opt, subflow should fallback to TCP We can't use the 'subflow' socket fallback, as we don't have it available for passive connection. Instead, when the fallback is detected, replace the mptcp socket with the underlying TCP subflow. Beyond covering the above scenario, it makes a TCP fallback socket as efficient as plain TCP ones. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: process MP_CAPABLE data optionChristoph Paasch4-27/+95
This patch implements the handling of MP_CAPABLE + data option, as per RFC 6824 bis / RFC 8684: MPTCP v1. On the server side we can receive the remote key after that the connection is established. We need to explicitly track the 'missing remote key' status and avoid emitting a mptcp ack until we get such info. When a late/retransmitted/OoO pkt carrying MP_CAPABLE[+data] option is received, we have to propagate the mptcp seq number info to the msk socket. To avoid ABBA locking issue, explicitly check for that in recvmsg(), where we own msk and subflow sock locks. The above also means that an established mp_capable subflow - still waiting for the remote key - can be 'downgraded' to plain TCP. Such change could potentially block a reader waiting for new data forever - as they hook to msk, while later wake-up after the downgrade will be on subflow only. The above issue is not handled here, we likely have to get rid of msk->fallback to handle that cleanly. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: parse and emit MP_CAPABLE option according to v1 specChristoph Paasch5-38/+148
This implements MP_CAPABLE options parsing and writing according to RFC 6824 bis / RFC 8684: MPTCP v1. Local key is sent on syn/ack, and both keys are sent on 3rd ack. MP_CAPABLE messages len are updated accordingly. We need the skbuff to correctly emit the above, so we push the skbuff struct as an argument all the way from tcp code to the relevant mptcp callbacks. When processing incoming MP_CAPABLE + data, build a full blown DSS-like map info, to simplify later processing. On child socket creation, we need to record the remote key, if available. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: move from sha1 (v0) to sha256 (v1)Paolo Abeni4-59/+104
For simplicity's sake use directly sha256 primitives (and pull them as a required build dep). Add optional, boot-time self-tests for the hmac function. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: new sysctl to control the activation per NSMatthieu Baerts4-5/+146
New MPTCP sockets will return -ENOPROTOOPT if MPTCP support is disabled for the current net namespace. We are providing here a way to control access to the feature for those that need to turn it on or off. The value of this new sysctl can be different per namespace. We can then restrict the usage of MPTCP to the selected NS. In case of serious issues with MPTCP, administrators can now easily turn MPTCP off. Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: allow collapsing consecutive sendpages on the same substreamPaolo Abeni1-15/+60
If the current sendmsg() lands on the same subflow we used last, we can try to collapse the data. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: recvmsg() can drain data from multiple subflowsPaolo Abeni1-10/+168
With the previous patch in place, the msk can detect which subflow has the current map with a simple walk, let's update the main loop to always select the 'current' subflow. The exit conditions now closely mirror tcp_recvmsg() to get expected timeout and signal behavior. Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: add subflow write space signalling and mptcp_pollFlorian Westphal3-0/+57
Add new SEND_SPACE flag to indicate that a subflow has enough space to accept more data for transmission. It gets cleared at the end of mptcp_sendmsg() in case ssk has run below the free watermark. It is (re-set) from the wspace callback. This allows us to use msk->flags to determine the poll mask. Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Implement MPTCP receive pathMat Martineau5-5/+591
Parses incoming DSS options and populates outgoing MPTCP ACK fields. MPTCP fields are parsed from the TCP option header and placed in an skb extension, allowing the upper MPTCP layer to access MPTCP options after the skb has gone through the TCP stack. The subflow implements its own data_ready() ops, which ensures that the pending data is in sequence - according to MPTCP seq number - dropping out-of-seq skbs. The DATA_READY bit flag is set if this is the case. This allows the MPTCP socket layer to determine if more data is available without having to consult the individual subflows. It additionally validates the current mapping and propagates EoF events to the connection socket. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Write MPTCP DSS headers to outgoing data packetsMat Martineau3-6/+285
Per-packet metadata required to write the MPTCP DSS option is written to the skb_ext area. One write to the socket may contain more than one packet of data, which is copied to page fragments and mapped in to MPTCP DSS segments with size determined by the available page fragments and the maximum mapping length allowed by the MPTCP specification. If do_tcp_sendpages() splits a DSS segment in to multiple skbs, that's ok - the later skbs can either have duplicated DSS mapping information or none at all, and the receiver can handle that. The current implementation uses the subflow frag cache and tcp sendpages to avoid excessive code duplication. More work is required to ensure that it works correctly under memory pressure and to support MPTCP-level retransmissions. The MPTCP DSS checksum is not yet implemented. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Add setsockopt()/getsockopt() socket operationsPeter Krystad1-0/+58
set/getsockopt behaviour with multiple subflows is undefined. Therefore, for now, we return -EOPNOTSUPP unless we're in fallback mode. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Add shutdown() socket operationPeter Krystad1-0/+66
Call shutdown on all subflows in use on the given socket, or on the fallback socket. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Add key generation and token treePeter Krystad6-8/+428
Generate the local keys, IDSN, and token when creating a new socket. Introduce the token tree to track all tokens in use using a radix tree with the MPTCP token itself as the index. Override the rebuild_header callback in inet_connection_sock_af_ops for creating the local key on a new outgoing connection. Override the init_req callback of tcp_request_sock_ops for creating the local key on a new incoming connection. Will be used to obtain the MPTCP parent socket to handle incoming joins. Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Create SUBFLOW socket for incoming connectionsPeter Krystad1-5/+231
Add subflow_request_sock type that extends tcp_request_sock and add an is_mptcp flag to tcp_request_sock distinguish them. Override the listen() and accept() methods of the MPTCP socket proto_ops so they may act on the subflow socket. Override the conn_request() and syn_recv_sock() handlers in the inet_connection_sock to handle incoming MPTCP SYNs and the ACK to the response SYN. Add handling in tcp_output.c to add MP_CAPABLE to an outgoing SYN-ACK response for a subflow_request_sock. Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Handle MP_CAPABLE options for outgoing connectionsPeter Krystad7-24/+603
Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request, to capture the MP_CAPABLE in the received SYN-ACK, to add MP_CAPABLE to the final ACK of the three-way handshake. Use the .sk_rx_dst_set() handler in the subflow proto to capture when the responding SYN-ACK is received and notify the MPTCP connection layer. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Associate MPTCP context with TCP socketPeter Krystad4-7/+272
Use ULP to associate a subflow_context structure with each TCP subflow socket. Creating these sockets requires new bind and connect functions to make sure ULP is set up immediately when the subflow sockets are created. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Handle MPTCP TCP optionsPeter Krystad5-1/+145
Add hooks to parse and format the MP_CAPABLE option. This option is handled according to MPTCP version 0 (RFC6824). MPTCP version 1 MP_CAPABLE (RFC6824bis/RFC8684) will be added later in coordination with related code changes. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24mptcp: Add MPTCP socket stubsMat Martineau8-0/+195
Implements the infrastructure for MPTCP sockets. MPTCP sockets open one in-kernel TCP socket per subflow. These subflow sockets are only managed by the MPTCP socket that owns them and are not visible from userspace. This commit allows a userspace program to open an MPTCP socket with: sock = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); The resulting socket is simply a wrapper around a single regular TCP socket, without any of the MPTCP protocol implemented over the wire. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com> Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: vlan: add per-vlan stateNikolay Aleksandrov5-19/+133
The first per-vlan option added is state, it is needed for EVPN and for per-vlan STP. The state allows to control the forwarding on per-vlan basis. The vlan state is considered only if the port state is forwarding in order to avoid conflicts and be consistent. br_allowed_egress is called only when the state is forwarding, but the ingress case is a bit more complicated due to the fact that we may have the transition between port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still allow the bridge to learn from the packet after vlan filtering and it will be dropped after that. Also to optimize the pvid state check we keep a copy in the vlan group to avoid one lookup. The state members are modified with *_ONCE() to annotate the lockless access. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: vlan: add basic option setting supportNikolay Aleksandrov3-7/+129
This patch adds support for option modification of single vlans and ranges. It allows to only modify options, i.e. skip create/delete by using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range option changes we try to pack the notifications as much as possible. v2: do full port (all vlans) notification only when creating/deleting vlans for compatibility, rework the range detection when changing options, add more verbose extack errors and check if a vlan should be used (br_vlan_should_use checks) Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: vlan: add basic option dumping supportNikolay Aleksandrov4-7/+48
We'll be dumping the options for the whole range if they're equal. The first range vlan will be used to extract the options. The commit doesn't change anything yet it just adds the skeleton for the support. The dump will happen when the first option is added. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24net: bridge: check port state before br_allowed_egressNikolay Aleksandrov1-1/+1
If we make sure that br_allowed_egress is called only when we have BR_STATE_FORWARDING state then we can avoid a test later when we add per-vlan state. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: add Flow Queue PIE packet schedulerMohit P. Tahiliani3-0/+576
Principles: - Packets are classified on flows. - This is a Stochastic model (as we use a hash, several flows might be hashed to the same slot) - Each flow has a PIE managed queue. - Flows are linked onto two (Round Robin) lists, so that new flows have priority on old ones. - For a given flow, packets are not reordered. - Drops during enqueue only. - ECN capability is off by default. - ECN threshold (if ECN is enabled) is at 10% by default. - Uses timestamps to calculate queue delay by default. Usage: tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ] [ target TIME ] [ tupdate TIME ] [ alpha NUMBER ] [ beta NUMBER ] [ quantum BYTES ] [ memory_limit BYTES ] [ ecnprob PERCENTAGE ] [ [no]ecn ] [ [no]bytemode ] [ [no_]dq_rate_estimator ] defaults: limit: 10240 packets, flows: 1024 target: 15 ms, tupdate: 15 ms (in jiffies) alpha: 1/8, beta : 5/4 quantum: device MTU, memory_limit: 32 Mb ecnprob: 10%, ecn: off bytemode: off, dq_rate_estimator: off Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: V. Saicharan <vsaicharan1998@gmail.com> Signed-off-by: Mohit Bhasi <mohitbhasi1998@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: export symbols to be reused by FQ-PIEMohit P. Tahiliani1-85/+88
This patch makes the drop_early(), calculate_probability() and pie_process_dequeue() functions generic enough to be used by both PIE and FQ-PIE (to be added in a future commit). The major change here is in the way the functions take in arguments. This patch exports these functions and makes FQ-PIE dependent on sch_pie. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: fix alignment in struct instancesMohit P. Tahiliani1-9/+9
Make the alignment in the initialization of the struct instances consistent in the file. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: fix commentingMohit P. Tahiliani1-5/+5
Fix punctuation and logical mistakes in the comments. The logical mistake was that "dequeue_rate" is no longer the default way to calculate queuing delay and is not needed. The default way to calculate queue delay was changed in commit cec2975f2b70 ("net: sched: pie: enable timestamp based delay calculation"). Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: rearrange structure members and their initializationsMohit P. Tahiliani1-1/+1
Rearrange the members of the structure such that closely referenced members appear together and/or fit in the same cacheline. Also, change the order of their initializations to match the order in which they appear in the structure. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: move common code to pie.hMohit P. Tahiliani1-85/+1
This patch moves macros, structures and small functions common to PIE and FQ-PIE (to be added in a future commit) from the file net/sched/sch_pie.c to the header file include/net/pie.h. All the moved functions are made inline. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller11-92/+303
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-01-22 The following pull-request contains BPF updates for your *net-next* tree. We've added 92 non-merge commits during the last 16 day(s) which contain a total of 320 files changed, 7532 insertions(+), 1448 deletions(-). The main changes are: 1) function by function verification and program extensions from Alexei. 2) massive cleanup of selftests/bpf from Toke and Andrii. 3) batched bpf map operations from Brian and Yonghong. 4) tcp congestion control in bpf from Martin. 5) bulking for non-map xdp_redirect form Toke. 6) bpf_send_signal_thread helper from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23bpf: Add BPF_FUNC_jiffies64Martin KaFai Lau1-0/+2
This patch adds a helper to read the 64bit jiffies. It will be used in a later patch to implement the bpf_cubic.c. The helper is inlined for jit_requested and 64 BITS_PER_LONG as the map_gen_lookup(). Other cases could be considered together with map_gen_lookup() if needed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
2020-01-22xsk, net: Make sock_def_readable() have external linkageBjörn Töpel2-2/+2
XDP sockets use the default implementation of struct sock's sk_data_ready callback, which is sock_def_readable(). This function is called in the XDP socket fast-path, and involves a retpoline. By letting sock_def_readable() have external linkage, and being called directly, the retpoline can be avoided. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200120092917.13949-1-bjorn.topel@gmail.com
2020-01-21Merge branch 'master' of ↵David S. Miller11-43/+819
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-01-21 1) Add support for TCP encapsulation of IKE and ESP messages, as defined by RFC 8229. Patchset from Sabrina Dubroca. Please note that there is a merge conflict in: net/unix/af_unix.c between commit: 3c32da19a858 ("unix: Show number of pending scm files of receive queue in fdinfo") from the net-next tree and commit: b50b0580d27b ("net: add queue argument to __skb_wait_for_more_packets and __skb_{,try_}recv_datagram") from the ipsec-next tree. The conflict can be solved as done in linux-next. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21tcp/ipv4: remove AF_INET_FAMILYAlex Shi1-6/+0
After commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") the macro isn't used anymore. remove it. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21net/hsr: remove seq_nr_after_or_eqAlex Shi1-1/+0
It's never used after introduced. So maybe better to remove. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Arvid Brodin <arvid.brodin@alten.se> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21net/smc: allow unprivileged users to read pnet tableHans Wippel1-1/+1
The current flags of the SMC_PNET_GET command only allow privileged users to retrieve entries from the pnet table via netlink. The content of the pnet table may be useful for all users though, e.g., for debugging smc connection problems. This patch removes the GENL_ADMIN_PERM flag so that unprivileged users can read the pnet table. Signed-off-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21Merge tag 'rds-odp-for-5.5' of ↵David S. Miller7-58/+257
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-20Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller35-166/+290
2020-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds33-159/+266
Pull networking fixes from David Miller: 1) Fix non-blocking connect() in x25, from Martin Schiller. 2) Fix spurious decryption errors in kTLS, from Jakub Kicinski. 3) Netfilter use-after-free in mtype_destroy(), from Cong Wang. 4) Limit size of TSO packets properly in lan78xx driver, from Eric Dumazet. 5) r8152 probe needs an endpoint sanity check, from Johan Hovold. 6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from John Fastabend. 7) hns3 needs short frames padded on transmit, from Yunsheng Lin. 8) Fix netfilter ICMP header corruption, from Eyal Birger. 9) Fix soft lockup when low on memory in hns3, from Yonglong Liu. 10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan. 11) Fix memory leak in act_ctinfo, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits) cxgb4: reject overlapped queues in TC-MQPRIO offload cxgb4: fix Tx multi channel port rate limit net: sched: act_ctinfo: fix memory leak bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal. bnxt_en: Fix ipv6 RFS filter matching logic. bnxt_en: Fix NTUPLE firmware command failures. net: systemport: Fixed queue mapping in internal ring map net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec net: dsa: sja1105: Don't error out on disabled ports with no phy-mode net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset net: hns: fix soft lockup when there is not enough memory net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key() net/sched: act_ife: initalize ife->metalist earlier netfilter: nat: fix ICMP header corruption on ICMP errors net: wan: lapbether.c: Use built-in RCU list checking netfilter: nf_tables: fix flowtable list del corruption netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks() netfilter: nf_tables: remove WARN and add NLA_STRING upper limits netfilter: nft_tunnel: ERSPAN_VERSION must not be null netfilter: nft_tunnel: fix null-attribute check ...
2020-01-19devlink: Add overlay source MAC is multicast trapAmit Cohen1-0/+1
Add packet trap that can report NVE packets that the device decided to drop because their overlay source MAC is multicast. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19devlink: Add tunnel generic packet trapsAmit Cohen1-0/+2
Add packet traps that can report packets that were dropped during tunnel decapsulation. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19devlink: Add non-routable packet trapAmit Cohen1-0/+1
Add packet trap that can report packets that reached the router, but are non-routable. For example, IGMP queries can be flooded by the device in layer 2 and reach the router. Such packets should not be routed and instead dropped. Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19net: sched: act_ctinfo: fix memory leakEric Dumazet1-0/+11
Implement a cleanup method to properly free ci->params BUG: memory leak unreferenced object 0xffff88811746e2c0 (size 64): comm "syz-executor617", pid 7106, jiffies 4294943055 (age 14.250s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ c0 34 60 84 ff ff ff ff 00 00 00 00 00 00 00 00 .4`............. backtrace: [<0000000015aa236f>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<0000000015aa236f>] slab_post_alloc_hook mm/slab.h:586 [inline] [<0000000015aa236f>] slab_alloc mm/slab.c:3320 [inline] [<0000000015aa236f>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549 [<000000002c946bd1>] kmalloc include/linux/slab.h:556 [inline] [<000000002c946bd1>] kzalloc include/linux/slab.h:670 [inline] [<000000002c946bd1>] tcf_ctinfo_init+0x21a/0x530 net/sched/act_ctinfo.c:236 [<0000000086952cca>] tcf_action_init_1+0x400/0x5b0 net/sched/act_api.c:944 [<000000005ab29bf8>] tcf_action_init+0x135/0x1c0 net/sched/act_api.c:1000 [<00000000392f56f9>] tcf_action_add+0x9a/0x200 net/sched/act_api.c:1410 [<0000000088f3c5dd>] tc_ctl_action+0x14d/0x1bb net/sched/act_api.c:1465 [<000000006b39d986>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424 [<00000000fd6ecace>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477 [<0000000047493d02>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 [<00000000bdcf8286>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] [<00000000bdcf8286>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328 [<00000000fc5b92d9>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917 [<00000000da84d076>] sock_sendmsg_nosec net/socket.c:639 [inline] [<00000000da84d076>] sock_sendmsg+0x54/0x70 net/socket.c:659 [<0000000042fb2eee>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330 [<000000008f23f67e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384 [<00000000d838e4f6>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417 [<00000000289a9cb1>] __do_sys_sendmsg net/socket.c:2426 [inline] [<00000000289a9cb1>] __se_sys_sendmsg net/socket.c:2424 [inline] [<00000000289a9cb1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424 Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller7-154/+314
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Incorrect uapi header comment in bitwise, from Jeremy Sowden. 2) Fetch flow statistics if flow is still active. 3) Restrict flow matching on hardware based on input device. 4) Add nf_flow_offload_work_alloc() helper function. 5) Remove the last client of the FLOW_OFFLOAD_DYING flag, use teardown instead. 6) Use atomic bitwise operation to operate with flow flags. 7) Add nf_flowtable_hw_offload() helper function to check for the NF_FLOWTABLE_HW_OFFLOAD flag. 8) Add NF_FLOW_HW_REFRESH to retry hardware offload from the flowtable software datapath. 9) Remove indirect calls in xt_hashlimit, from Florian Westphal. 10) Add nf_flow_offload_tuple() helper to consolidate code. 11) Add nf_flow_table_offload_cmd() helper function. 12) A few whitespace cleanups in nf_tables in bitwise and the bitmap/hash set types, from Jeremy Sowden. 13) Cleanup netlink attribute checks in bitwise, from Jeremy Sowden. 14) Replace goto by return in error path of nft_bitwise_dump(), from Jeremy Sowden. 15) Add bitwise operation netlink attribute, also from Jeremy. 16) Add nft_bitwise_init_bool(), from Jeremy Sowden. 17) Add nft_bitwise_eval_bool(), also from Jeremy. 18) Add nft_bitwise_dump_bool(), from Jeremy Sowden. 19) Disallow hardware offload for other that NFT_BITWISE_BOOL, from Jeremy Sowden. 20) Add NFTA_BITWISE_DATA netlink attribute, again from Jeremy. 21) Add support for bitwise shift operation, from Jeremy Sowden. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-18net/rds: Use prefetch for On-Demand-Paging MRHans Westgaard Ry1-0/+9
Try prefetching pages when using On-Demand-Paging MR using ib_advise_mr. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-18net/rds: Handle ODP mr registration/unregistrationHans Westgaard Ry7-56/+244
On-Demand-Paging MRs are registered using ib_reg_user_mr and unregistered with ib_dereg_mr. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>