summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-06-15net_sched: make tcf_hash_check() booleanWANG Cong7-11/+16
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net: vrf: Handle ipv6 multicast and link-local addressesDavid Ahern2-3/+4
IPv6 multicast and link-local addresses require special handling by the VRF driver: 1. Rather than using the VRF device index and full FIB lookups, packets to/from these addresses should use direct FIB lookups based on the VRF device table. 2. fail sends/receives on a VRF device to/from a multicast address (e.g, make ping6 ff02::1%<vrf> fail) 3. move the setting of the flow oif to the first dst lookup and revert the change in icmpv6_echo_reply made in ca254490c8dfd ("net: Add VRF support to IPv6 stack"). Linklocal/mcast addresses require use of the skb->dev. With this change connections into and out of a VRF enslaved device work for multicast and link-local addresses work (icmp, tcp, and udp) e.g., 1. packets into VM with VRF config: ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1 ping6 -c3 ff02::1%br1 ssh -6 fe80::e0:f9ff:fe1c:b974%br1 2. packets going out a VRF enslaved device: ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1 ping6 -c3 ff02::1%eth1 ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1 Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net: ipv6: Do not add multicast route for l3 master devicesDavid Ahern1-1/+1
L3 master devices are virtual devices similar to the loopback device. Link local and multicast routes for these devices do not make sense. The ipv6 addrconf code already skips adding a linklocal address; do the same for the mcast route. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net: l3mdev: Remove const from flowi6 arg to get_rt6_dstDavid Ahern1-1/+1
Allow drivers to pass flow arg to functions where the arg is not const and allow the driver to make updates as needed (eg., setting oif). Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15af_iucv: use paged SKBs for big inbound messagesEugene Crosser1-6/+50
When an inbound message is bigger than a page, allocate a paged SKB, and subsequently use IUCV receive primitive with IPBUFLST flag. This relaxes the pressure to allocate big contiguous kernel buffers. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15af_iucv: remove fragment_skb() to use paged SKBsEugene Crosser1-56/+3
Before introducing paged skbs in the receive path, get rid of the function `iucv_fragment_skb()` that replaces one large linear skb with several smaller linear skbs. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15af_iucv: use paged SKBs for big outbound messagesEugene Crosser1-47/+77
When an outbound message is bigger than a page, allocate and fill a paged SKB, and subsequently use IUCV send primitive with IPBUFLST flag. This relaxes the pressure to allocate big contiguous kernel buffers. Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15act_police: rename tcf_act_police_locate() to tcf_act_police_init()WANG Cong1-4/+4
This function is just ->init(), rename it to make it obvious. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: remove internal use of TC_POLICE_*WANG Cong2-3/+3
These should be gone when we removed CONFIG_NET_CLS_POLICE. We can not totally remove them since they are exposed to userspace. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Update rds_conn_destroy to be MP capableSowmini Varadhan1-20/+39
Refactor rds_conn_destroy() so that the per-path dismantling is done in rds_conn_path_destroy, and then iterate as needed over rds_conn_path_destroy(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Update rds_conn_shutdown to work with rds_conn_pathSowmini Varadhan5-38/+44
This commit changes rds_conn_shutdown to take a rds_conn_path * argument, allowing it to shutdown paths other than c_path[0] for MP-capable transports. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Initialize all RDS_MPATH_WORKERS in __rds_conn_createSowmini Varadhan1-20/+45
Add a for() loop in __rds_conn_create to initialize all the conn_paths, in preparate for MP capable transports. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Add rds_conn_path_error()Sowmini Varadhan3-1/+18
rds_conn_path_error() is the MP-aware analog of rds_conn_error, to be used by multipath-capable callers. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: update rds-info related functions to traverse multiple conn_pathsSowmini Varadhan1-27/+82
This commit updates the callbacks related to the rds-info command so that they walk through all the rds_conn_path structures and report the requested info. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Add rds_conn_path_connect_if_down() for MP-aware callersSowmini Varadhan3-8/+14
rds_conn_path_connect_if_down() works on the rds_conn_path that it is passed. Callers who are not t_m_capable may continue calling rds_conn_connect_if_down, which will invoke rds_conn_path_connect_if_down() with the default c_path[0]. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Make rds_send_pong() take a rds_conn_path argumentSowmini Varadhan3-14/+14
This commit allows rds_send_pong() callers to send back the rds pong message on some path other than c_path[0] by passing in a struct rds_conn_path * argument. It also removes the last dependency on the #defines in rds_single.h from send.c Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Extract rds_conn_path from i_conn_path in rds_send_drop_to() for ↵Sowmini Varadhan1-3/+8
MP-capable transports Explicitly set up rds_conn_path, either from i_conn_path (for MP capable transpots) or as c_path[0], and use this in rds_send_drop_to() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Pass rds_conn_path to rds_send_xmit()Sowmini Varadhan4-70/+87
Pass a struct rds_conn_path to rds_send_xmit so that MP capable transports can transmit packets on something other than c_path[0]. The eventual goal for MP capable transports is to hash the rds socket to a path based on the bound local address/port, and use this path as the argument to rds_send_xmit() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Make rds_send_queue_rm() rds_conn_path awareSowmini Varadhan1-6/+11
Pass the rds_conn_path to rds_send_queue_rm, and use it to initialize the i_conn_path field in struct rds_incoming. This commit also makes rds_send_queue_rm() MP capable, because it now takes locks specific to the rds_conn_path passed in, instead of defaulting to the c_path[0] based defines from rds_single_path.h Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Remove stale function rds_send_get_message()Sowmini Varadhan2-38/+0
The only caller of rds_send_get_message() was rds_iw_send_cq_comp_handler() which was removed as part of commit dcdede0406d3 ("RDS: Drop stale iWARP RDMA transport"), so remove rds_send_get_message() for the same reason. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Add rds_send_path_drop_acked()Sowmini Varadhan2-5/+15
rds_send_path_drop_acked() is the path-specific version of rds_send_drop_acked() to be invoked by MP capable callers. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: Add rds_send_path_reset()Sowmini Varadhan1-17/+22
rds_send_path_reset() is the path specific version of rds_send_reset() intended for MP capable callers. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: rds_inc_path_init() helper function for MP capable transportsSowmini Varadhan2-0/+16
t_mp_capable transports can use rds_inc_path_init to initialize all fields in struct rds_incoming, including the i_conn_path. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: recv path gets the conn_path from rds_incoming for MP capable transportsSowmini Varadhan2-4/+9
Transports that are t_mp_capable should set the rds_conn_path on which the datagram was recived in the ->i_conn_path field of struct rds_incoming. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: add t_mp_capable bit to be set by MP capable transportsSowmini Varadhan1-1/+6
The t_mp_capable bit will be used in the core rds module to support multipathing logic when the transport supports it. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15RDS: split out connection specific state from rds_connection to rds_conn_pathSowmini Varadhan19-93/+199
In preparation for multipath RDS, split the rds_connection structure into a base structure, and a per-path struct rds_conn_path. The base structure tracks information and locks common to all paths. The workqs for send/recv/shutdown etc are tracked per rds_conn_path. Thus the workq callbacks now work with rds_conn_path. This commit allows for one rds_conn_path per rds_connection, and will be extended into multiple conn_paths in subsequent commits. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15tcp: return sizeof tcp_dctcp_info in dctcp_get_info()Neal Cardwell1-2/+2
Make sure that dctcp_get_info() returns only the size of the info->dctcp struct that it zeroes out and fills in. Previously it had been returning the size of the enclosing tcp_cc_info union, sizeof(*info). There is no problem yet, but that union that may one day be larger than struct tcp_dctcp_info, in which case the TCP_CC_INFO code might accidentally copy uninitialized bytes from the stack. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15sctp: fix error return code in sctp_init()Wei Yongjun1-1/+2
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15Merge tag 'rxrpc-rewrite-20160613' of ↵David S. Miller18-86/+86
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Rename rxrpc source files Here's the next part of the AF_RXRPC rewrite. In this set I rename some of the files in the net/rxrpc/ directory and adjust the Makefile and ar-internal.h to reflect the changes. The aim is twofold: (1) Remove the "ar-" prefix on those files that have it as it's not really useful, especially now that I'm building rxkad in. (2) To aid splitting the local, peer, connection and call handling code into separate files for object and event handling in future patches by making it easier to come up with new filenames. There are two commits: (1) The first commit does a bunch of renames of .c files and alters the Makefile. ar-internal.h isn't renamed at this time to avoid having to change the contents of the files being renamed. (2) The second commit changes the section label comments in ar-internal.h to reflect the changed filenames and reorders the file so that the sections are back in filename order. The patches can be found here also: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite Tagged thusly: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git rxrpc-rewrite-20160613 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net/sched: flower: Return error when hw can't offload and skip_sw is setAmir Vadai1-17/+25
When skip_sw is set and hardware fails to apply filter, return error to user. This will make error propagation logic similar to the one currently used in u32 classifier. Also, changed code to use tc_skip_sw() utility function. Signed-off-by: Amir Vadai <amirva@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-13rxrpc: Update the comments in ar-internal.h to reflect renamesDavid Howells1-69/+69
Update the section comments in ar-internal.h that indicate the locations of the referenced items to reflect the renames done to the .c files in net/rxrpc/. This also involves some rearrangement to reflect keep the sections in order of filename. Signed-off-by: David Howells <dhowells@redhat.com>
2016-06-13rxrpc: Rename files matching ar-*.c to git rid of the "ar-" prefixDavid Howells17-17/+17
Rename files matching net/rxrpc/ar-*.c to get rid of the "ar-" prefix. This will aid splitting those files by making easier to come up with new names. Note that the not all files are simply renamed from ar-X.c to X.c. The following exceptions are made: (*) ar-call.c -> call_object.c ar-ack.c -> call_event.c call_object.c is going to contain the core of the call object handling. Call event handling is all going to be in call_event.c. (*) ar-accept.c -> call_accept.c Incoming call handling is going to be here. (*) ar-connection.c -> conn_object.c ar-connevent.c -> conn_event.c The former file is going to have the basic connection object handling, but there will likely be some differentiation between client connections and service connections in additional files later. The latter file will have all the connection-level event handling. (*) ar-local.c -> local_object.c This will have the local endpoint object handling code. The local endpoint event handling code will later be split out into local_event.c. (*) ar-peer.c -> peer_object.c This will have the peer endpoint object handling code. Peer event handling code will be placed in peer_event.c (for the moment, there is none). (*) ar-error.c -> peer_event.c This will become the peer event handling code, though for the moment it's actually driven from the local endpoint's perspective. Note that I haven't renamed ar-transport.c to transport_object.c as the intention is to delete it when the rxrpc_transport struct is excised. The only file that actually has its contents changed is net/rxrpc/Makefile. net/rxrpc/ar-internal.h will need its section marker comments updating, but I'll do that in a separate patch to make it easier for git to follow the history across the rename. I may also want to rename ar-internal.h at some point - but that would mean updating all the #includes and I'd rather do that in a separate step. Signed-off-by: David Howells <dhowells@redhat.com.
2016-06-13sched: remove NET_XMIT_POLICEDFlorian Westphal5-7/+4
sch_atm returns this when TC_ACT_SHOT classification occurs. But all other schedulers that use tc_classify (htb, hfsc, drr, fq_codel ...) return NET_XMIT_SUCCESS | __BYPASS in this case so just do that in atm. BATMAN uses it as an intermediate return value to signal forwarding vs. buffering, but it did not return POLICED to callers outside of BATMAN. Reviewed-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-12ipv6: use TOS marks from sockets for routing decisionHannes Frederic Sowa6-11/+23
In IPv6 the ToS values are part of the flowlabel in flowi6 and get extracted during fib rule lookup, but we forgot to correctly initialize the flowlabel before the routing lookup. Reported-by: <liam.mcbirnie@boeing.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11net_sched: remove generic throttled managementEric Dumazet7-17/+4
__QDISC_STATE_THROTTLED bit manipulation is rather expensive for HTB and few others. I already removed it for sch_fq in commit f2600cf02b5b ("net: sched: avoid costly atomic operation in fq_dequeue()") and so far nobody complained. When one ore more packets are stuck in one or more throttled HTB class, a htb dequeue() performs two atomic operations to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc lock is held. Removing this pair of atomic operations bring me a 8 % performance increase on 200 TCP_RR tests, in presence of throttled classes. This patch has no side effect, since nothing actually uses disc_is_throttled() anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11net_sched: netem: remove qdisc_is_throttled() useEric Dumazet1-3/+0
Looks like it is only there as some optimization attempt. Since __QDISC_STATE_THROTTLED set/unset is way too expensive, and netem is the last user, just remove this check. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11net_sched: cbq: remove a flaky use of qdisc_is_throttled()Eric Dumazet1-1/+1
So far no qdisc ever unset the throttled bit at enqueue() time, so CBQ usage of qdisc_is_throttled() was flaky. Since __QDISC_STATE_THROTTLED set/unset is way too expensive considering that only CBQ was eventually caring for this status, it would make sense to implement a Qdisc ops ->is_throttled() if we find that this is needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11net_sched: sch_plug: use a private throttled statusEric Dumazet1-6/+8
We want to get rid of generic qdisc throttled management, so this qdisc has to use a private flag. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11sctp: sctp should change socket state when shutdown is receivedXin Long2-3/+9
Now sctp doesn't change socket state upon shutdown reception. It changes just the assoc state, even though it's a TCP-style socket. For some cases, if we really need to check sk->sk_state, it's necessary to fix this issue, at least when we use ss or netstat to dump, we can get a more exact information. As an improvement, we will change sk->sk_state when we change asoc->state to SHUTDOWN_RECEIVED, and also do it in sctp_shutdown to keep consistent with sctp_close. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11Merge tag 'mac80211-next-for-davem-2016-06-09' of ↵David S. Miller15-199/+745
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For the next cycle, we have the following: * the biggest change is Michał's work on integrating FQ/codel with the mac80211 internal software queues * cfg80211 connect result gets clarified for the "no connection at all" case * advertisement of per-interface type capabilities, in case they differ (which makes a lot of sense for some capabilities) * most of the nl80211 & hwsim unprivileged namespace operation changes * human-readable VHT capabilities in debugfs * some other cleanups, like spelling ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11tcp: add NV congestion controlLawrence Brakmo3-0/+493
TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of NV was presented at 2010's LPC. It is a delayed based congestion avoidance for the data center. This version has been tested within a 10G rack where the HW RTTs are 20-50us and with 1 to 400 flows. A description of TCP-NV, including implementation details as well as experimental results, can be found at: http://www.brakmo.org/networking/tcp-nv/TCPNV.html Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11tcp: add in_flight to tcp_skb_cbLawrence Brakmo2-2/+7
Add in_flight (bytes in flight when packet was sent) field to tx component of tcp_skb_cb and make it available to congestion modules' pkts_acked() function through the ack_sample function argument. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11packet: use common code for virtio_net_hdr and skb GSO conversionMike Rapoport1-34/+2
Replace open coded conversion between virtio_net_hdr to skb GSO info with virtio_net_hdr_from_skb Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11RDS: IB: Remove deprecated create_workqueueBhaktipriya Shridhar1-1/+1
alloc_workqueue replaces deprecated create_workqueue(). Since the driver is infiniband which can be used as block device and the workqueue seems involved in regular operation of the device, so a dedicated workqueue has been used with WQ_MEM_RECLAIM set to guarantee forward progress under memory pressure. Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11rxrpc: Limit the listening backlogDavid Howells4-8/+28
Limit the socket incoming call backlog queue size so that a remote client can't pump in sufficient new calls that the server runs out of memory. Note that this is partially theoretical at the moment since whilst the number of calls is limited, the number of packets trying to set up new calls is not. This will be addressed in a later patch. If the caller of listen() specifies a backlog INT_MAX, then they get the current maximum; anything else greater than max_backlog or anything negative incurs EINVAL. The limit on the maximum queue size can be set by: echo N >/proc/sys/net/rxrpc/max_backlog where 4<=N<=32. Further, set the default backlog to 0, requiring listen() to be called before we start actually queueing new calls. Whilst this kind of is a change in the UAPI, the caller can't actually *accept* new calls anyway unless they've first called listen() to put the socket into the LISTENING state - thus the aforementioned new calls would otherwise just sit there, eating up kernel memory. (Note that sockets that don't have a non-zero service ID bound don't get incoming calls anyway.) Given that the default backlog is now 0, make the AFS filesystem call kernel_listen() to set the maximum backlog for itself. Possible improvements include: (1) Trimming a too-large backlog to max_backlog when listen is called. (2) Trimming the backlog value whenever the value is used so that changes to max_backlog are applied to an open socket automatically. Note that the AFS filesystem opens one socket and keeps it open for extended periods, so would miss out on changes to max_backlog. (3) Having a separate setting for the AFS filesystem. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11rxrpc: Trim line-terminal whitespaceDavid Howells2-2/+2
Trim line-terminal whitespace in net/rxrpc/ Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11net, cls: allow for deleting all filters for given parentDaniel Borkmann1-4/+33
Add a possibility where the user can just specify the parent and all filters under that parent are then being purged. Currently, for example for scripting, one needs to specify pref/prio to have a well-defined number for 'tc filter del' command for addressing the previously created instance or additionally filter handle in case of priorities being the same. Improve usage by allowing the option for tc to specify the parent and removing the whole chain for that given parent. Example usage after patch, no tc changes required: # tc qdisc replace dev foo clsact # tc filter add dev foo egress bpf da obj ./bpf.o # tc filter add dev foo egress bpf da obj ./bpf.o # tc filter show dev foo egress filter protocol all pref 49151 bpf filter protocol all pref 49151 bpf handle 0x1 bpf.o:[classifier] direct-action filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 bpf.o:[classifier] direct-action # tc filter del dev foo egress # tc filter show dev foo egress # Previously, RTM_DELTFILTER requests with invalid prio of 0 were rejected, so only netlink requests with RTM_NEWTFILTER and NLM_F_CREATE flag were allowed where the kernel would auto-generate a pref/prio. We can piggyback on that and use prio of 0 as a wildcard for requests of RTM_DELTFILTER. For notifying tc netlink monitoring users (e.g. libnl uses this for caching), there are two options, that is, sending individual tfilter_notify() notifications for each tcf_proto, or sending a single one indicating wildcard removal. I tried both and there are pros and cons for each, eventually I decided for sending individual tfilter_notify(), so that user space can support this seamlessly and there won't be a mess of changing each and every application to make sure expectations from the kernel won't break when they don't understand single notification. Since linear chains don't really scale, I expect only a handful of classifiers to be attached at max for a given parent anyway. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11bpf: reject wrong sized filters earlierDaniel Borkmann1-8/+15
Add a bpf_check_basics_ok() and reject filters that are of invalid size much earlier, so we don't do any useless work such as invoking bpf_prog_alloc(). Currently, rejection happens in bpf_check_classic() only, but it's really unnecessarily late and they should be rejected at earliest point. While at it, also clean up one bpf_prog_size() to make it consistent with the remaining invocations. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11bpf: enforce recursion limit on redirectsDaniel Borkmann2-25/+36
Respect the stack's xmit_recursion limit for calls into dev_queue_xmit(). Currently, they are not handeled by the limiter when attached to clsact's egress parent, for example, and a buggy program redirecting it to the same device again could run into stack overflow eventually. It would be good if we could notify an admin to give him a chance to react. We reuse xmit_recursion instead of having one private to eBPF, so that the stack's current recursion depth will be taken into account as well. Follow-up to commit 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") and 27b29f63058d ("bpf: add bpf_redirect() helper"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11openvswitch: Add packet truncation support.William Tu5-17/+67
The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to truncate packets. A 'max_len' is added for setting up the maximum packet size, and a 'cutlen' field is to record the number of bytes to trim the packet when the packet is outputting to a port, or when the packet is sent to userspace. Signed-off-by: William Tu <u9012063@gmail.com> Cc: Pravin Shelar <pshelar@nicira.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>