summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2015-02-10net/bonding: Fix potential bad memory access during bonding eventsMoni Shoua1-1/+1
When queuing work to send the NETDEV_BONDING_INFO netdev event, it's possible that when the work is executed, the pointer to the slave becomes invalid. This can happen if between queuing the event and the execution of the work, the net-device was un-ensvaled and re-enslaved. Fix that by queuing a work with the data of the slave instead of the slave structure. Fixes: 69e6113343cf ('net/bonding: Notify state change on slaves') Reported-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-10tipc: convert legacy nl link stat to nl compatRichard Alpe1-0/+10
Add functionality for safely appending string data to a TLV without keeping write count in the caller. Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-10tipc: convert legacy nl bearer enable/disable to nl compatRichard Alpe1-0/+5
Introduce a framework for transcoding legacy nl action into actions (.doit) calls from the new nl API. This is done by converting the incoming TLV data into netlink data with nested netlink attributes. Unfortunately due to the randomness of the legacy API we can't do this generically so each legacy netlink command requires a specific transcoding recipe. In this case for bearer enable and bearer disable. Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit compat calls. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-10tipc: convert legacy nl bearer dump to nl compatRichard Alpe1-0/+5
Introduce a framework for dumping netlink data from the new netlink API and formatting it to the old legacy API format. This is done by looping the dump data and calling a format handler for each entity, in this case a bearer. We dump until either all data is dumped or we reach the limited buffer size of the legacy API. Remember, the legacy API doesn't scale. In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09Merge tag 'wireless-drivers-next-for-davem-2015-02-07' of ↵David S. Miller4-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Major changes: iwlwifi: * more work for new devices (4165 / 8260) * cleanups / improvemnts in rate control * fixes for TDLS * major statistics work from Johannes - more to come * improvements for the fw error dump infrastructure * usual amount of small fixes here and there (scan, D0i3 etc...) * add support for beamforming * enable stuck queue detection for iwlmvm * a few fixes for EBS scan * fixes for various failure paths * improvements for TDLS Offchannel wil6210: * performance tuning * some AP features brcm80211: * rework some code in SDIO part of the brcmfmac driver related to suspend/resume that were found doing stress testing * in PCIe part scheduling of worker thread needed to be relaxed * minor fixes and exposing firmware revision information to user-space, ie. ethtool. mwifiex: * enhancements for change virtual interface handling * remove coupling between netdev and FW supported interface combination, now conversion from any type of supported interface types to any other type is possible * DFS support in AP mode ath9k: * fix calibration issues on some boards * Wake-on-WLAN improvements ath10k: * add support for qca6174 hardware * enable RX batching to reduce CPU load Conflicts: drivers/net/wireless/rtlwifi/pci.c Conflict resolution is to get rid of the 'end' label and keep the rest. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09net:rfs: adjust table size checkingEric Dumazet1-1/+2
Make sure root user does not try something stupid. Also make sure mask field in struct rps_sock_flow_table does not share a cache line with the potentially often dirtied flow table. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 567e4b79731c ("net: rfs: add hash collision detection") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09net: rfs: add hash collision detectionEric Dumazet2-40/+18
Receive Flow Steering is a nice solution but suffers from hash collisions when a mix of connected and unconnected traffic is received on the host, when flow hash table is populated. Also, clearing flow in inet_release() makes RFS not very good for short lived flows, as many packets can follow close(). (FIN , ACK packets, ...) This patch extends the information stored into global hash table to not only include cpu number, but upper part of the hash value. I use a 32bit value, and dynamically split it in two parts. For host with less than 64 possible cpus, this gives 6 bits for the cpu number, and 26 (32-6) bits for the upper part of the hash. Since hash bucket selection use low order bits of the hash, we have a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big enough. If the hash found in flow table does not match, we fallback to RPS (if it is enabled for the rxqueue). This means that a packet for an non connected flow can avoid the IPI through a unrelated/victim CPU. This also means we no longer have to clear the table at socket close time, and this helps short lived flows performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09net: fix a typo in skb_checksum_validate_zero_checkSabrina Dubroca1-1/+1
Remove trailing underscore. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08tcp: mitigate ACK loops for connections as tcp_timewait_sockNeal Cardwell1-0/+4
Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is represented by a tcp_timewait_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08tcp: mitigate ACK loops for connections as tcp_sockNeal Cardwell1-0/+1
Ensure that in state ESTABLISHED, where the connection is represented by a tcp_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers or ACK numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. There is already a similar (although global) rate-limiting mechanism for "challenge ACKs". When deciding whether to send a challence ACK, we first consult the new per-connection rate limit, and then the global rate limit. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08tcp: mitigate ACK loops for connections as tcp_request_sockNeal Cardwell2-0/+2
In the SYN_RECV state, where the TCP connection is represented by tcp_request_sock, we now rate-limit SYNACKs in response to a client's retransmitted SYNs: we do not send a SYNACK in response to client SYN if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a SYNACK in response to a client's retransmitted SYN. This allows the vast majority of legitimate client connections to proceed unimpeded, even for the most aggressive platforms, iOS and MacOS, which actually retransmit SYNs 1-second intervals for several times in a row. They use SYN RTO timeouts following the progression: 1,1,1,1,1,2,4,8,16,32. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacksNeal Cardwell2-0/+38
Helpers for mitigating ACK loops by rate-limiting dupacks sent in response to incoming out-of-window packets. This patch includes: - rate-limiting logic - sysctl to control how often we allow dupacks to out-of-window packets - SNMP counter for cases where we rate-limited our dupack sending The rate-limiting logic in this patch decides to not send dupacks in response to out-of-window segments if (a) they are SYNs or pure ACKs and (b) the remote endpoint is sending them faster than the configured rate limit. We rate-limit our responses rather than blocking them entirely or resetting the connection, because legitimate connections can rely on dupacks in response to some out-of-window segments. For example, zero window probes are typically sent with a sequence number that is below the current window, and ZWPs thus expect to thus elicit a dupack in response. We allow dupacks in response to TCP segments with data, because these may be spurious retransmissions for which the remote endpoint wants to receive DSACKs. This is safe because segments with data can't realistically be part of ACK loops, which by their nature consist of each side sending pure/data-less ACKs to each other. The dupack interval is controlled by a new sysctl knob, tcp_invalid_ratelimit, given in milliseconds, in case an administrator needs to dial this upward in the face of a high-rate DoS attack. The name and units are chosen to be analogous to the existing analogous knob for ICMP, icmp_ratelimit. The default value for tcp_invalid_ratelimit is 500ms, which allows at most one such dupack per 500ms. This is chosen to be 2x faster than the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule 2.4). We allow the extra 2x factor because network delay variations can cause packets sent at 1 second intervals to be compressed and arrive much closer. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08net: openvswitch: Support masked set actions.Jarno Rajahalme1-1/+21
OVS userspace already probes the openvswitch kernel module for OVS_ACTION_ATTR_SET_MASKED support. This patch adds the kernel module implementation of masked set actions. The existing set action sets many fields at once. When only a subset of the IP header fields, for example, should be modified, all the IP fields need to be exact matched so that the other field values can be copied to the set action. A masked set action allows modification of an arbitrary subset of the supported header bits without requiring the rest to be matched. Masked set action is now supported for all writeable key types, except for the tunnel key. The set tunnel action is an exception as any input tunnel info is cleared before action processing starts, so there is no tunnel info to mask. The kernel module converts all (non-tunnel) set actions to masked set actions. This makes action processing more uniform, and results in less branching and duplicating the action processing code. When returning actions to userspace, the fully masked set actions are converted back to normal set actions. We use a kernel internal action code to be able to tell the userspace provided and converted masked set actions apart. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08Merge tag 'nfc-next-3.20-2' of ↵David S. Miller4-7/+255
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next NFC: 3.20 second pull request This is the second NFC pull request for 3.20. It brings: - NCI NFCEE (NFC Execution Environment, typically an embedded or external secure element) discovery and enabling/disabling support. In order to communicate with an NFCEE, we also added NCI's logical connections support to the NCI stack. - HCI over NCI protocol support. Some secure elements only understand HCI and thus we need to send them HCI frames when they're part of an NCI chipset. - NFC_EVT_TRANSACTION userspace API addition. Whenever an application running on a secure element needs to notify its host counterpart, we send an NFC_EVENT_SE_TRANSACTION event to userspace through the NFC netlink socket. - Secure element and HCI transaction event support for the st21nfcb chipset. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller15-44/+124
Conflicts: drivers/net/vxlan.c drivers/vhost/net.c include/linux/if_vlan.h net/core/dev.c The net/core/dev.c conflict was the overlap of one commit marking an existing function static whilst another was adding a new function. In the include/linux/if_vlan.h case, the type used for a local variable was changed in 'net', whereas the function got rewritten to fix a stacked vlan bug in 'net-next'. In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next' overlapped with an endainness fix for VHOST 1.0 in 'net'. In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter in 'net-next' whereas in 'net' there was a bug fix to pass in the correct network namespace pointer in calls to this function. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05MMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds9-31/+66
Pull networking fixes from David Miller: 1) Stretch ACKs can kill performance with Reno and CUBIC congestion control, largely due to LRO and GRO. Fix from Neal Cardwell. 2) Fix userland breakage because we accidently emit zero length netlink messages from the bridging code. From Roopa Prabhu. 3) Carry handling in generic csum_tcpudp_nofold is broken, fix from Karl Beldan. 4) Remove bogus dev_set_net() calls from CAIF driver, from Nicolas Dichtel. 5) Make sure PPP deflation never returns a length greater then the output buffer, otherwise we overflow and trigger skb_over_panic(). Fix from Florian Westphal. 6) COSA driver needs VIRT_TO_BUS Kconfig dependencies, from Arnd Bergmann. 7) Don't increase route cached MTU on datagram too big ICMPs. From Li Wei. 8) Fix error path leaks in nf_tables, from Pablo Neira Ayuso. 9) Fix bitmask handling regression in netlink that broke things like acpi userland tools. From Pablo Neira Ayuso. 10) Wrong header pointer passed to param_type2af() in SCTP code, from Saran Maruti Ramanara. 11) Stacked vlans not handled correctly by vlan_get_protocol(), from Toshiaki Makita. 12) Add missing DMA memory barrier to xgene driver, from Iyappan Subramanian. 13) Fix crash in rate estimators, from Eric Dumazet. 14) We've been adding various workarounds, one after another, for the change which added the per-net tcp_sock. It was meant to reduce socket contention but added lots of problems. Reduce this instead to a proper per-cpu socket and that rids us of all the daemons. From Eric Dumazet. 15) Fix memory corruption and OOPS in mlx4 driver, from Jack Morgenstein. 16) When we disabled UFO in the virtio_net device, it introduces some serious performance regressions. The orignal problem was IPV6 fragment ID generation, so fix that properly instead. From Vlad Yasevich. 17) sr9700 driver build breaks on xtensa because it defines macros with the same name as those used by the arch code. Use more unique names. From Chen Gang. 18) Fix endianness in new virio 1.0 mode of the vhost net driver, from Michael S Tsirkin. 19) Several sysctls were setting the maxlen attribute incorrectly, from Sasha Levin. 20) Don't accept an FQ scheduler quantum of zero, that leads to crashes. From Kenneth Klette Jonassen. 21) Fix dumping of non-existing actions in the packet scheduler classifier. From Ignacy Gawędzki. 22) Return the write work_done value when doing TX work in the qlcnic driver. 23) ip6gre_err accesses the info field with the wrong endianness, from Sabrina Dubroca. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits) sit: fix some __be16/u16 mismatches ipv6: fix sparse errors in ip6_make_flowlabel() net: remove some sparse warnings flow_keys: n_proto type should be __be16 ip6_gre: fix endianness errors in ip6gre_err qlcnic: Fix NAPI poll routine for Tx completion amd-xgbe: Set RSS enablement based on hardware features amd-xgbe: Adjust for zero-based traffic class count cls_api.c: Fix dumping of non-existing actions' stats. pkt_sched: fq: avoid hang when quantum 0 net: rds: use correct size for max unacked packets and bytes vhost/net: fix up num_buffers endian-ness gianfar: correct the bad expression while writing bit-pattern net: usb: sr9700: Use 'SR_' prefix for the common register macros Revert "drivers/net: Disable UFO through virtio" Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets" ipv6: Select fragment id during UFO segmentation if not set. xen-netback: stop the guest rx thread after a fatal error net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than 80 VFs isdn: off by one in connect_res() ...
2015-02-05ipv6: fix sparse errors in ip6_make_flowlabel()Eric Dumazet1-2/+2
include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types) include/net/ipv6.h:713:22: expected restricted __be32 [usertype] hash include/net/ipv6.h:713:22: got unsigned int include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer include/net/ipv6.h:719:22: warning: invalid assignment: ^= include/net/ipv6.h:719:22: left side has type restricted __be32 include/net/ipv6.h:719:22: right side has type unsigned int Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05flow_keys: n_proto type should be __be16Eric Dumazet1-3/+3
(struct flow_keys)->n_proto is in network order, use proper type for this. Fixes following sparse errors : net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types) net/core/flow_dissector.c:139:39: expected unsigned short [unsigned] [usertype] n_proto net/core/flow_dissector.c:139:39: got restricted __be16 [assigned] [usertype] proto net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types) net/core/flow_dissector.c:237:23: expected unsigned short [unsigned] [usertype] n_proto net/core/flow_dissector.c:237:23: got restricted __be16 [assigned] [usertype] proto Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05pkt_sched: fq: better control of DDOS trafficEric Dumazet1-0/+2
FQ has a fast path for skb attached to a socket, as it does not have to compute a flow hash. But for other packets, FQ being non stochastic means that hosts exposed to random Internet traffic can allocate million of flows structure (104 bytes each) pretty easily. Not only host can OOM, but lookup in RB trees can take too much cpu and memory resources. This patch adds a new attribute, orphan_mask, that is adding possibility of having a stochastic hash for orphaned skb. Its default value is 1024 slots, to mimic SFQ behavior. Note: This does not apply to locally generated TCP traffic, and no locally generated traffic will share a flow structure with another perfect or stochastic flow. This patch also handles the specific case of SYNACK messages: They are attached to the listener socket, and therefore all map to a single hash bucket. If listener have set SO_MAX_PACING_RATE, hoping to have new accepted socket inherit this rate, SYNACK might be paced and even dropped. This is very similar to an internal patch Google have used more than one year. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05Merge branch 'for-davem' of ↵David S. Miller8-37/+18
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs More iov_iter work from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05tcp: do not pace pure ack packetsEric Dumazet1-0/+15
When we added pacing to TCP, we decided to let sch_fq take care of actual pacing. All TCP had to do was to compute sk->pacing_rate using simple formula: sk->pacing_rate = 2 * cwnd * mss / rtt It works well for senders (bulk flows), but not very well for receivers or even RPC : cwnd on the receiver can be less than 10, rtt can be around 100ms, so we can end up pacing ACK packets, slowing down the sender. Really, only the sender should pace, according to its own logic. Instead of adding a new bit in skb, or call yet another flow dissection, we tweak skb->truesize to a small value (2), and we instruct sch_fq to use new helper and not pace pure ack. Note this also helps TCP small queue, as ack packets present in qdisc/NIC do not prevent sending a data packet (RPC workload) This helps to reduce tx completion overhead, ack packets can use regular sock_wfree() instead of tcp_wfree() which is a bit more expensive. This has no impact in the case packets are sent to loopback interface, as we do not coalesce ack packets (were we would detect skb->truesize lie) In case netem (with a delay) is used, skb_orphan_partial() also sets skb->truesize to 1. This patch is a combination of two patches we used for about one year at Google. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05rhashtable: Introduce rhashtable_walk_*Herbert Xu1-0/+35
Some existing rhashtable users get too intimate with it by walking the buckets directly. This prevents us from easily changing the internals of rhashtable. This patch adds the helpers rhashtable_walk_init/exit/start/next/stop which will replace these custom walkers. They are meant to be usable for both procfs seq_file walks as well as walking by a netlink dump. The iterator structure should fit inside a netlink dump cb structure, with at least one element to spare. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05net/mlx4_core: Port aggregation upper layer interfaceMoni Shoua2-0/+20
Supply interface functions to bond and unbond ports of a mlx4 internal interfaces. Example for such an interface is the one registered by the mlx4 IB driver under RoCE. There are 1. Functions to go in/out to/from bonded mode 2. Function to remap virtual ports to physical ports The bond_mutex prevents simultaneous access to data that keep status of the device in bonded mode. The upper mlx4 interface marks to the mlx4 core module that they want to be subject for such bonding by setting the MLX4_INTFF_BONDING flag. Interface which goes to/from bonded mode is re-created. The mlx4 Ethernet driver does not set this flag when registering the interface, the IB driver does. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05net/mlx4_core: Port aggregation low level interfaceMoni Shoua3-1/+17
Implement the hardware interface required for port aggregation. 1. Disable RX port check on receive - don't perform a validity check that matches to QP's port and the port where the packet is received. 2. Virtual to physical port remap - configure virtual to physical port mapping. Port remap capability for virtual functions. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05net/bonding: Notify state change on slavesMoni Shoua1-0/+12
Use notifier chain to dispatch an event upon a change in slave state. Event is dispatched with slave specific info. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05net/bonding: Move slave state changes to a helper functionMoni Shoua1-0/+5
Move slave state changes to a helper function, this is a pre-step for adding functionality of dispatching an event when this helper is called. This commit doesn't add new functionality. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05net/core: Add event for a change in slave stateMoni Shoua1-0/+15
Add event which provides an indication on a change in the state of a bonding slave. The event handler should cast the pointer to the appropriate type (struct netdev_bonding_info) in order to get the full info about the slave. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05Merge tag 'mac80211-next-for-davem-2015-02-03' of ↵David S. Miller4-13/+84
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Last round of updates for net-next: * revert a patch that caused a regression with mesh userspace (Bob) * fix a number of suspend/resume related races (from Emmanuel, Luca and myself - we'll look at backporting later) * add software implementations for new ciphers (Jouni) * add a new ACPI ID for Broadcom's rfkill (Mika) * allow using netns FD for wireless (Vadim) * some other cleanups (various) Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05Merge branch 'for-upstream' of ↵David S. Miller2-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-02-03 Here's what's likely the last bluetooth-next pull request for 3.20. Notable changes include: - xHCI workaround + a new id for the ath3k driver - Several new ids for the btusb driver - Support for new Intel Bluetooth controllers - Minor cleanups to ieee802154 code - Nested sleep warning fix in socket accept() code path - Fixes for Out of Band pairing handling - Support for LE scan restarting for HCI_QUIRK_STRICT_DUPLICATE_FILTER - Improvements to data we expose through debugfs - Proper handling of Hardware Error HCI events Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05net: add skb functions to process remote checksum offloadTom Herbert2-0/+36
This patch adds skb_remcsum_process and skb_gro_remcsum_process to perform the appropriate adjustments to the skb when receiving remote checksum offload. Updated vxlan and gue to use these functions. Tested: Ran TCP_RR and TCP_STREAM netperf for VXLAN and GUE, did not see any change in performance. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05xps: fix xps for stacked devicesEric Dumazet1-2/+5
A typical qdisc setup is the following : bond0 : bonding device, using HTB hierarchy eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc XPS allows to spread packets on specific tx queues, based on the cpu doing the send. Problem is that dequeues from bond0 qdisc can happen on random cpus, due to the fact that qdisc_run() can dequeue a batch of packets. CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1 CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0 CPUB -> dequeue packet P1 from bond0 enqueue packet on eth1/eth2 CPUC -> dequeue packet P2 from bond0 enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0) get_xps_queue() then might select wrong queue for P1, since current cpu might be different than CPUA. P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping), if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock) Effect of this bug is TCP reorders, and more generally not optimal TX queue placement. (A victim bulk flow can be migrated to the wrong TX queue for a while) To fix this, we have to record sender cpu number the first time dev_queue_xmit() is called for one tx skb. We can union napi_id (used on receive path) and sender_cpu, granted we clear sender_cpu in skb_scrub_packet() (credit to Willem for this union idea) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04NFC: nci: Move logical connection structure allocationChristophe Ricard1-0/+1
conn_info is currently allocated only after nfcee_discovery_ntf which is not generic enough for logical connection other than NFCEE. The corresponding conn_info is now created in nci_core_conn_create_rsp(). Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04NFC: nci: Change credits field to credits_cntChristophe Ricard1-1/+1
For consistency sake change nci_core_conn_create_rsp structure credits field to credits_cnt. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04NFC: nci: Support all destinations type when creating a connectionChristophe Ricard2-9/+13
The current implementation limits nci_core_conn_create_req() to only manage NCI_DESTINATION_NFCEE. Add new parameters to nci_core_conn_create() to support all destination types described in the NCI specification. Because there are some parameters with variable size dynamic buffer allocation is needed. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04NFC: nci: Add reference to the RF logical connectionChristophe Ricard1-0/+1
The NCI_STATIC_RF_CONN_ID logical connection is the most used connection. Keeping it directly accessible in the nci_dev structure will simplify and optimize the access. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-04ipv6: Select fragment id during UFO segmentation if not set.Vlad Yasevich1-0/+3
If the IPv6 fragment id has not been set and we perform fragmentation due to UFO, select a new fragment id. We now consider a fragment id of 0 as unset and if id selection process returns 0 (after all the pertrubations), we set it to 0x80000000, thus giving us ample space not to create collisions with the next packet we may have to fragment. When doing UFO integrity checking, we also select the fragment id if it has not be set yet. This is stored into the skb_shinfo() thus allowing UFO to function correclty. This patch also removes duplicate fragment id generation code and moves ipv6_select_ident() into the header as it may be used during GSO. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04vhost: vhost_scsi_handle_vq() should just use copy_from_user()Al Viro1-2/+0
it has just verified that it asks no more than the length of the first segment of iovec. And with that the last user of stuff in lib/iovec.c is gone. RIP. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()Al Viro1-3/+0
Cc: Michael S. Tsirkin <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()Al Viro1-1/+0
Cc: Michael S. Tsirkin <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04crypto: switch af_alg_make_sg() to iov_iterAl Viro1-2/+1
With that, all ->sendmsg() instances are converted to iov_iter primitives and are agnostic wrt the kind of iov_iter they are working with. So's the last remaining ->recvmsg() instance that wasn't kind-agnostic yet. All ->sendmsg() and ->recvmsg() advance ->msg_iter by the amount actually copied and none of them modifies the underlying iovec, etc. Cc: linux-crypto@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04net: bury net/core/iovec.c - nothing in there is used anymoreAl Viro1-7/+0
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04net: switch memcpy_fromiovec()/memcpy_fromiovecend() users to copy_from_iter()Al Viro2-4/+2
That takes care of the majority of ->sendmsg() instances - most of them via memcpy_to_msg() or assorted getfrag() callbacks. One place where we still keep memcpy_fromiovecend() is tipc - there we potentially read the same data over and over; separate patch, that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04ip: convert tcp_sendmsg() to iov_iter primitivesAl Viro1-10/+8
patch is actually smaller than it seems to be - most of it is unindenting the inner loop body in tcp_sendmsg() itself... the bit in tcp_input.c is going to get reverted very soon - that's what memcpy_from_msg() will become, but not in this commit; let's keep it reasonably contained... There's one potentially subtle change here: in case of short copy from userland, mainline tcp_send_syn_data() discards the skb it has allocated and falls back to normal path, where we'll send as much as possible after rereading the same data again. This patch trims SYN+data skb instead - that way we don't need to copy from the same place twice. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04ip: stash a pointer to msghdr in struct ping_fakehdrAl Viro1-1/+1
... instead of storing its ->mgs_iter.iov there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04rxrpc: switch rxrpc_send_data() to iov_iter primitivesAl Viro1-6/+5
Convert skb_add_data() to iov_iter; allows to get rid of the explicit messing with iovec in its only caller - skb_add_data() will keep advancing ->msg_iter for us, so there's no need to similate that manually. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04vmci: propagate msghdr all way down to __qp_memcpy_to_queue()Al Viro1-1/+1
Switch from passing msg->iov_iter.iov to passing msg itself Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-03net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than ↵Jack Morgenstein1-1/+1
80 VFs Commit de966c592802 (net/mlx4_core: Support more than 64 VFs) was meant to allow up to 126 VFs. However, due to leaving MLX4_MFUNC_MAX too low, using more than 80 VFs resulted in memory corruptions (and Oopses) when more than 80 VFs were requested. In addition, the number of slaves was left too high. This commit fixes these issues. Fixes: de966c592802 ("net/mlx4_core: Support more than 64 VFs") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-0/+2
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Validate hooks for nf_tables NAT expressions, otherwise users can crash the kernel when using them from the wrong hook. We already got one user trapped on this when configuring masquerading. 2) Fix a BUG splat in nf_tables with CONFIG_DEBUG_PREEMPT=y. Reported by Andreas Schultz. 3) Avoid unnecessary reroute of traffic in the local input path in IPVS that triggers a crash in in xfrm. Reported by Florian Wiessner and fixes by Julian Anastasov. 4) Fix memory and module refcount leak from the error path of nf_tables_newchain(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03ipv6: introduce ipv6_make_skbVlad Yasevich1-0/+19
This commit is very similar to commit 1c32c5ad6fac8cee1a77449f5abf211e911ff830 Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Tue Mar 1 02:36:47 2011 +0000 inet: Add ip_make_skb and ip_finish_skb It adds IPv6 version of the helpers ip6_make_skb and ip6_finish_skb. The job of ip6_make_skb is to collect messages into an ipv6 packet and poplulate ipv6 eader. The job of ip6_finish_skb is to transmit the generated skb. Together they replicated the job of ip6_push_pending_frames() while also provide the capability to be called independently. This will be needed to add lockless UDP sendmsg support. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03ipv6: pull cork initialization into its own function.Vlad Yasevich1-5/+7
Pull IPv6 cork initialization into its own function that can be re-used. IPv6 specific cork data did not have an explicit data structure. This patch creats eone so that just ipv6 cork data can be as arguemts. Also, since IPv6 tries to save the flow label into inet_cork_full tructure, pass the full cork. Adjust ip6_cork_release() to take cork data structures. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>