summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2018-08-05Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+0
Lots of overlapping changes, mostly trivial in nature. The mlxsw conflict was resolving using the example resolution at: https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05ethtool: Remove trailing semicolon for static inlineFlorian Fainelli1-2/+2
Android's header sanitization tool chokes on static inline functions having a trailing semicolon, leading to an incorrectly parsed header file. While the tool should obviously be fixed, also fix the header files for the two affected functions: ethtool_get_flow_spec_ring() and ethtool_get_flow_spec_ring_vf(). Fixes: 8cf6f497de40 ("ethtool: Add helper routines to pass vf to rx_flow_spec") Reporetd-by: Blair Prescott <blair.prescott@broadcom.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04include/net/bond_3ad: Simplify the code by using the ARRAY_SIZEzhong jiang1-1/+1
We prefer to ARRAY_SIZE rather than the open code to calculate size. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-03rxrpc: Push iov_iter up from rxrpc_kernel_recv_data() to callerDavid Howells1-1/+1
Push iov_iter up from rxrpc_kernel_recv_data() to its caller to allow non-contiguous iovs to be passed down, thereby permitting file reading to be simplified in the AFS filesystem in a future patch. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-03l2tp: ignore L2TP_ATTR_MTUGuillaume Nault1-1/+1
This attribute's handling is broken. It can only be used when creating Ethernet pseudo-wires, in which case its value can be used as the initial MTU for the l2tpeth device. However, when handling update requests, L2TP_ATTR_MTU only modifies session->mtu. This value is never propagated to the l2tpeth device. Dump requests also return the value of session->mtu, which is not synchronised anymore with the device MTU. The same problem occurs if the device MTU is properly updated using the generic IFLA_MTU attribute. In this case, session->mtu is not updated, and L2TP_ATTR_MTU will report an invalid value again when dumping the session. It does not seem worthwhile to complexify l2tp_eth.c to synchronise session->mtu with the device MTU. Even the ip-l2tp manpage advises to use 'ip link' to initialise the MTU of l2tpeth devices (iproute2 does not handle L2TP_ATTR_MTU at all anyway). So let's just ignore it entirely. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02Merge tag 'pci-v4.18-fixes-5' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Fix integer overflow in new mobiveil driver (Dan Carpenter) - Fix race during NVMe removal/rescan (Hari Vyas) * tag 'pci-v4.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Fix is_added/is_busmaster race condition PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE
2018-08-02Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller8-1/+44
The BTF conflicts were simple overlapping changes. The virtio_net conflict was an overlap of a fix of statistics counter, happening alongisde a move over to a bonafide statistics structure rather than counting value on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: don't declare IPv6 non-local bind helper if CONFIG_IPV6 undefinedVincent Bernat1-7/+7
Fixes: 83ba4645152d ("net: add helpers checking if socket can be bound to nonlocal address") Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01mm: do not initialize TLB stack vma's with vma_init()Linus Torvalds1-0/+3
Commit 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments") tried to initialize various left-over ad-hoc vma's "properly", but actually made things worse for the temporary vma's used for TLB flushing. vma_init() doesn't actually initialize all of the vma, just a few fields, so doing something like - struct vm_area_struct vma = { .vm_mm = tlb->mm, }; + struct vm_area_struct vma; + + vma_init(&vma, tlb->mm); was actually very bad: instead of having a nicely initialized vma with every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma with only a couple of fields initialized. And they weren't even fields that the code in question mostly cared about. The flush_tlb_range() function takes a "struct vma" rather than a "struct mm_struct", because a few architectures actually care about what kind of range it is - being able to only do an ITLB flush if it's a range that doesn't have data accesses enabled, for example. And all the normal users already have the vma for doing the range invalidation. But a few people want to call flush_tlb_range() with a range they just made up, so they also end up using a made-up vma. x86 just has a special "flush_tlb_mm_range()" function for this, but other architectures (arm and ia64) do the "use fake vma" thing instead, and thus got caught up in the vma_init() changes. At the same time, the TLB flushing code really doesn't care about most other fields in the vma, so vma_init() is just unnecessary and pointless. This fixes things by having an explicit "this is just an initializer for the TLB flush" initializer macro, which is used by the arm/arm64/ia64 people who mis-use this interface with just a dummy vma. Fixes: 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments") Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01Merge tag 'rxrpc-next-20180801' of ↵David S. Miller1-36/+93
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Development Here are some patches that add some more tracepoints to AF_RXRPC and fix some issues therein. The most significant points are: (1) Display the call timeout information in /proc/net/rxrpc/calls. (2) Save the call's debug_id in the rxrpc_channel struct so that it can be used in traces after the rxrpc_call struct has been destroyed. (3) Increase the size of the kAFS Rx window from 32 to 63 to be about the same as the Auristor server. (4) Propose the terminal ACK for a client call after it has received all its data to be transmitted after a short interval so that it will get transmitted if not first superseded by a new call on the same channel. (5) Flush ACKs during the data reception if we detect that we've run out of data.[*] (6) Trace successful packet transmission and softirq to process context socket notification. [*] Note that on a uncontended gigabit network, rxrpc runs in to trouble with ACK packets getting batched together (up to ~32 at a time) somewhere between the IP transmit queue on the client and the ethernet receive queue on the server. I can see the kernel afs filesystem client and Auristor userspace server stalling occasionally on a 512MB single read. Sticking tracepoints in the network driver at either end seems to show that, although the ACK transmissions made by the client are reasonably spaced timewise, the received ACKs come in batches from the network card on the server. I'm not sure what, if anything, can be done about this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: make tcf_chain_{get,put}() staticJiri Pirko1-3/+0
These are no longer used outside of cls_api.c so make them static. Move tcf_chain_flush() to avoid fwd declaration of tcf_chain_put(). Signed-off-by: Jiri Pirko <jiri@mellanox.com> v1->v2: - new patch Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add stat of data packet reordering eventsWei Wang2-2/+4
Introduce a new TCP stats to record the number of reordering events seen and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Application can use this stats to track the frequency of the reordering events in addition to the existing reordering stats which tracks the magnitude of the latest reordering event. Note: this new stats tracks reordering events triggered by ACKs, which could often be fewer than the actual number of packets being delivered out-of-order. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add dsack blocks received statsWei Wang2-0/+5
Introduce a new TCP stat to record the number of DSACK blocks received (RFC4989 tcpEStatsStackDSACKDups) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add data bytes retransmitted statsWei Wang2-0/+5
Introduce a new TCP stat to record the number of bytes retransmitted (RFC4898 tcpEStatsPerfOctetsRetrans) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01tcp: add data bytes sent statsWei Wang2-1/+6
Introduce a new TCP stat to record the number of bytes sent (RFC4898 tcpEStatsPerfHCDataOctetsOut) and expose it in both tcp_info (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS). Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: ipv4: Notify about changes to ip_forward_update_priorityPetr Machata1-0/+1
Drivers may make offloading decision based on whether ip_forward_update_priority is enabled or not. Therefore distribute netevent notifications to give them a chance to react to a change. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: ipv4: Control SKB reprioritization after forwardingPetr Machata1-0/+1
After IPv4 packets are forwarded, the priority of the corresponding SKB is updated according to the TOS field of IPv4 header. This overrides any prioritization done earlier by e.g. an skbedit action or ingress-qos-map defined at a vlan device. Such overriding may not always be desirable. Even if the packet ends up being routed, which implies this is an L3 network node, an administrator may wish to preserve whatever prioritization was done earlier on in the pipeline. Therefore introduce a sysctl that controls this behavior. Keep the default value at 1 to maintain backward-compatible behavior. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: add helpers checking if socket can be bound to nonlocal addressVincent Bernat2-0/+15
The construction "net->ipv4.sysctl_ip_nonlocal_bind || inet->freebind || inet->transparent" is present three times and its IPv6 counterpart is also present three times. We introduce two small helpers to characterize these tests uniformly. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01rxrpc: Trace socket notificationDavid Howells1-0/+20
Trace notifications from the softirq side of the socket to the process-context side. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: Fix ACK proposal tracepoint David Howells1-1/+1
Fix the ACK proposal tracepoint outcomes list by making the one that's an empty string not an empty string - which gets rendered as a hex number string instead. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-01rxrpc: Trace packet transmissionDavid Howells1-35/+72
Trace successful packet transmission (kernel_sendmsg() succeeded, that is) in AF_RXRPC. We can share the enum that defines the transmission points with the trace_rxrpc_tx_fail() tracepoint, so rename its constants to be applicable to both. Also, save the internal call->debug_id in the rxrpc_channel struct so that it can be used in retransmission trace lines. Signed-off-by: David Howells <dhowells@redhat.com>
2018-07-31net: remove bogus RCU annotations on socket.wqChristoph Hellwig2-2/+2
We never use RCU protection for it, just a lot of cargo-cult rcu_deference_protects calls. Note that we do keep the kfree_rcu call for it, as the references through struct sock are RCU protected and thus might require a grace period before freeing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31xsk: don't allow umem replace at stack levelJakub Kicinski1-3/+4
Currently drivers have to check if they already have a umem installed for a given queue and return an error if so. Make better use of XDP_QUERY_XSK_UMEM and move this functionality to the core. We need to keep rtnl across the calls now. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31net: update real_num_rx_queues even when !CONFIG_SYSFSJakub Kicinski1-1/+2
We used to depend on real_num_rx_queues as a upper bound for sanity checks. For AF_XDP socket validation it's useful if the check behaves the same regardless of CONFIG_SYSFS setting. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31PCI: Fix is_added/is_busmaster race conditionHari Vyas1-1/+0
When a PCI device is detected, pdev->is_added is set to 1 and proc and sysfs entries are created. When the device is removed, pdev->is_added is checked for one and then device is detached with clearing of proc and sys entries and at end, pdev->is_added is set to 0. is_added and is_busmaster are bit fields in pci_dev structure sharing same memory location. A strange issue was observed with multiple removal and rescan of a PCIe NVMe device using sysfs commands where is_added flag was observed as zero instead of one while removing device and proc,sys entries are not cleared. This causes issue in later device addition with warning message "proc_dir_entry" already registered. Debugging revealed a race condition between the PCI core setting the is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue setting the is_busmaster bit in pci_set_master(). As these fields are not handled atomically, that clears the is_added bit. Move the is_added bit to a separate private flag variable and use atomic functions to set and retrieve the device addition state. This avoids the race because is_added no longer shares a memory location with is_busmaster. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283 Signed-off-by: Hari Vyas <hari.vyas@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - AMD IBS data corruptor fix (uncovered by UBSAN) - an Intel PEBS entry unwind error fix - a HW-tracing crash fix - a MAINTAINERS update" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix crash when using HW tracing kernel filters perf/x86/intel: Fix unwind errors from PEBS entries (mk-II) MAINTAINERS: Add Naveen N. Rao as kprobes co-maintainer perf/x86/amd/ibs: Don't access non-started event
2018-07-30Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "A paravirt UP-patching fix, and an I2C MUX driver lockdep warning fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/pvqspinlock/x86: Use LOCK_PREFIX in __pv_queued_spin_unlock() assembly code i2c/mux, locking/core: Annotate the nested rt_mutex usage locking/rtmutex: Allow specifying a subclass for nested locking
2018-07-30net/tc: introduce TC_ACT_REINSERT.Paolo Abeni2-0/+31
This is similar TC_ACT_REDIRECT, but with a slightly different semantic: - on ingress the mirred skbs are passed to the target device network stack without any additional check not scrubbing. - the rcu-protected stats provided via the tcf_result struct are updated on error conditions. This new tcfa_action value is not exposed to the user-space and can be used only internally by clsact. v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce a new action type instead v2 -> v3: - rename the new action value TC_ACT_REINJECT, update the helper accordingly - take care of uncloned reinjected packets in XDP generic hook v3 -> v4: - renamed again the new action value (JiriP) v4 -> v5: - fix build error with !NET_CLS_ACT (kbuild bot) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30tc/act: remove unneeded RCU lock in action callbackPaolo Abeni2-1/+3
Each lockless action currently does its own RCU locking in ->act(). This allows using plain RCU accessor, even if the context is really RCU BH. This change drops the per action RCU lock, replace the accessors with the _bh variant, cleans up a bit the surrounding code and documents the RCU status in the relevant header. No functional nor performance change is intended. The goal of this patch is clarifying that the RCU critical section used by the tc actions extends up to the classifier's caller. v1 -> v2: - preserve rcu lock in act_bpf: it's needed by eBPF helpers, as pointed out by Daniel v3 -> v4: - fixed some typos in the commit message (JiriP) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net/sched: user-space can't set unknown tcfa_action valuesPaolo Abeni1-2/+4
Currently, when initializing an action, the user-space can specify and use arbitrary values for the tcfa_action field. If the value is unknown by the kernel, is implicitly threaded as TC_ACT_UNSPEC. This change explicitly checks for unknown values at action creation time, and explicitly convert them to TC_ACT_UNSPEC. No functional changes are introduced, but this will allow introducing tcfa_action values not exposed to user-space in a later patch. Note: we can't use the above to hide TC_ACT_REDIRECT from user-space, as the latter is already part of uAPI. v3 -> v4: - use an helper to check for action validity (JiriP) - emit an extack for invalid actions (JiriP) v4 -> v5: - keep messages on a single line, drop net_warn (Marcelo) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net: remove sock_poll_busy_flagChristoph Hellwig1-6/+0
Fold it into the only caller to make the code simpler and easier to read. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net: remove sock_poll_busy_loopChristoph Hellwig1-9/+0
There is no point in hiding this logic in a helper. Also remove the useless events != 0 check and only busy loop once we know we actually have a poll method. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net: don not detour through struct sock to find the poll waitqueueChristoph Hellwig1-3/+2
For any open socket file descriptor sock->sk->sk_wq->wait will always point to sock->wq->wait. That means we can do the shorter dereference and removal a NULL check and don't have to not worry about any RCU protection. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net: simplify sock_poll_waitChristoph Hellwig1-5/+6
The wait_address argument is always directly derived from the filp argument, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29Merge tag 'mlx5e-updates-2018-07-27' of ↵David S. Miller2-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-07-27 (Vxlan updates) This series from Gal and Saeed provides updates to mlx5 vxlan implementation. Gal, started with three cleanups to reflect the actual hardware vxlan state - reflect 4789 UDP port default addition to software database - check maximum number of vxlan UDP ports - cleanup an unused member in vxlan work Then Gal provides performance optimization by replacing the vxlan radix tree with a hash table. Measuring mlx5e_vxlan_lookup_port execution time: Radix Tree Hash Table --------------- ------------ ------------ Single Stream 161 ns 79 ns (51% improvement) Multi Stream 259 ns 136 ns (47% improvement) Measuring UDP stream packet rate, single fully utilized TX core: Radix Tree: 498,300 PPS Hash Table: 555,468 PPS (11% improvement) Next, from Saeed, vxlan refactoring to allow sharing the vxlan table between different mlx5 netdevice instances like PF and VF representors, this is done by making mlx5 vxlan interface more generic and decoupling it from PF netdevice structures and logic, then moving it into mlx5 core as a low level interface so it can be used by VF representors, which is illustrated in the last patch of the serious. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29net: report invalid mtu value via netlink extackStephen Hemminger1-0/+2
If an invalid MTU value is set through rtnetlink return extra error information instead of putting message in kernel log. For other cases where there is no visible API, keep the error report in the log. Example: # ip li set dev enp12s0 mtu 10000 Error: mtu greater than device maximum. # ifconfig enp12s0 mtu 10000 SIOCSIFMTU: Invalid argument # dmesg | tail -1 [ 2047.795467] enp12s0: mtu greater than device maximum Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29net: report min and max mtu network device settingsStephen Hemminger1-0/+2
Report the minimum and maximum MTU allowed on a device via netlink so that it can be displayed by tools like ip link. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29net: dcb: add DSCP to comment about priority selector typesJakub Kicinski1-1/+2
Commit ee2059819450 ("net/dcb: Add dscp to priority selector type") added a define for the new DSCP selector type created by IEEE 802.1Qcd, but missed the comment enumerating all selector types. Update the comment. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29route: add support for directed broadcast forwardingXin Long3-0/+3
This patch implements the feature described in rfc1812#section-5.3.5.2 and rfc2644. It allows the router to forward directed broadcast when sysctl bc_forwarding is enabled. Note that this feature could be done by iptables -j TEE, but it would cause some problems: - target TEE's gateway param has to be set with a specific address, and it's not flexible especially when the route wants forward all directed broadcasts. - this duplicates the directed broadcasts so this may cause side effects to applications. Besides, to keep consistent with other os router like BSD, it's also necessary to implement it in the route rx path. Note that route cache needs to be flushed when bc_forwarding is changed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29Merge tag 'linux-can-next-for-4.19-20180727' of ↵David S. Miller2-2/+7
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2018-01-16 this is a pull request for net-next/master consisting of 38 patches. Dan Murphy's patch fixes the path to a file in the comment of the CAN Error Message Frame Mask structure. A patch by Colin Ian King fixes a typo in the cc770 driver. The next patch is by me an sorts the Kconfigand Makefile entries of the CAN-USB driver subdir alphabetically. The patch by Jakob Unterwurzacher adds support for the UCAN USB-CAN adapter. YueHaibing's patch replaces a open coded skb_put()+memset() by skb_put_zero() in the CAN-dev infrastructure. Zhu Yi provides a patch to enable multi-queue CAN devices. Three patches by Luc Van Oostenryck fix the return value of several driver's xmit function, I contribute a patch for the a fourth driver. Fabio Estevam's patch switches the flexcan driver to SPDX identifier. Two patches by Jia-Ju Bai replace the mdelay() by a usleep_range() in the sja1000 drivers. The next 6 patches are by Anssi Hannula and refactor the xilinx CAN driver and add support for the xilinx CAN FD core. A patch by Gustavo A. R. Silva adds fallthrough annotation to the peak_usb driver. 5 patches by Stephane Grosjean for the peak CANFD driver do some cleanups and provide more improvements for further firmware releases. The remaining 13 patches are by Jimmy Assarsson and the first clean up the kvaser_usb driver, so that the later patches add support for the Kvaser USB hydra family. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28net/mlx5e: Vxlan, move vxlan logic to core driverSaeed Mahameed1-0/+2
Move vxlan logic and objects to mlx5 core dirver. Since it going to be used from different mlx5 interfaces. e.g. mlx5e PF NIC netdev and mlx5e E-Switch representors. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2018-07-27net/mlx5e: Vxlan, check maximum number of UDP portsGal Pressman1-1/+3
The NIC has a limited number of offloaded VXLAN UDP ports (usually 4). Instead of letting the firmware fail when trying to add more ports than it can handle, let the driver check it on its own. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-27l2tp: drop ->mru from struct l2tp_sessionGuillaume Nault1-1/+1
This field is not used. Treat PPPIOC*MRU the same way as PPPIOC*FLAGS: "get" requests return 0, while "set" requests vadidate the user supplied pointer but discard its value. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27l2tp: ignore L2TP_ATTR_VLAN_ID netlink attributeGuillaume Nault1-2/+2
The value of this attribute is never used. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27l2tp: ignore L2TP_ATTR_DATA_SEQ netlink attributeGuillaume Nault1-3/+4
The value of this attribute is never used. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net: dcb: Add priority-to-DSCP map gettersPetr Machata1-0/+13
On ingress, a network device such as a switch assigns to packets priority based on various criteria. Common options include interpreting PCP and DSCP fields according to user configuration. When a packet egresses the switch, a reverse process may rewrite PCP and/or DSCP values according to packet priority. The following three functions support a) obtaining a DSCP-to-priority map or vice versa, and b) finding default-priority entries in APP database. The DCB subsystem supports for APP entries a very generous M:N mapping between priorities and protocol identifiers. Understandably, several (say) DSCP values can map to the same priority. But this asymmetry holds the other way around as well--one priority can map to several DSCP values. For this reason, the following functions operate in terms of bitmaps, with ones in positions that match some APP entry. - dcb_ieee_getapp_dscp_prio_mask_map() to compute for a given netdevice a map of DSCP-to-priority-mask, which gives for each DSCP value a bitmap of priorities related to that DSCP value by APP, along the lines of dcb_ieee_getapp_mask(). - dcb_ieee_getapp_prio_dscp_mask_map() similarly to compute for a given netdevice a map from priorities to a bitmap of DSCPs. - dcb_ieee_getapp_default_prio_mask() which finds all default-priority rules for a given port in APP database, and returns a mask of priorities allowed by these default-priority rules. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27Merge tag 'for-linus-20180727' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+14
Pull block fixes from Jens Axboe: "Bigger than usual at this time, mostly due to the O_DIRECT corruption issue and the fact that I was on vacation last week. This contains: - NVMe pull request with two fixes for the FC code, and two target fixes (Christoph) - a DIF bio reset iteration fix (Greg Edwards) - two nbd reply and requeue fixes (Josef) - SCSI timeout fixup (Keith) - a small series that fixes an issue with bio_iov_iter_get_pages(), which ended up causing corruption for larger sized O_DIRECT writes that ended up racing with buffered writes (Martin Wilck)" * tag 'for-linus-20180727' of git://git.kernel.dk/linux-block: block: reset bi_iter.bi_done after splitting bio block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs blkdev: __blkdev_direct_IO_simple: fix leak in error case block: bio_iov_iter_get_pages: fix size of last iovec nvmet: only check for filebacking on -ENOTBLK nvmet: fixup crash on NULL device path scsi: set timed out out mq requests to complete blk-mq: export setting request completion state nvme: if_ready checks to fail io to deleting controller nvmet-fc: fix target sgl list on large transfers nbd: handle unexpected replies better nbd: don't requeue the same request twice.
2018-07-27Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-1/+16
Merge misc fixes from Andrew Morton: "11 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kvm, mm: account shadow page tables to kmemcg zswap: re-check zswap_is_full() after do zswap_shrink() include/linux/eventfd.h: include linux/errno.h mm: fix vma_is_anonymous() false-positives mm: use vma_init() to initialize VMAs on stack and data segments mm: introduce vma_init() mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL ipc/sem.c: prevent queue.status tearing in semop mm: disallow mappings that conflict for devm_memremap_pages() kasan: only select SLUB_DEBUG with SYSFS=y delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
2018-07-27Merge tag 'trace-v4.18-rc6' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various fixes to the tracing infrastructure: - Fix double free when the reg() call fails in event_trigger_callback() - Fix anomoly of snapshot causing tracing_on flag to change - Add selftest to test snapshot and tracing_on affecting each other - Fix setting of tracepoint flag on error that prevents probes from being deleted. - Fix another possible double free that is similar to event_trigger_callback() - Quiet a gcc warning of a false positive unused variable - Fix crash of partial exposed task->comm to trace events" * tag 'trace-v4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: kthread, tracing: Don't expose half-written comm when creating kthreads tracing: Quiet gcc warning about maybe unused link variable tracing: Fix possible double free in event_enable_trigger_func() tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure selftests/ftrace: Add snapshot and tracing_on test case ring_buffer: tracing: Inherit the tracing setting to next ring buffer tracing: Fix double free of event_trigger_data
2018-07-27net: sched: don't dump chains only held by actionsJiri Pirko2-0/+4
In case a chain is empty and not explicitly created by a user, such chain should not exist. The only exception is if there is an action "goto chain" pointing to it. In that case, don't show the chain in the dump. Track the chain references held by actions and use them to find out if a chain should or should not be shown in chain dump. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>