summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-06-27s390/qeth: move cast type selection into fill_header()Julian Wiedmann4-33/+26
The cast type currently gets selected in .ndo_start_xmit, and is then piped through several layers until it's stored into the HW header. Push the selection down into qeth_l?_fill_header() to (1) reduce the number of xmit-wide parameters, and (2) merge the two route validation checks into just one. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: extract helper for route validationJulian Wiedmann2-28/+34
As follow-up to commit 0cd6783d3c7d ("s390/qeth: check dst entry before use"), consolidate the dst_check() logic into a single helper and add a wrapper around the cast type selection. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: consolidate skb RX processing in L3 driverJulian Wiedmann1-18/+12
Use napi_gro_receive() to pass up all types of packets that a L3 device may receive. 1) For proper L2 packets received by the IQD sniffer, this is the obvious thing to do. 2) For af_iucv (which doesn't provide a GRO assist), the GRO code will transparently fall back to netif_receive_skb(). So there's no need to special-case this traffic in our code. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: consolidate pm codeJulian Wiedmann4-80/+19
De-duplicate the pm callback implementations from the two sub-drivers, replacing them with core helpers that delegate to the .set_online and .set_offline callbacks. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: streamline SNMP cmd codeJulian Wiedmann1-31/+18
Apply some cleanups to qeth_snmp_command() and its callback: 1. when accessing the user data, use the proper struct instead of hard-coded offsets. Also copy the request data straight into the allocated cmd, skipping the extra memdup_user() to a tmp buffer. 2. capping the request length is no longer needed, the same check gets applied at a base level in qeth_alloc_cmd(). 3. clean up some duplicated (and misindented) trace statements. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: remove static cmd buffer infrastructureJulian Wiedmann4-244/+59
Now that all cmds are dynamically allocated, the code for static cmd buffers can go away entirely. Resulting in a nice reduction of code/data size & complexity, while removing the risk that qeth_clear_cmd_buffers() releases cmds that are still in-flight. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: dynamically allocate MPC cmdsJulian Wiedmann1-17/+19
The base MPC cmds are the last remaining user of the static cmd buffers. Port them over to use dynamic allocation, and stop backing the write channel's cmd buffers with pages. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: dynamically allocate vnicc cmdsJulian Wiedmann2-74/+62
The VNICC code is somewhat quirky in that it defers the whole cmd setup to a common helper qeth_l2_vnicc_request(). Some of the cmd specifics are then passed in via parameter, while others are simply hard-coded. Split the whole machinery up into the usual format: one helper that allocates the cmd & fills in the common fields, while all the cmd originators take care of their sub-cmd type specific work. This makes it much easier to calculate the cmd's precise length, and reduces code complexity. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: dynamically allocate diag cmdsJulian Wiedmann4-11/+30
Add a new wrapper that allocates DIAG cmds of the right size, and fills in the common fields. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: dynamically allocate various cmds with sub-typesJulian Wiedmann5-83/+78
This patch converts the adapter, assist and bridgeport cmd paths to dynamic allocation. Most of the work is about re-organizing the cmd headers, calculating the correct cmd length, and filling in the right value in the sub-cmd's length field. Since we now also set the correct length for cmds that are not reflected by a fixed struct (ie SNMP), we can remove the work-around from qeth_snmp_command(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: clarify parameter for simple assist cmdsJulian Wiedmann4-23/+26
For code that uses qeth_send_simple_setassparms_prot(), we currently can't differentiate whether the cmd should contain (1) no parameter, or (2) a 4-byte parameter with value 0. At the moment this doesn't cause any trouble. But when using dynamically allocated cmds, we need to know whether to allocate & transmit an additional 4 bytes of zeroes. So instead of the raw parameter value, pass a parameter pointer (or NULL) to qeth_send_simple_setassparms_prot(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27s390/qeth: dynamically allocate simple IPA cmdsJulian Wiedmann5-18/+56
This patch reduces the usage of the write channel's static cmd buffers, by dynamically allocating all simple IPA cmds (eg. STARTLAN, SETVMAC). It also converts the OSN path. Doing so requires some changes to how we calculate the cmd length. Currently when building IPA cmds, we're quite generous in how much data we send down to the device (basically the size of the biggest cmd we know). This is no real concern at the moment, since the static cmd buffers are backed with zeroed pages. But for dynamic allocations, the exact length matters. So this patch also adds the needed length calculations to each cmd path. Commands that have multiple subtypes (eg. SETADP) of differing length will be converted with follow-up patches. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27net/smc: common release code for non-accepted socketsUrsula Braun1-41/+32
There are common steps when releasing an accepted or unaccepted socket. Move this code into a common routine. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27Merge branch 'net-ipv4-fix-circular-list-infinite-loop'David S. Miller2-1/+22
Florian Westphal says: ==================== net: ipv4: fix circular-list infinite loop Tariq and Ran reported a regression caused by net-next commit 2638eb8b50cf ("net: ipv4: provide __rcu annotation for ifa_list"). This happens when net.ipv4.conf.$dev.promote_secondaries sysctl is enabled -- we can arrange for ifa->next to point at ifa, so next process that tries to walk the list loops forever. Fix this and extend rtnetlink.sh with a small test case for this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27selftests: rtnetlink: add small test case with 'promote_secondaries' enabledFlorian Westphal1-0/+20
This exercises the 'promote_secondaries' code path. Without previous fix, this triggers infinite loop/soft lockup: ifconfig process spinning at 100%, never to return. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27net: ipv4: fix infinite loop on secondary addr promotionFlorian Westphal1-1/+2
secondary address promotion causes infinite loop -- it arranges for ifa->ifa_next to point back to itself. Problem is that 'prev_prom' and 'last_prim' might point at the same entry, so 'last_sec' pointer must be obtained after prev_prom->next update. Fixes: 2638eb8b50cf ("net: ipv4: provide __rcu annotation for ifa_list") Reported-by: Ran Rozenstein <ranro@mellanox.com> Reported-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27net: ethtool: Allow parsing ETHER_FLOW types when using flow_ruleMaxime Chevallier1-0/+24
When parsing an ethtool_rx_flow_spec, users can specify an ethernet flow which could contain matches based on the ethernet header, such as the MAC address, the VLAN tag or the ethertype. ETHER_FLOW uses the src and dst ethernet addresses, along with the ethertype as keys. Matches based on the vlan tag are also possible, but they are specified using the special FLOW_EXT flag. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Acked-by: Pablo Neira Ayuso <pablo@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27proc: remove useless d_is_dir() checkChristian Brauner1-2/+1
Remove the d_is_dir() check from tgid_pidfd_to_pid(). It is pointless since you should never get &proc_tgid_base_operations for f_op on a non-directory. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-27copy_process(): don't use ksys_close() on cleanupsAl Viro1-28/+18
anon_inode_getfd() should be used *ONLY* in situations when we are guaranteed to be past the last failure point (including copying the descriptor number to userland, at that). And ksys_close() should not be used for cleanups at all. anon_inode_getfile() is there for all nontrivial cases like that. Just use that... Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-27af_packet: Block execution of tasks waiting for transmit to complete in ↵Neil Horman2-3/+18
AF_PACKET When an application is run that: a) Sets its scheduler to be SCHED_FIFO and b) Opens a memory mapped AF_PACKET socket, and sends frames with the MSG_DONTWAIT flag cleared, its possible for the application to hang forever in the kernel. This occurs because when waiting, the code in tpacket_snd calls schedule, which under normal circumstances allows other tasks to run, including ksoftirqd, which in some cases is responsible for freeing the transmitted skb (which in AF_PACKET calls a destructor that flips the status bit of the transmitted frame back to available, allowing the transmitting task to complete). However, when the calling application is SCHED_FIFO, its priority is such that the schedule call immediately places the task back on the cpu, preventing ksoftirqd from freeing the skb, which in turn prevents the transmitting task from detecting that the transmission is complete. We can fix this by converting the schedule call to a completion mechanism. By using a completion queue, we force the calling task, when it detects there are no more frames to send, to schedule itself off the cpu until such time as the last transmitted skb is freed, allowing forward progress to be made. Tested by myself and the reporter, with good results Change Notes: V1->V2: Enhance the sleep logic to support being interruptible and allowing for honoring to SK_SNDTIMEO (Willem de Bruijn) V2->V3: Rearrage the point at which we wait for the completion queue, to avoid needing to check for ph/skb being null at the end of the loop. Also move the complete call to the skb destructor to avoid needing to modify __packet_set_status. Also gate calling complete on packet_read_pending returning zero to avoid multiple calls to complete. (Willem de Bruijn) Move timeo computation within loop, to re-fetch the socket timeout since we also use the timeo variable to record the return code from the wait_for_complete call (Neil Horman) V3->V4: Willem has requested that the control flow be restored to the previous state. Doing so lets us eliminate the need for the po->wait_on_complete flag variable, and lets us get rid of the packet_next_frame function, but introduces another complexity. Specifically, but using the packet pending count, we can, if an applications calls sendmsg multiple times with MSG_DONTWAIT set, each set of transmitted frames, when complete, will cause tpacket_destruct_skb to issue a complete call, for which there will never be a wait_on_completion call. This imbalance will lead to any future call to wait_for_completion here to return early, when the frames they sent may not have completed. To correct this, we need to re-init the completion queue on every call to tpacket_snd before we enter the loop so as to ensure we wait properly for the frames we send in this iteration. Change the timeout and interrupted gotos to out_put rather than out_status so that we don't try to free a non-existant skb Clean up some extra newlines (Willem de Bruijn) Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27sctp: change to hold sk after auth shkey is created successfullyXin Long1-4/+4
Now in sctp_endpoint_init(), it holds the sk then creates auth shkey. But when the creation fails, it doesn't release the sk, which causes a sk defcnf leak, Here to fix it by only holding the sk when auth shkey is created successfully. Fixes: a29a5bd4f5c3 ("[SCTP]: Implement SCTP-AUTH initializations.") Reported-by: syzbot+afabda3890cc2f765041@syzkaller.appspotmail.com Reported-by: syzbot+276ca1c77a19977c0130@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27Merge branch 'macb-build-fixes'David S. Miller1-5/+5
Palmer Dabbelt says: ==================== net: macb: Fix compilation on systems without COMMON_CLK, v2 Our patch to add support for the FU540-C000 broke compilation on at least powerpc allyesconfig, which was found as part of the linux-next build regression tests. This must have somehow slipped through the cracks, as the patch has been reverted in linux-next for a while now. This patch applies on top of the offending commit, which is the only one I've even tried it on as I'm not sure how this subsystem makes it to Linus. This patch set fixes the issue by adding a dependency of COMMON_CLK to the MACB Kconfig entry, which avoids the build failure by disabling MACB on systems where it wouldn't compile. All known users of MACB have COMMON_CLK, so this shouldn't cause any issues. This is a significantly simpler approach than disabling just the FU540-C000 support. I've also included a second patch to indicate this is a driver for a Cadence device that was originally written by an engineer at Atmel. The only relation is that I stumbled across it when writing the first patch. Changes since v1 <20190624061603.1704-1-palmer@sifive.com>: * Disable MACB on systems without COMMON_CLK, instead of just disabling the FU540-C000 support on these systems. * Update the commit message to reflect the driver was written by Atmel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27net: macb: Kconfig: Rename Atmel to CadencePalmer Dabbelt1-3/+3
The help text makes it look like NET_VENDOR_CADENCE enables support for Atmel devices, when in reality it's a driver written by Atmel that supports Cadence devices. This may confuse users that have this device on a non-Atmel SoC. The fix is just s/Atmel/Cadence/, but I did go and re-wrap the Kconfig help text as that change caused it to go over 80 characters. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27net: macb: Kconfig: Make MACB depend on COMMON_CLKPalmer Dabbelt1-2/+2
commit c218ad559020 ("macb: Add support for SiFive FU540-C000") added a dependency on the common clock framework to the macb driver, but didn't express that dependency in Kconfig. As a result macb now fails to compile on systems without COMMON_CLK, which specifically causes a build failure on powerpc allyesconfig. This patch adds the dependency, which results in the macb driver no longer being selectable on systems without the common clock framework. All known systems that have this device already support the common clock framework, so this should not cause trouble for any uses. Supporting both the FU540-C000 and systems without COMMON_CLK is quite ugly. I've build tested this on powerpc allyesconfig and RISC-V defconfig (which selects MACB), but I have not even booted the resulting kernels. Fixes: c218ad559020 ("macb: Add support for SiFive FU540-C000") Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26Merge branch 'ipv6-fix-neighbour-resolution-with-raw-socket'David S. Miller6-8/+9
Nicolas Dichtel says: ==================== ipv6: fix neighbour resolution with raw socket The first patch prepares the fix, it constify rt6_nexthop(). The detail of the bug is explained in the second patch. v1 -> v2: - fix compilation warnings - split the initial patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26ipv6: fix neighbour resolution with raw socketNicolas Dichtel1-1/+2
The scenario is the following: the user uses a raw socket to send an ipv6 packet, destinated to a not-connected network, and specify a connected nh. Here is the corresponding python script to reproduce this scenario: import socket IPPROTO_RAW = 255 send_s = socket.socket(socket.AF_INET6, socket.SOCK_RAW, IPPROTO_RAW) # scapy # p = IPv6(src='fd00:100::1', dst='fd00:200::fa')/ICMPv6EchoRequest() # str(p) req = b'`\x00\x00\x00\x00\x08:@\xfd\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xfd\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xfa\x80\x00\x81\xc0\x00\x00\x00\x00' send_s.sendto(req, ('fd00:175::2', 0, 0, 0)) fd00:175::/64 is a connected route and fd00:200::fa is not a connected host. With this scenario, the kernel starts by sending a NS to resolve fd00:175::2. When it receives the NA, it flushes its queue and try to send the initial packet. But instead of sending it, it sends another NS to resolve fd00:200::fa, which obvioulsy fails, thus the packet is dropped. If the user sends again the packet, it now uses the right nh (fd00:175::2). The problem is that ip6_dst_lookup_neigh() uses the rt6i_gateway, which is :: because the associated route is a connected route, thus it uses the dst addr of the packet. Let's use rt6_nexthop() to choose the right nh. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26ipv6: constify rt6_nexthop()Nicolas Dichtel5-7/+7
There is no functional change in this patch, it only prepares the next one. rt6_nexthop() will be used by ip6_dst_lookup_neigh(), which uses const variables. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net: dsa: microchip: Use gpiod_set_value_cansleep()Marek Vasut1-3/+3
Replace gpiod_set_value() with gpiod_set_value_cansleep(), as the switch reset GPIO can be connected to e.g. I2C GPIO expander and it is perfectly fine for the kernel to sleep for a bit in ksz_switch_register(). Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26Allow 0.0.0.0/8 as a valid address rangeDave Taht1-1/+1
The longstanding prohibition against using 0.0.0.0/8 dates back to two issues with the early internet. There was an interoperability problem with BSD 4.2 in 1984, fixed in BSD 4.3 in 1986. BSD 4.2 has long since been retired. Secondly, addresses of the form 0.x.y.z were initially defined only as a source address in an ICMP datagram, indicating "node number x.y.z on this IPv4 network", by nodes that know their address on their local network, but do not yet know their network prefix, in RFC0792 (page 19). This usage of 0.x.y.z was later repealed in RFC1122 (section 3.2.2.7), because the original ICMP-based mechanism for learning the network prefix was unworkable on many networks such as Ethernet (which have longer addresses that would not fit into the 24 "node number" bits). Modern networks use reverse ARP (RFC0903) or BOOTP (RFC0951) or DHCP (RFC2131) to find their full 32-bit address and CIDR netmask (and other parameters such as default gateways). 0.x.y.z has had 16,777,215 addresses in 0.0.0.0/8 space left unused and reserved for future use, since 1989. This patch allows for these 16m new IPv4 addresses to appear within a box or on the wire. Layer 2 switches don't care. 0.0.0.0/32 is still prohibited, of course. Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: John Gilmore <gnu@toad.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net: aquantia: fix vlans not working over bridged networkDmitry Bogdanov4-8/+23
In configuration of vlan over bridge over aquantia device it was found that vlan tagged traffic is dropped on chip. The reason is that bridge device enables promisc mode, but in atlantic chip vlan filters will still apply. So we have to corellate promisc settings with vlan configuration. The solution is to track in a separate state variable the need of vlan forced promisc. And also consider generic promisc configuration when doing vlan filter config. Fixes: 7975d2aff5af ("net: aquantia: add support of rx-vlan-filter offload") Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26rtnetlink: skip metrics loop for dst_default_metricsDavid Ahern1-0/+4
dst_default_metrics has all of the metrics initialized to 0, so nothing will be added to the skb in rtnetlink_put_metrics. Avoid the loop if metrics is from dst_default_metrics. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26Merge branch 'skfp-cleanups'David S. Miller2-228/+6
Puranjay Mohan says: ==================== net: fddi: skfp: Use PCI generic definitions instead of private duplicates This patch series removes the private duplicates of PCI definitions in favour of generic definitions defined in pci_regs.h. This driver only uses some of the generic PCI definitons, which are included from pci_regs.h and thier private versions are removed from skfbi.h with all other private defines. The skfbi.h defines PCI_REV_ID and other private defines with different names, these are renamed to Generic PCI names to make them compatible with defines in pci_regs.h. All unused defines are removed from skfbi.h. Changes in v5: Removed unused PCI definitions which were left in v4 Changes in v4: Removed unused PCI definitions which were left in v3 Changes in v3: Renamed all local PCI definitions to Generic names. Corrected coding style mistakes. Changes in v2: Converted individual patches to a series. Made sure that individual patches build correctly ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net: fddi: skfp: Remove unused private PCI definitionsPuranjay Mohan1-224/+1
Remove unused private PCI definitions from skfbi.h because generic PCI symbols are already included from pci_regs.h. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net: fddi: skfp: Include generic PCI definitionsPuranjay Mohan1-0/+1
Include the uapi/linux/pci_regs.h header file which contains the generic PCI defines. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net: fddi: skfp: Rename local PCI defines to match generic PCI definesPuranjay Mohan2-3/+3
Rename the PCI_REV_ID and other local defines to Generic PCI define names in skfbi.h and drvfbi.c to make it compatible with the pci_regs.h. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26ipv4: reset rt_iif for recirculated mcast/bcast out pktsStephen Suryaputra3-0/+46
Multicast or broadcast egress packets have rt_iif set to the oif. These packets might be recirculated back as input and lookup to the raw sockets may fail because they are bound to the incoming interface (skb_iif). If rt_iif is not zero, during the lookup, inet_iif() function returns rt_iif instead of skb_iif. Hence, the lookup fails. v2: Make it non vrf specific (David Ahern). Reword the changelog to reflect it. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net/mlx5: E-Switch, Enable vport metadata matching if firmware supports itJianbo Liu1-0/+23
As the ingress ACL rules save vhca id and vport number to packet's metadata REG_C_0, and the metadata matching for the rules in both fast path and slow path are all added, enable this feature if supported. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26RDMA/mlx5: Add vport metadata matching for IB representorsJianbo Liu1-9/+36
If vport metadata matching is enabled in eswitch, the rule created must be changed to match on the metadata, instead of source port. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Add match on vport metadata for rule in slow pathJianbo Liu1-35/+107
In slow path, packet that not matched by any offloaded rule is forwarded to eswitch vport manager for further processing. Add matching on metadata for peer miss rules in FDB, and rules which forward packet to correct representor in esw manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Pass metadata from FDB to eswitch managerJianbo Liu1-0/+64
In order to do matching on metadata in slow path when demuxing traffic to representors, explicitly enable the feature that allows HW to pass metadata REG_C_0 from FDB to eswitch manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Add query and modify esw vport context functionsJianbo Liu2-0/+29
Add esw vport query and modify functions, and exposing them is needed for enabling or disabling registers passed as metatdata to vport NIC_RX table in slow path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Add match on vport metadata for rule in fast pathJianbo Liu1-34/+51
If FW's capabilities and configurations meet the requirement of vport metadata matching, this feature will be used. As the information about vport number and vhca_id related to packet is already stored to its metadata register, which is used as an indicator for perticular vport, now we can change to match on this metadata for all the offloading rules in fast path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5e: Specifying known origin of packets matching the flowJianbo Liu3-0/+6
In vport metadata matching, source port number is replaced by metadata. While FW has no idea about what it is in the metadata, a syndrome will happen. Specify a known origin to avoid the syndrome. However, there is no functional change because ANY_VPORT (0) is filled in flow_source, the same default value as before, as a pre-step towards metadata matching for fast path. There are two other values can be filled in flow_source. When setting 0x1, packet matching this rule is from uplink, while 0x2 is for packet from other local vports. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ↵Jianbo Liu4-36/+172
ingress ACLs When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport as it is not represented by single VHCA only in this case. So we change to match on metadata instead of source vport. To do that, a rule is created in all vports and uplink ingress ACLs, to save the source vport number and vhca id in the packet's metadata in order to match on it later. The metadata register used is the first of the 32-bit type C registers. It can be used for matching and header modify operations. The higher 16 bits of this register are for vhca id, and the lower 16 ones is for vport number. This change is not for dual-port RoCE only. If HW and FW allow, the vport metadata matching is enabled by default. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Add flow context for flow tagJianbo Liu11-45/+71
Refactor the flow data structures, add new flow_context and move flow_tag into it, as flow_tag doesn't belong to the rule action. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Introduce a helper API to check VF vportParav Pandit2-0/+8
Introduce a helper API mlx5_eswitch_is_vf_vport() to check if a given vport_num belongs to VF or not. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Support allocating modify header context from ingress ACLJianbo Liu1-0/+4
That modify header action can be then attached to a steering rule in the ingress ACL. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Get vport ACL namespace by vport indexJianbo Liu1-2/+2
The ingress and egress ACL root namespaces are created per vport and stored into arrays. However, the vport number is not the same as the index. Passing the array index, instead of vport number, to get the correct ingress and egress acl namespace. Fixes: 9b93ab981e3b ("net/mlx5: Separate ingress/egress namespaces for each vport") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Introduce vport metadata matching bits and enum constantsJianbo Liu1-7/+49
When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport. So we replace the match on source port with the match on metadata that was configured in ingress ACL, and that metadata will be passed further also to the NIC RX table of the eswitch manager. Introduce vport metadata matching bits and enum constants as a pre-step towards metadata matching. o metadata type C registers in the misc parameters 2 fields. o esw_uplink_ingress_acl bit in esw cap. If it set, the device supports ingress ACL for the uplink vport. o fdb_to_vport_reg_* bits in flow table cap and esw vport context, to support propagating the metadata to the nic rx through the loopback path. o flow_source in flow context, to indicate the known origin of packets. o enum constants, to support the above bits. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26team: Always enable vlan tx offloadYueHaibing1-1/+1
We should rather have vlan_tci filled all the way down to the transmitting netdevice and let it do the hw/sw vlan implementation. Suggested-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>