Age | Commit message | Author | Files | Lines
2018-05-01 | tcp: Add clean acked data hook | Ilya Lesokhin | 3 | -0/+35
Called when a TCP segment is acknowledged. Could be used by application protocols that hold additional metadata associated with the stream data. This is required by TLS device offload to release metadata associated with acknowledged TLS records. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 | Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue | David S. Miller | 21 | -129/+401
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-04-30

This series contains updates to i40e and i40evf only.

Jia-Ju Bai replaces an instance of GFP_ATOMIC with GFP_KERNEL, since i40evf is not in atomic context when i40evf_add_vlan() is called.

Jake cleans up function header comments to ensure that the function parameter comments actually match the function parameters. Fixed a possible overflow error in the PTP clock code. Fixed warnings regarding restricted __be32 type usage.

Mariusz fixes the reading of the LLDP configuration, which moves from using relative values to calculating the absolute address.

Jakub adds a check for 10G LR mode for i40e.

Paweł fixes an issue where changing the MTU would turn on TSO, GSO and GRO.

Alex fixes a couple of issues with the UDP tunnel filter configuration. The first is that the tunnels did not have mutual exclusion in place to prevent a race condition between a user request to add/remove a port and an update. The second is that we were deleting filters that were not associated with the actual filter we wanted to delete.

Harshitha ensures that the queue map sent by the VF is taken into account when enabling/disabling queues in the VF VSI.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | Merge branch 'mlxsw-SPAN-Support-routes-pointing-at-bridges' | David S. Miller | 10 | -36/+326
Ido Schimmel says:

====================
mlxsw: SPAN: Support routes pointing at bridges

Petr says:

When mirroring to a gretap or ip6gretap netdevice, the route that directs the encapsulated packets can reference a bridge. In that case, in the software model, the packet is switched. Thus when offloading mirroring like that, take into consideration FDB, STP, PVID configured at the bridge, and whether that VLAN ID should be tagged on egress.

Patch #1 introduces functions to get bridge PVID, VLAN flags and to look up an FDB entry.

Patches #2 and #3 refactor some existing code and introduce a new accessor function.

With patches #4 and #5 mlxsw calls mlxsw_sp_span_respin() on switchdev events as well. There is no impact yet, because bridge as an underlay device is still not allowed.

That is implemented in patch #6, which uses the new interfaces to figure out on which one port the mirroring should be configured, and whether the mirrored packets should be VLAN-tagged and how.

Changes from v2 to v3:
- Rename the suite of bridge accessor functions to br_vlan_get_pvid(), br_vlan_get_info() and br_fdb_find_port(). The _get bit is to avoid clashing with an existing static function.

Changes from v1 to v2:
- Change the suite of bridge accessor functions to br_vlan_pvid_rtnl(), br_vlan_info_rtnl(), br_fdb_find_port_rtnl().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | mlxsw: spectrum_span: Allow bridge for gretap mirror | Petr Machata | 2 | -6/+90
When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the underlay address (i.e. the remote address of the tunnel) may be routed to a bridge. In that case, look up the resolved neighbor Ethernet address in that bridge's FDB. Then configure the offload to direct the mirrored traffic to that port, possibly with tagging. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | mlxsw: Respin SPAN on switchdev events | Petr Machata | 1 | -4/+59
Changes to switchdev artifacts can make a SPAN entry offloadable or unoffloadable. To that end:
- Listen to SWITCHDEV_FDB_*_TO_BRIDGE notifications in addition to the *_TO_DEVICE ones, to catch whatever activity is sent to the bridge (likely by mlxsw itself). On each FDB notification, respin SPAN to reconcile it with the FDB changes.
- Also respin on switchdev port attribute changes (which currently covers changes to STP state of ports) and port object additions and removals.
Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | mlxsw: spectrum: Register SPAN before switchdev | Petr Machata | 1 | -12/+12
Since switchdev events can trigger SPAN respin, it is necessary that the data structures are available. Register SPAN first, with a commentary on what the dependencies are. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | mlxsw: spectrum_switchdev: Publish two functions | Petr Machata | 2 | -1/+51
Publish the existing function mlxsw_sp_bridge_port_find(), and add another service accessor mlxsw_sp_bridge_port_stp_state(). Publish both in a new file spectrum_switchdev.h. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | mlxsw: spectrum: Extract mlxsw_sp_stp_spms_state() | Petr Machata | 2 | -13/+14
Instead of duplicating the decision regarding port forwarding state made by mlxsw_sp_port_vid_stp_set(), extract the decision-making into a new function and reuse. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | net: bridge: Publish bridge accessor functions | Petr Machata | 4 | -0/+100
Add a couple new functions to allow querying FDB and vlan settings of a bridge. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | i40e: use %pI4b instead of byte swapping before dev_err | Jacob Keller | 1 | -4/+2
Fix warnings regarding restricted __be32 type usage by strictly specifying the type of the ipv4 address being printed in the dev_err statement. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-30 | i40e/i40evf: take into account queue map from vf when handling queues | Harshitha Ramamurthy | 3 | -13/+99
The expectation of the ops VIRTCHNL_OP_ENABLE_QUEUES and VIRTCHNL_OP_DISABLE_QUEUES is that the queue map sent by the VF is taken into account when enabling/disabling queues in the VF VSI. This patch makes sure that happens. By breaking out the individual queue set up functions so that they can be called directly from the i40e_virtchnl_pf.c file, only the queues as specified by the queue bit map that accompanies the enable/disable queues ops will be handled. Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
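The selective handling described above can be sketched in plain C. This is only an illustration under stated assumptions, not the driver code: `apply_queue_map` and the `enabled` array are hypothetical names, and the real driver works on VSI rings through its own queue setup functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: change the state of only those queues whose bit
 * is set in queue_map and leave every other queue untouched.
 * Returns the number of queues that were handled. */
size_t apply_queue_map(uint32_t queue_map, bool enable,
                       bool *enabled, size_t num_queues)
{
    size_t handled = 0;

    for (size_t q = 0; q < num_queues && q < 32; q++) {
        if (!(queue_map & (UINT32_C(1) << q)))
            continue;           /* queue not named in the bit map */
        enabled[q] = enable;
        handled++;
    }
    return handled;
}
```

With a map of 0x5 and four queues, only queues 0 and 2 change state; queues 1 and 3 keep whatever state they had.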
2018-04-30 | i40e: avoid overflow in i40e_ptp_adjfreq() | Jacob Keller | 2 | -15/+28
When operating at 1GbE, the base incval for the PTP clock is so large that multiplying it by numbers close to the max_adj can overflow the u64. Rather than attempting to limit the max_adj to a value small enough to avoid overflow, instead calculate the incvalue adjustment based on the 40GbE incvalue, and then multiply that by the scaling factor for the link speed. This sacrifices a small amount of precision in the adjustment but we avoid erratic behavior of the clock due to the overflow caused if ppb is very near the maximum adjustment. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
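The arithmetic behind this fix can be illustrated with a small user-space sketch. The constants `BASE_INCVAL` and `MULT_1GBE` are illustrative assumptions, not values taken from this log, and both function names are hypothetical. The sketch relies on the GCC/Clang builtin `__builtin_mul_overflow` to detect a wrapped 64-bit product.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only (assumptions, not taken from the driver). */
#define BASE_INCVAL 0x0199999999ULL  /* increment value at the fastest link speed */
#define MULT_1GBE   20ULL            /* scaling factor for the slowest link speed */
#define PPB_SCALE   1000000000ULL    /* parts per billion */

/* Old ordering: scale the increment value first, then multiply by ppb.
 * Returns false when the 64-bit product overflows. */
bool adj_scale_first(uint64_t ppb, uint64_t *out)
{
    uint64_t incval = BASE_INCVAL * MULT_1GBE;
    uint64_t prod;

    if (__builtin_mul_overflow(incval, ppb, &prod))
        return false;
    *out = incval + prod / PPB_SCALE;
    return true;
}

/* New ordering: adjust the base increment value, then apply the link-speed
 * scaling factor. The intermediate product is MULT_1GBE times smaller. */
bool adj_scale_last(uint64_t ppb, uint64_t *out)
{
    uint64_t prod;

    if (__builtin_mul_overflow((uint64_t)BASE_INCVAL, ppb, &prod))
        return false;
    *out = (BASE_INCVAL + prod / PPB_SCALE) * MULT_1GBE;
    return true;
}
```

With these assumed constants, a ppb value of 999999999 overflows in `adj_scale_first` but not in `adj_scale_last`. The second ordering rounds before scaling, which is the small loss of precision the commit message mentions.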
2018-04-30 | i40e: Fix multiple issues with UDP tunnel offload filter configuration | Alexander Duyck | 2 | -12/+56
This fixes at least 2 issues I have found with the UDP tunnel filter configuration. The first issue is the fact that the tunnels didn't have any sort of mutual exclusion in place to prevent an update from racing with a user request to add/remove a port. As such you could request to add and remove a port before the port update code had a chance to respond which would result in a very confusing result. To address it I have added 2 changes. First I added the RTNL mutex wrapper around our updating of the pending, port, and filter_index bits. Second I added logic so that we cannot use a port that has a pending deletion since we need to free the space in hardware before we can allow software to reuse it. The second issue addressed is the fact that we were not recording the actual filter index provided to us by the admin queue. As a result we were deleting filters that were not associated with the actual filter we wanted to delete. To fix that I added a filter_index member to the UDP port tracking structure. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
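The two bookkeeping changes, skipping slots with a pending deletion and remembering the filter index handed back by hardware, can be sketched as follows. All names here (`udp_port_slot`, `find_slot`, `record_filter`, `NUM_SLOTS`, `NO_FILTER`) are hypothetical, and the locking described in the message is left out so the example stays self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4          /* illustrative table size */
#define NO_FILTER 0xff       /* hypothetical "no hardware filter" marker */

struct udp_port_slot {
    uint16_t port;           /* 0 means the slot holds no port */
    bool     pending;        /* hardware add/delete not yet processed */
    uint8_t  filter_index;   /* index reported by hardware for this port */
};

/* Return the slot holding `port`, or -1 if there is none. With port == 0
 * this searches for a free slot and skips slots whose deletion is still
 * pending, since hardware has not released them yet. */
int find_slot(const struct udp_port_slot *tbl, uint16_t port)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (port == 0 && tbl[i].pending)
            continue;
        if (tbl[i].port == port)
            return i;
    }
    return -1;
}

/* Record the index that hardware handed back, so that a later delete can
 * remove exactly that filter, and clear the pending flag. */
void record_filter(struct udp_port_slot *slot, uint8_t hw_index)
{
    slot->filter_index = hw_index;
    slot->pending = false;
}
```

In a real driver both functions would run under whatever lock serializes port updates; the commit message names the RTNL mutex for that purpose.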
2018-04-30 | i40evf: Fix turning TSO, GSO and GRO on after | Paweł Jabłoński | 1 | -0/+18
This patch fixes the problem where each MTU change turns TSO, GSO and GRO on from off state. Now when TSO, GSO or GRO is turned off, MTU change does not turn them on. Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-30 | i40e: Add advertising 10G LR mode | Jakub Pawlak | 1 | -1/+3
It should be possible to set the 10G LR advertising mode, but the function i40e_set_link_ksettings() lacks a check for it. This patch adds a check for the 10000baseLR_Full flag to the 10G modes. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-30 | ipv6: sr: extract the right key values for "seg6_make_flowlabel" | Ahmed Abdelsalam | 1 | -1/+1
The seg6_make_flowlabel() is used by seg6_do_srh_encap() to compute the flowlabel from a given skb. It relies on skb_get_hash() which eventually calls __skb_flow_dissect() to extract the flow_keys struct values from the skb.

In case of IPv4 traffic, calling seg6_make_flowlabel() after skb_push(), skb_reset_network_header(), and skb_mac_header_rebuild() results in a flow_keys struct with all key values set to zero.

This patch calls seg6_make_flowlabel() before resetting the headers of the skb to get the right key values.

Extracted key values are based on the type of inner packet as follows:
1) IPv6 traffic: src_IP, dst_IP, L4 proto, and flowlabel of inner packet.
2) IPv4 traffic: src_IP, dst_IP, L4 proto, src_port, and dst_port
3) L2 traffic: depends on the kind of traffic carried in the L2 frame. IPv6 and IPv4 traffic works as discussed in 1) and 2).

Here is a hex dump of struct flow_keys for IPv4 and IPv6 traffic:

10.100.1.100:47302 > 30.0.0.2:5001
  00000000: 14 00 02 00 00 00 00 00 08 00 11 00 00 00 00 00
  00000010: 00 00 00 00 00 00 00 00 13 89 b8 c6 1e 00 00 02
  00000020: 0a 64 01 64

fc00:a1::a > b2::2
  00000000: 28 00 03 00 00 00 00 00 86 dd 11 00 99 f9 02 00
  00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 b2 00 00
  00000020: 00 00 00 00 00 00 00 00 00 00 00 02 fc 00 00 a1
  00000030: 00 00 00 00 00 00 00 00 00 00 00 0a

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | i40e: fix reading LLDP configuration | Mariusz Stachura | 3 | -10/+99
Previous method for reading LLDP config was based on hard-coded offsets. It happened to work, because of structured architecture of the NVM memory. In the new approach, known as FLAT, we need to calculate the absolute address, instead of using relative values. Needed defines for memory location were added. Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-30 | i40e/i40evf: cleanup incorrect function doxygen comments | Jacob Keller | 17 | -73/+95
Recent versions of the Linux kernel now warn about incorrect parameter definitions for function comments. Fix up several function comments to correctly reflect the current function arguments. This cleans up the warnings and helps ensure our documentation is accurate. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-30 | libcxgb,cxgb4: use __skb_put_zero to simplfy code | YueHaibing | 3 | -14/+7
Use the helper __skb_put_zero() to replace the pattern of __skb_put() followed by memset(). Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | erspan: auto detect truncated packets. | William Tu | 2 | -0/+12
Currently the truncated bit is set only when the mirrored packet is larger than the MTU. In certain cases, the packet might already have been truncated before being sent to the erspan tunnel. For this case, the patch detects whether the IP header's total length is larger than the actual skb->len. If true, this indicates that the mirrored packet is truncated, and the erspan truncate bit is set. I tested the patch using the bpf_skb_change_tail helper function to shrink the packet size and send it to the erspan tunnel. Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
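The length comparison described above can be sketched in user space. This is a minimal illustration working on a raw byte buffer instead of an skb; `ipv4_payload_truncated` and `IPV4_MIN_HDR_LEN` are hypothetical names, and `avail_len` stands in for the number of bytes actually present from the start of the IP header.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_MIN_HDR_LEN 20

/* Hypothetical helper: report whether the IPv4 header claims more bytes
 * than are actually available in the buffer. */
bool ipv4_payload_truncated(const uint8_t *ip_hdr, size_t avail_len)
{
    uint16_t tot_len_be;

    if (avail_len < IPV4_MIN_HDR_LEN)
        return true;    /* not even a full header is present */

    /* Total Length is the 16-bit big-endian field at byte offset 2. */
    memcpy(&tot_len_be, ip_hdr + 2, sizeof(tot_len_be));
    return ntohs(tot_len_be) > avail_len;
}
```

A header announcing 1500 bytes with only 100 bytes available is reported as truncated; the same header with all 1500 bytes available is not.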
2018-04-30 | i40evf: Replace GFP_ATOMIC with GFP_KERNEL in i40evf_add_vlan | Jia-Ju Bai | 1 | -1/+1
i40evf_add_vlan() is never called in atomic context. i40evf_add_vlan() is only called by i40evf_vlan_rx_add_vid(), which is only set as ".ndo_vlan_rx_add_vid" in struct net_device_ops. ".ndo_vlan_rx_add_vid" is not called in atomic context. Despite never getting called from atomic context, i40evf_add_vlan() calls kzalloc() with GFP_ATOMIC, which does not sleep for allocation. GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which can sleep and improve the possibility of successful allocation. This was found by a static analysis tool named DCNS that I wrote. I also checked it manually. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-30 | Merge branch 'r8169-further-improvements' | David S. Miller | 1 | -142/+42
Heiner Kallweit says: ==================== r8169: further improvements w/o functional change This series aims at further improving and simplifying the code w/o any intended functional changes. Series was tested on: RTL8169sb, RTL8168d, RTL8168e-vl ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | r8169: move common initializations to tp->hw_start | Heiner Kallweit | 1 | -55/+19
The chip-specific init code includes quite some calls which are identical for all chips. So move these calls to tp->hw_start(). In addition move rtl_set_rx_max_size() a little to make sure it's defined before it's used. Unfortunately the diff generated by git is a little bit hard to read. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | r8169: remove calls to rtl_set_rx_mode | Heiner Kallweit | 1 | -6/+0
__dev_open() calls the ndo_set_rx_mode callback anyway, so we don't have to do it here too. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | r8169: simplify rtl_hw_start_8169 | Heiner Kallweit | 1 | -20/+2
Currently done:
- if mac_version in (01, 02, 03, 04)
  RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
- if mac_version in (01, 02, 03, 04)
  rtl_set_rx_tx_config_registers(tp);
- if mac_version not in (01, 02, 03, 04)
  RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
  rtl_set_rx_tx_config_registers(tp);

So we do exactly the same independent of chip version and can simplify the code.

In addition remove the call to rtl_init_rxcfg(), it's called in rtl_init_one() already and the set bits are never touched later. rtl_init_8168/8101 don't include this call either.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | r8169: improve handling of CPCMD quirk mask | Heiner Kallweit | 1 | -28/+7
Both quirk masks are the same, so we can merge them. The quirk mask includes most bits so it's actually easier to define a mask with the bits to keep. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | r8169: improve CPlusCmd handling | Heiner Kallweit | 1 | -25/+17
tp->cp_cmd is supposed to reflect the current value of the CPlusCmd register. Several (quite old) changes however directly change this register w/o updating tp->cp_cmd. Also we have places in the code reading this register where we could use the cached value.

In addition:
- Properly initialize tp->cp_cmd with the register value.
- In rtl_hw_start_8169 remove one setting of PCIMulRW because it's set unconditionally anyway a few lines later.
- In rtl_hw_start_8168 properly mask out the INTT bits before setting INTT_1. So far we rely on both bits being zero.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
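The point about masking out the INTT bits before setting INTT_1 can be shown with a short sketch. The bit values below are assumptions chosen for illustration (a two-bit field in the low bits), and both function names are hypothetical; the example only demonstrates why a plain OR is fragile when the field is not already zero.

```c
#include <stdint.h>

/* Assumed layout: a two-bit INTT field in the lowest bits. */
#define INTT_0    0x0000u
#define INTT_1    0x0001u
#define INTT_2    0x0002u
#define INTT_3    0x0003u
#define INTT_MASK 0x0003u

/* OR without masking: only correct when both field bits are already zero. */
uint16_t set_intt_unmasked(uint16_t cp_cmd, uint16_t intt)
{
    return (uint16_t)(cp_cmd | intt);
}

/* Clear the whole field first, then set the requested value. */
uint16_t set_intt_masked(uint16_t cp_cmd, uint16_t intt)
{
    return (uint16_t)((cp_cmd & ~INTT_MASK) | intt);
}
```

If the cached value already holds INTT_2, the unmasked version ends up with INTT_3, while the masked version yields INTT_1 as intended.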
2018-04-30 | r8169: replace magic number for INTT mask with a constant | Heiner Kallweit | 1 | -2/+3
Use a proper constant for INTT bit mask. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | r8169: improve rtl8169_set_features | Heiner Kallweit | 1 | -14/+4
__rtl8169_set_features is used in rtl8169_set_features only, so we can inline it.

In addition:
- Remove check (features ^ dev->features), __netdev_update_features already checks that requested features differ from current ones.
- Don't mask out unsupported flags, there's no benefit in it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | r8169: remove unneeded call to __rtl8169_set_features in rtl_open | Heiner Kallweit | 1 | -2/+0
RxChkSum and RxVlan aren't touched outside __rtl8169_set_features (except in probe), so they are always in sync with dev->features. And the RxConfig flags are set in rtl_set_rx_mode() which is called via dev_set_rx_mode() from __dev_open(). Therefore we can safely remove this call. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | liquidio: fix spelling mistake: "mac_tx_multi_collison" -> "mac_tx_multi_collision" | Colin Ian King | 1 | -1/+1
Trivial fix to spelling mistake in oct_stats_strings text Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | Merge branch 'liquidio-enhanced-ethtool-set-channels-feature' | David S. Miller | 10 | -452/+528
Intiyaz Basha says:

====================
liquidio: enhanced ethtool --set-channels feature

For the ethtool --set-channels feature, the liquidio driver currently accepts max combined value as the queue count configured during driver load time, where max combined count is the total count of input and output queues. This limitation is applicable only when SR-IOV is enabled, that is, when VFs are created for PF. If SR-IOV is not enabled, the driver can configure max supported (64) queues.

This series of patches enhances the driver to accept max supported queues for ethtool --set-channels.

Changes in V2:
Only patch #6 was changed to fix these Sparse warnings reported by kbuild test robot:
  lio_ethtool.c:848:5: warning: symbol 'lio_23xx_reconfigure_queue_count' was not declared. Should it be static?
  lio_ethtool.c:877:22: warning: incorrect type in assignment (different base types)
  lio_ethtool.c:878:22: warning: incorrect type in assignment (different base types)
  lio_ethtool.c:879:22: warning: incorrect type in assignment (different base types)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | liquidio: enhanced ethtool --set-channels feature | Intiyaz Basha | 10 | -58/+316
Enhancing driver to accept max supported queues for ethtool --set-channels Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | liquidio: Moved common function setup_glists to lio_core.c | Intiyaz Basha | 4 | -163/+87
Moved common function setup_glists to lio_core.c and renamed it to lio_setup_glists Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | liquidio: Moved common definition octnic_gather to octeon_network.h | Intiyaz Basha | 3 | -39/+21
Moving common definition octnic_gather to octeon_network.h Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | liquidio: Moved common function delete_glists to lio_core.c | Intiyaz Basha | 4 | -91/+51
Moved common function delete_glists to lio_core.c and renamed it to lio_delete_glists Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | liquidio: Moved common function list_delete_head to octeon_network.h | Intiyaz Basha | 3 | -43/+24
Moved common function list_delete_head to octeon_network.h and renamed it to lio_list_delete_head Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | liquidio: Moved common function if_cfg_callback to lio_core.c | Intiyaz Basha | 4 | -67/+38
Moved common function if_cfg_callback to lio_core.c and renamed it to lio_if_cfg_callback. Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | net: core: Assert the size of netdev_featres_t | Florian Fainelli | 2 | -0/+7
We have about 53 netdev_features_t bits defined and counting, add a build time check to catch when a u64 type will not be enough and we will have to convert that to a bitmap. This is done in register_netdevice() for convenience. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
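A build-time size check of this kind can be sketched in standard C. The kernel has its own build-time assertion macros; this example uses C11 `_Static_assert` instead, and `features_t`, the `FEATURE_*` enumerators, and `features_fit` are hypothetical stand-ins rather than the kernel's definitions.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t features_t;    /* stands in for a 64-bit feature mask */

enum {
    FEATURE_SG_BIT,
    FEATURE_CSUM_BIT,
    FEATURE_TSO_BIT,
    /* further feature bits would be listed here */
    FEATURE_COUNT               /* must stay last */
};

/* Fails the build as soon as the enum outgrows the storage type. */
_Static_assert((size_t)FEATURE_COUNT <= sizeof(features_t) * CHAR_BIT,
               "features_t is too small for FEATURE_COUNT bits");

/* Runtime view of the same relation, handy for tests. */
int features_fit(void)
{
    return (size_t)FEATURE_COUNT <= sizeof(features_t) * CHAR_BIT;
}
```

Because the check is evaluated by the compiler, adding a 65th feature bit would break the build instead of silently dropping the bit at run time.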
2018-04-30 | Merge branch 'net-cleanup-skb_tx_hash' | David S. Miller | 6 | -33/+17
Alexander Duyck says: ==================== Clean up users of skb_tx_hash and __skb_tx_hash I am in the process of doing some work to try and enable macvlan Tx queue selection without using ndo_select_queue. As a part of that I will likely need to make changes to skb_tx_hash. As such this is a clean up or refactor of the two spots where the function has been used. In both cases it didn't really seem like the function was being used correctly so I have updated both code paths to not make use of the function. My current development environment doesn't have an mlx4 or OPA vnic available so the changes to those have been build tested only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash | Alexander Duyck | 2 | -19/+4
I am dropping the export of __skb_tx_hash as after my patches nobody is using it outside of the net/core/dev.c file. In addition I am renaming and repurposing it to just be a static declaration of skb_tx_hash since that was the only user for it at this point. By doing this the compiler can inline it into __netdev_pick_tx as that will improve performance. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | mlx4: Don't bother using skb_tx_hash in mlx4_en_select_queue | Alexander Duyck | 1 | -1/+1
The code in the fallback path has supported XDP in conjunction with the Tx traffic classification for TCs for over a year now. So instead of just calling skb_tx_hash for every packet we are better off using the fallback since that will record the Tx queue to the socket and then that can be used instead of having to recompute the hash every time. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | opa_vnic: Just use skb_get_hash instead of skb_tx_hash | Alexander Duyck | 3 | -13/+12
This patch is meant to clean up how the opa_vnic is obtaining entropy from Tx packets. The code as it was written was claiming to get 16 bits of hash, but from what I can tell it was only ever actually getting 14 bits as it was limited to 0 - (2^15 - 1). It then was folding the result to get an 8-bit value for entropy. Instead of throwing away all that input I am cutting out the middle man and instead having the code call skb_get_hash directly and then folding the 32-bit value into an 8-bit value using a pair of shifts and XOR operations. Execution wise this new approach should provide more entropy and be faster since we are bypassing the reciprocal multiplication to reduce the 32b value to 16b and instead just using a shift/XOR combination. In addition we can drop the unneeded adapter value from the call to get the entropy since the netdev itself isn't even needed. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
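Folding a 32-bit hash into 8 bits with a pair of shifts and XORs can be sketched as below. `fold_hash32_to_8` is a hypothetical name, and the exact shift order used by the driver is not stated in the message; any order that XORs all four bytes together gives the same result.

```c
#include <stdint.h>

/* Fold a 32-bit hash into 8 bits so that every input byte contributes:
 * the result is the XOR of the four bytes of `hash`. */
uint8_t fold_hash32_to_8(uint32_t hash)
{
    hash ^= hash >> 16;     /* fold the upper half into the lower half */
    hash ^= hash >> 8;      /* fold the remaining two bytes together   */
    return (uint8_t)(hash & 0xff);
}
```

For the input 0x12345678 the result is 0x12 ^ 0x34 ^ 0x56 ^ 0x78, which is 0x08.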
2018-04-30 | Merge branch 'lan78xx-fixed-phy' | David S. Miller | 2 | -32/+79
Raghuram Chary J says: ==================== lan78xx updates along with Fixed phy Support This series of patches makes a few modifications to the driver and adds support for fixed PHY. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | lan78xx: Modify error messages | Raghuram Chary J | 1 | -2/+2
Modify the error messages when phy registration fails. Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | lan78xx: Remove DRIVER_VERSION for lan78xx driver | Raghuram Chary J | 1 | -2/+0
Remove driver version info from the lan78xx driver. Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | lan78xx: Lan7801 Support for Fixed PHY | Raghuram Chary J | 2 | -28/+77
Adding Fixed PHY support to the lan78xx driver. Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | Merge branch 'tcp-mmap-rework-zerocopy-receive' | David S. Miller | 5 | -118/+154
Eric Dumazet says:

====================
tcp: mmap: rework zerocopy receive

syzbot reported a lockdep issue caused by tcp mmap() support. I implemented Andy Lutomirski's nice suggestions to resolve the issue and increase scalability as well.

The first patch adds a new getsockopt() operation and changes mmap() behavior. The second patch changes the tcp_mmap reference program.

v4: tcp mmap() support depends on CONFIG_MMU, as kbuild bot told us.
v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option instead of setsockopt(), feedback from Ka-Cheon Poon
v2: Added a missing page align of zc->length in tcp_zerocopy_receive(). Properly clear zc->recv_skip_hint in case user request was completed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 | selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE | Eric Dumazet | 1 | -28/+38
After prior kernel change, mmap() on TCP socket only reserves VMA. We have to use getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) to perform the transfer of pages from skbs in TCP receive queue into such VMA.

  struct tcp_zerocopy_receive {
          __u64 address;         /* in: address of mapping */
          __u32 length;          /* in/out: number of bytes to map/mapped */
          __u32 recv_skip_hint;  /* out: amount of bytes to skip */
  };

After a successful getsockopt(...TCP_ZEROCOPY_RECEIVE...), @length contains the number of bytes that were mapped, and @recv_skip_hint contains the number of bytes that should be read using conventional read()/recv()/recvmsg() system calls, to skip a sequence of bytes that can not be mapped because it is not properly page aligned.

Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Andy Lutomirski <luto@kernel.org> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
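The getsockopt() step described above might be wrapped as follows. This is a sketch under stated assumptions, not the selftest code: `zc_request` is a local mirror of the struct layout quoted in the message, `zc_receive` is a hypothetical helper, and the fallback option value 35 is assumed for systems whose headers do not define `TCP_ZEROCOPY_RECEIVE`. The caller is expected to have reserved `vma` beforehand with mmap() on the socket.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef TCP_ZEROCOPY_RECEIVE
#define TCP_ZEROCOPY_RECEIVE 35 /* assumed value when the header lacks it */
#endif

/* Local mirror of the struct layout quoted in the commit message. */
struct zc_request {
    uint64_t address;           /* in: address of mapping */
    uint32_t length;            /* in/out: bytes to map / bytes mapped */
    uint32_t recv_skip_hint;    /* out: bytes to read the conventional way */
};

/* Hypothetical helper: ask the kernel to map up to `want` bytes of received
 * data into `vma`. On success returns 0 and reports how many bytes were
 * mapped and how many must be consumed with read()/recv() instead.
 * Returns -1 with errno set on failure. */
int zc_receive(int fd, void *vma, uint32_t want,
               uint32_t *mapped, uint32_t *skip)
{
    struct zc_request zc;
    socklen_t zc_len = sizeof(zc);

    memset(&zc, 0, sizeof(zc));
    zc.address = (uint64_t)(uintptr_t)vma;
    zc.length = want;

    if (getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, &zc, &zc_len) == -1)
        return -1;

    *mapped = zc.length;
    *skip = zc.recv_skip_hint;
    return 0;
}
```

A receive loop would call the helper repeatedly, process `mapped` bytes in place, and then read `skip` bytes through the normal socket calls before trying again.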
2018-04-30 | tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive | Eric Dumazet | 4 | -90/+116
When adding tcp mmap() implementation, I forgot that socket lock had to be taken before current->mm->mmap_sem. syzbot eventually caught the bug.

Since we can not lock the socket in tcp mmap() handler we have to split the operation in two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else. This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements the transfer of pages from skbs to one VMA. This operation only uses down_read(&current->mm->mmap_sem) after holding TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski with great details.

Benefits are:
- Better scalability, in case multiple threads reuse VMAs (without mmap()/munmap() calls) since mmap_sem won't be write locked.
- Better error recovery. The previous mmap() model had to provide the expected size of the mapping. If for some reason one part could not be mapped (partial MSS), the whole operation had to be aborted. With the tcp_zerocopy_receive struct, kernel can report how many bytes were successfully mapped, and how many bytes should be read to skip the problematic sequence.
- No more memory allocation to hold an array of page pointers. 16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
- skbs are freed while mmap_sem has been released

Following patch makes the change in tcp_mmap tool to demonstrate one possible use of mmap() and getsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.

Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Cc: linux-mm@kvack.org Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>