summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2020-06-30net:qos: police action offloading parameter 'burst' change to the original valuePo Liu3-4/+32
Since 'tcfp_burst' with TICK factor, driver side always need to recover it to the original value, this patch moves the generic calculation and recover to the 'burst' original value before offloading to device driver. Signed-off-by: Po Liu <po.liu@nxp.com> Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30Merge tag 'mlx5-tls-2020-06-26' of ↵David S. Miller6-10/+92
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-tls-2020-06-26 1) Improve hardware layouts and structure for kTLS support 2) Generalize ICOSQ (Internal Channel Operations Send Queue) Due to the asynchronous nature of adding new kTLS flows and handling HW asynchronous kTLS resync requests, the XSK ICOSQ was extended to support generic async operations, such as kTLS add flow and resync, in addition to the existing XSK usages. 3) kTLS hardware flow steering and classification: The driver already has the means to classify TCP ipv4/6 flows to send them to the corresponding RSS HW engine, as reflected in patches 3 through 5, the series will add a steering layer that will hook to the driver's TCP classifiers and will match on well known kTLS connection, in case of a match traffic will be redirected to the kTLS decryption engine, otherwise traffic will continue flowing normally to the TCP RSS engine. 3) kTLS add flow RX HW offload support New offload contexts post their static/progress params WQEs (Work Queue Element) to communicate the newly added kTLS contexts over the per-channel async ICOSQ. The Channel/RQ is selected according to the socket's rxq index. A new TLS-RX workqueue is used to allow asynchronous addition of steering rules, out of the NAPI context. It will be also used in a downstream patch in the resync procedure. Feature is OFF by default. Can be turned on by: $ ethtool -K <if> tls-hw-rx-offload on 4) Added mlx5 kTLS sw stats and new counters are documented in Documentation/networking/tls-offload.rst rx_tls_ctx - number of TLS RX HW offload contexts added to device for decryption. rx_tls_ooo - number of RX packets which were part of a TLS stream but did not arrive in the expected order and triggered the resync procedure. rx_tls_del - number of TLS RX HW offload contexts deleted from device (connection has finished). rx_tls_err - number of RX packets which were part of a TLS stream but were not decrypted due to unexpected error in the state machine. 5) Asynchronous RX resync a. The NIC driver indicates that it would like to resync on some TLS record within the received packet (P), but the driver does not know (yet) which of the TLS records within the packet. At this stage, the NIC driver will query the device to find the exact TCP sequence for resync (tcpsn), however, the driver does not wait for the device to provide the response. b. Eventually, the device responds, and the driver provides the tcpsn within the resync packet to KTLS. Now, KTLS can check the tcpsn against any processed TLS records within packet P, and also against any record that is processed in the future within packet P. The asynchronous resync path simplifies the device driver, as it can save bits on the packet completion (32-bit TCP sequence), and pass this information on an asynchronous command instead. Performance: CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 24 cores, HT off NIC: ConnectX-6 Dx 100GbE dual port Goodput (app-layer throughput) comparison: +---------------+-------+-------+---------+ | # connections | 1 | 4 | 8 | +---------------+-------+-------+---------+ | SW (Gbps) | 7.26 | 24.70 | 50.30 | +---------------+-------+-------+---------+ | HW (Gbps) | 18.50 | 64.30 | 92.90 | +---------------+-------+-------+---------+ | Speedup | 2.55x | 2.56x | 1.85x * | +---------------+-------+-------+---------+ * After linerate is reached, diff is observed in CPU util ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30net: sched: sch_red: Add qevents "early_drop" and "mark"Petr Machata2-0/+4
In order to allow acting on dropped and/or ECN-marked packets, add two new qevents to the RED qdisc: "early_drop" and "mark". Filters attached at "early_drop" block are executed as packets are early-dropped, those attached at the "mark" block are executed as packets are ECN-marked. Two new attributes are introduced: TCA_RED_EARLY_DROP_BLOCK with the block index for the "early_drop" qevent, and TCA_RED_MARK_BLOCK for the "mark" qevent. Absence of these attributes signifies "don't care": no block is allocated in that case, or the existing blocks are left intact in case of the change callback. For purposes of offloading, blocks attached to these qevents appear with newly-introduced binder types, FLOW_BLOCK_BINDER_TYPE_RED_EARLY_DROP and FLOW_BLOCK_BINDER_TYPE_RED_MARK. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30net: sched: Introduce helpers for qevent blocksPetr Machata1-0/+49
Qevents are attach points for TC blocks, where filters can be put that are executed when "interesting events" take place in a qdisc. The data to keep and the functions to invoke to maintain a qevent will be largely the same between qevents. Therefore introduce sched-wide helpers for qevent management. Currently, similarly to ingress and egress blocks of clsact pseudo-qdisc, blocks attachment cannot be changed after the qdisc is created. To that end, add a helper tcf_qevent_validate_change(), which verifies whether block index attribute is not attached, or if it is, whether its value matches the current one (i.e. there is no material change). The function tcf_qevent_handle() should be invoked when qdisc hits the "interesting event" corresponding to a block. This function releases root lock for the duration of executing the attached filters, to allow packets generated through user actions (notably mirred) to be reinserted to the same qdisc tree. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-30net: sched: Pass root lock to Qdisc_ops.enqueuePetr Machata1-2/+4
A following patch introduces qevents, points in qdisc algorithm where packet can be processed by user-defined filters. Should this processing lead to a situation where a new packet is to be enqueued on the same port, holding the root lock would lead to deadlocks. To solve the issue, qevent handler needs to unlock and relock the root lock when necessary. To that end, add the root lock argument to the qdisc op enqueue, and propagate throughout. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-29sctp: use list_is_singular in sctp_list_single_entryGeliang Tang1-1/+1
Use list_is_singular() instead of open-coding. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-29bareudp: Added attribute to enable & disable rx metadata collectionMartin2-0/+2
Metadata need not be collected in receive if the packet from bareudp device is not targeted to openvswitch. Signed-off-by: Martin <martin.varghese@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28net/tls: Add asynchronous resyncBoris Pismenny1-1/+37
This patch adds support for asynchronous resynchronization in tls_device. Async resync follows two distinct stages: 1. The NIC driver indicates that it would like to resync on some TLS record within the received packet (P), but the driver does not know (yet) which of the TLS records within the packet. At this stage, the NIC driver will query the device to find the exact TCP sequence for resync (tcpsn), however, the driver does not wait for the device to provide the response. 2. Eventually, the device responds, and the driver provides the tcpsn within the resync packet to KTLS. Now, KTLS can check the tcpsn against any processed TLS records within packet P, and also against any record that is processed in the future within packet P. The asynchronous resync path simplifies the device driver, as it can save bits on the packet completion (32-bit TCP sequence), and pass this information on an asynchronous command instead. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-28Revert "net/tls: Add force_resync for driver resync"Boris Pismenny1-11/+1
This reverts commit b3ae2459f89773adcbf16fef4b68deaaa3be1929. Revert the force resync API. Not in use. To be replaced by a better async resync API downstream. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-28Merge branch 'mlx5-next' of ↵Saeed Mahameed5-6/+62
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: kTLS, Improve TLS params layout structures net/mlx5: Avoid eswitch header inclusion in fs core layer net/mlx5: Avoid RDMA file inclusion in core driver net/mlx5: Add support in query QP, CQ and MKEY segments net/mlx5: Export resource dump interface Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-27net/mlx5: kTLS, Improve TLS params layout structuresTariq Toukan3-5/+11
Add explicit WQE segment structures for the TLS static and progress params. According to the HW spec, TISN is not part of the progress params context, take it out of it. Rename the control segment tisn field as it could hold either a TIS or a TIR number. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-27net/mlx5: Avoid RDMA file inclusion in core driverParav Pandit1-1/+0
mlx5 cq.h does not depend on RDMA verbs. Remove RDMA verbs file inclusion. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-26net: mdio: add a forward declaration for reset_control to mdio.hBartosz Golaszewski1-0/+1
This header refers to struct reset_control but doesn't include any reset header. The structure definition is probably somehow indirectly pulled in since no warnings are reported but for the sake of correctness add the forward declaration for struct reset_control. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-26Merge branch '40GbE' of ↵David S. Miller1-0/+194
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2020-06-25 This series contains updates to i40e driver and removes the individual driver versions from all of the Intel wired LAN drivers. Shiraz moves the client header so that it can easily be shared between the i40e LAN driver and i40iw RDMA driver. Jesse cleans up the unused defines, since they are just dead weight. Alek reduces the unreasonably long wait time for a PF reset after reboot by using jiffies to limit the maximum wait time for the PF reset to succeed. Added additional logging to let the user know when the driver transitions into recovery mode. Adds new device support for our 5 Gbps NICs. Todd adds a check to see if MFS is set after warm reboot and notifies the user when MFS is set to anything lower than the default value. Arkadiusz fixes a possible race condition, where were holding a spin-lock while in atomic context. v2: removed code comments that were no longer applicable in patch 2 of the series. Also removed 'inline' from patch 4 and patch 8 of the series. Also re-arranged code to be able to remove the forward function declarations. Dropped patch 9 of the series, while the author works on cleaning up the commit message. v3: Updated patch 8 description to answer Jakub's questions ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-26i40e: remove unused definesJesse Brandeburg1-9/+0
Remove all the unused defines as they are just dead weight. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-06-26i40e: Move client header locationShiraz Saleem1-0/+203
Move i40e_client.h to include/linux/net/intel/* since its shared between i40iw and i40e. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-06-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller31-56/+149
Minor overlapping changes in xfrm_device.c, between the double ESP trailing bug fix setting the XFRM_INIT flag and the changes in net-next preparing for bonding encryption support. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds14-22/+58
Pull networking fixes from David Miller: 1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen. 2) The default crypto algorithm selection in Kconfig for IPSEC is out of touch with modern reality, fix this up. From Eric Biggers. 3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii Nakryiko. 4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from Hangbin Liu. 5) Adjust packet alignment handling in ax88179_178a driver to match what the hardware actually does. From Jeremy Kerr. 6) register_netdevice can leak in the case one of the notifiers fail, from Yang Yingliang. 7) Use after free in ip_tunnel_lookup(), from Taehee Yoo. 8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir Oltean. 9) tg3 driver can sleep forever when we get enough EEH errors, fix from David Christensen. 10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet drivers, from Ciara Loftus. 11) Fix scanning loop break condition in of_mdiobus_register(), from Florian Fainelli. 12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon. 13) Endianness fix in mlxsw, from Ido Schimmel. 14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen. 15) Missing bridge mrp configuration validation, from Horatiu Vultur. 16) Fix circular netns references in wireguard, from Jason A. Donenfeld. 17) PTP initialization on recovery is not done properly in qed driver, from Alexander Lobakin. 18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong, from Rahul Lakkireddy. 19) Don't clear bound device TX queue of socket prematurely otherwise we get problems with ktls hw offloading, from Tariq Toukan. 20) ipset can do atomics on unaligned memory, fix from Russell King. 21) Align ethernet addresses properly in bridging code, from Thomas Martitz. 22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set, from Marcelo Ricardo Leitner. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits) rds: transport module should be auto loaded when transport is set sch_cake: fix a few style nits sch_cake: don't call diffserv parsing code when it is not needed sch_cake: don't try to reallocate or unshare skb unconditionally ethtool: fix error handling in linkstate_prepare_data() wil6210: account for napi_gro_receive never returning GRO_DROP hns: do not cast return value of napi_gro_receive to null socionext: account for napi_gro_receive never returning GRO_DROP wireguard: receive: account for napi_gro_receive never returning GRO_DROP vxlan: fix last fdb index during dump of fdb with nhid sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket tc-testing: avoid action cookies with odd length. bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT net: dsa: sja1105: fix tc-gate schedule with single element net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules net: dsa: sja1105: unconditionally free old gating config net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top net: macb: free resources on failure path of at91ether_open() net: macb: call pm_runtime_put_sync on failure path ...
2020-06-26rds: transport module should be auto loaded when transport is setRao Shoaib1-1/+3
This enhancement auto loads transport module when the transport is set via SO_RDS_TRANSPORT socket option. Reviewed-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-26sctp: Don't advertise IPv4 addresses if ipv6only is set on the socketMarcelo Ricardo Leitner1-3/+5
If a socket is set ipv6only, it will still send IPv4 addresses in the INIT and INIT_ACK packets. This potentially misleads the peer into using them, which then would cause association termination. The fix is to not add IPv4 addresses to ipv6only sockets. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-26net: phy: Add a helper to return the index for of the internal delayDan Murphy1-0/+4
Add a helper function that will return the index in the array for the passed in internal delay value. The helper requires the array, size and delay value. The helper will then return the index for the exact match or return the index for the index to the closest smaller value. Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phylink: add phylink_speed_(up|down) interfaceRussell King1-0/+2
Add an interface for the phy_speed_(up|down) functions when a driver makes use of phylink. These pass the call through to phylib when we have a normal PHY attached (i.o.w., not a PHY on a SFP module.) Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2-0/+9
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net, they are: 1) Unaligned atomic access in ipset, from Russell King. 2) Missing module description, from Rob Gill. 3) Patches to fix a module unload causing NULL pointer dereference in xtables, from David Wilder. For the record, I posting here his cover letter explaining the problem: A crash happened on ppc64le when running ltp network tests triggered by "rmmod iptable_mangle". See previous discussion in this thread: https://lists.openwall.net/netdev/2020/06/03/161 . In the crash I found in iptable_mangle_hook() that state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference. net->ipv4.iptable_mangle is set to NULL in +iptable_mangle_net_exit() and called when ip_mangle modules is unloaded. A rmmod task was found running in the crash dump. A 2nd crash showed the same problem when running "rmmod iptable_filter" (net->ipv4.iptable_filter=NULL). To fix this I added .pre_exit hook in all iptable_foo.c. The pre_exit will un-register the underlying hook and exit would do the table freeing. The netns core does an unconditional +synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. These patches include changes for both iptables and ip6tables. We tested this fix with ltp running iptables01.sh and iptables01.sh -6 a loop for 72 hours. 4) Add a selftest for conntrack helper assignment, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-4/+6
Pull rdma fixes from Jason Gunthorpe: "Several regression fixes from work that landed in the merge window, particularly in the mlx5 driver: - Various static checker and warning fixes - General bug fixes in rvt, qedr, hns, mlx5 and hfi1 - Several regression fixes related to the ECE and QP changes in last cycle - Fixes for a few long standing crashers in CMA, uverbs ioctl, and xrc" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits) IB/hfi1: Add atomic triggered sleep/wakeup IB/hfi1: Correct -EBUSY handling in tx code IB/hfi1: Fix module use count flaw due to leftover module put calls IB/hfi1: Restore kfree in dummy_netdev cleanup IB/mad: Fix use after free when destroying MAD agent RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata RDMA/counter: Query a counter before release RDMA/mad: Fix possible memory leak in ib_mad_post_receive_mads() RDMA/mlx5: Fix integrity enabled QP creation RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPs RDMA/mlx5: Fix remote gid value in query QP RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path RDMA/core: Check that type_attrs is not NULL prior access RDMA/hns: Fix an cmd queue issue when resetting RDMA/hns: Fix a calltrace when registering MR from userspace RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake RDMA/cma: Protect bind_list and listen_list while finding matching cm id RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532 RDMA/efa: Set maximum pkeys device attribute RDMA/rvt: Fix potential memory leak caused by rvt_alloc_rq ...
2020-06-25Merge tag 'sound-5.8-rc3' of ↵Linus Torvalds2-0/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes gathered in the last two weeks. The major changes here are fixes for the recent DPCM regressions found on i.MX and Qualcomm platforms and fixes for resource leaks in ASoC DAI registrations. Other than those are mostly device-specific fixes including the usual USB- and HD-audio quirks, and a fix for syzkaller case and ID updates for new Intel platforms" * tag 'sound-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits) ALSA: usb-audio: Fix OOB access of mixer element list ALSA: usb-audio: add quirk for Samsung USBC Headset (AKG) ALSA: usb-audio: Add registration quirk for Kingston HyperX Cloud Flight S ASoC: rockchip: Fix a reference count leak. ASoC: amd: closing specific instance. ALSA: hda: Intel: add missing PCI IDs for ICL-H, TGL-H and EKL ASoC: hdac_hda: fix memleak with regmap not freed on remove ASoC: SOF: Intel: add PCI IDs for ICL-H and TGL-H ASoC: SOF: Intel: add PCI ID for CometLake-S ASoC: Intel: SOF: merge COMETLAKE_LP and COMETLAKE_H ALSA: hda/realtek: Add mute LED and micmute LED support for HP systems ALSA: usb-audio: Fix potential use-after-free of streams ALSA: hda/realtek - Add quirk for MSI GE63 laptop ASoC: fsl_ssi: Fix bclk calculation for mono channel ASoC: SOF: Intel: hda: Clear RIRB status before reading WP ASoC: rt1015: Update rt1015 default register value according to spec modification. ASoC: qcom: common: set correct directions for dailinks ASoc: q6afe: add support to get port direction ASoC: soc-pcm: fix checks for multi-cpu FE dailinks ASoC: rt5682: Let dai clks be registered whether mclk exists or not ...
2020-06-25net: qos: police action add index for tc flower offloadingPo Liu1-0/+1
Hardware device may include more than one police entry. Specifying the action's index make it possible for several tc filters to share the same police action when installing the filters. Propagate this index to device drivers through the flow offload intermediate representation, so that drivers could share a single hardware policer between multiple filters. v1->v2 changes: - Update the commit message suggest by Ido Schimmel <idosch@idosch.org> Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: qos: add tc police offloading action with max frame size limitPo Liu2-0/+11
Current police offloading support the 'burst'' and 'rate_bytes_ps'. Some hardware own the capability to limit the frame size. If the frame size larger than the setting, the frame would be dropped. For the police action itself already accept the 'mtu' parameter in tc command. But not extend to tc flower offloading. So extend 'mtu' to tc flower offloading. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25netfilter: ip6tables: Split ip6t_unregister_table() into pre_exit and exit ↵David Wilder1-0/+3
helpers. The pre_exit will un-register the underlying hook and .exit will do the table freeing. The netns core does an unconditional synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: iptables: Split ipt_unregister_table() into pre_exit and exit ↵David Wilder1-0/+6
helpers. The pre_exit will un-register the underlying hook and .exit will do the table freeing. The netns core does an unconditional synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25net: phy: make phy_disable_interrupts() non-staticJisheng Zhang1-0/+1
We face an issue with rtl8211f, a pin is shared between INTB and PMEB, and the PHY Register Accessible Interrupt is enabled by default, so the INTB/PMEB pin is always active in polling mode case. As Heiner pointed out "I was thinking about calling phy_disable_interrupts() in phy_init_hw(), to have a defined init state as we don't know in which state the PHY is if the PHY driver is loaded. We shouldn't assume that it's the chip power-on defaults, BIOS or boot loader could have changed this. Or in case of dual-boot systems the other OS could leave the PHY in whatever state." Make phy_disable_interrupts() non-static so that it could be used in phy_init_hw() to have a defined init state. Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bridge: add a flag to avoid refreshing fdb when changing/addingNikolay Aleksandrov1-0/+1
When we modify or create a new fdb entry sometimes we want to avoid refreshing its activity in order to track it properly. One example is when a mac is received from EVPN multi-homing peer by FRR, which doesn't want to change local activity accounting. It makes it static and sets a flag to track its activity. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bridge: add option to allow activity notifications for any fdb entriesNikolay Aleksandrov1-0/+11
This patch adds the ability to notify about activity of any entries (static, permanent or ext_learn). EVPN multihoming peers need it to properly and efficiently handle mac sync (peer active/locally active). We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the current activity state and to control if static entries should be monitored at all. We use 2 bits - one to activate fdb entry tracking (disabled by default) and the second to denote that an entry is inactive. We need the second bit in order to avoid multiple notifications of inactivity. Obviously this makes no difference for dynamic entries since at the time of inactivity they get deleted, while the tracked non-dynamic entries get the inactive bit set and get a notification. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: neighbor: add fdb extended attributeNikolay Aleksandrov1-0/+12
Add an attribute to NDA which will contain all future fdb-specific attributes in order to avoid polluting the NDA namespace with e.g. bridge or vxlan specific attributes. The attribute is called NDA_FDB_EXT_ATTRS and the structure would look like: [NDA_FDB_EXT_ATTRS] = { [NFEA_xxx] } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: phy: add support for a common probe between shared PHYsAntoine Tenart1-3/+15
Shared PHYs (PHYs in the same hardware package) may have shared registers and their drivers would usually need to share information. There is currently a way to have a shared (part of the) init, by using phy_package_init_once(). This patch extends the logic to share parts of the probe to allow sharing the initialization of locks or resources retrieval. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: Do not clear the sock TX queue in sk_set_socket()Tariq Toukan1-1/+0
Clearing the sock TX queue in sk_set_socket() might cause unexpected out-of-order transmit when called from sock_orphan(), as outstanding packets can pick a different TX queue and bypass the ones already queued. This is undesired in general. More specifically, it breaks the in-order scheduling property guarantee for device-offloaded TLS sockets. Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it explicitly only where needed. Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: phy: Allow mdio buses to auto-probe c45 devicesJeremy Linton1-0/+8
The mdiobus_scan logic is currently hardcoded to only work with c22 devices. This works fairly well in most cases, but its possible that a c45 device doesn't respond despite being a standard phy. If the parent hardware is capable, it makes sense to scan for c22 devices before falling back to c45. As we want this to reflect the capabilities of the STA, lets add a field to the mii_bus structure to represent the capability. That way devices can opt into the extended scanning. Existing users should continue to default to c22 only scanning as long as they are zero'ing the structure before use. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Calvin Johnson <calvin.johnson@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: ipv6: Use struct_size() helper and kcalloc()Gustavo A. R. Silva1-6/+0
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. Also, remove unnecessary function ipv6_rpl_srh_alloc_size() and replace kzalloc() with kcalloc(), which has a 2-factor argument form for multiplication. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24udp: move gro declarations to net/udp.hEric Dumazet1-0/+7
This removes following warnings : CC net/ipv4/udp_offload.o net/ipv4/udp_offload.c:504:17: warning: no previous prototype for 'udp4_gro_receive' [-Wmissing-prototypes] 504 | struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb) | ^~~~~~~~~~~~~~~~ net/ipv4/udp_offload.c:584:29: warning: no previous prototype for 'udp4_gro_complete' [-Wmissing-prototypes] 584 | INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff) | ^~~~~~~~~~~~~~~~~ CHECK net/ipv6/udp_offload.c net/ipv6/udp_offload.c:115:16: warning: symbol 'udp6_gro_receive' was not declared. Should it be static? net/ipv6/udp_offload.c:148:29: warning: symbol 'udp6_gro_complete' was not declared. Should it be static? CC net/ipv6/udp_offload.o net/ipv6/udp_offload.c:115:17: warning: no previous prototype for 'udp6_gro_receive' [-Wmissing-prototypes] 115 | struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb) | ^~~~~~~~~~~~~~~~ net/ipv6/udp_offload.c:148:29: warning: no previous prototype for 'udp6_gro_complete' [-Wmissing-prototypes] 148 | INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff) | ^~~~~~~~~~~~~~~~~ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: move tcp gro declarations to net/tcp.hEric Dumazet1-0/+4
This patch removes following (C=1 W=1) warnings for CONFIG_RETPOLINE=y : net/ipv4/tcp_offload.c:306:16: warning: symbol 'tcp4_gro_receive' was not declared. Should it be static? net/ipv4/tcp_offload.c:306:17: warning: no previous prototype for 'tcp4_gro_receive' [-Wmissing-prototypes] net/ipv4/tcp_offload.c:319:29: warning: symbol 'tcp4_gro_complete' was not declared. Should it be static? net/ipv4/tcp_offload.c:319:29: warning: no previous prototype for 'tcp4_gro_complete' [-Wmissing-prototypes] CHECK net/ipv6/tcpv6_offload.c net/ipv6/tcpv6_offload.c:16:16: warning: symbol 'tcp6_gro_receive' was not declared. Should it be static? net/ipv6/tcpv6_offload.c:29:29: warning: symbol 'tcp6_gro_complete' was not declared. Should it be static? CC net/ipv6/tcpv6_offload.o net/ipv6/tcpv6_offload.c:16:17: warning: no previous prototype for 'tcp6_gro_receive' [-Wmissing-prototypes] 16 | struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) | ^~~~~~~~~~~~~~~~ net/ipv6/tcpv6_offload.c:29:29: warning: no previous prototype for 'tcp6_gro_complete' [-Wmissing-prototypes] 29 | INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff) | ^~~~~~~~~~~~~~~~~ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24tcp: move ipv4_specific to tcp include fileEric Dumazet2-3/+2
Declare ipv4_specific once, in tcp.h were it belongs. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24tcp: move ipv6_specific declaration to remove a warningEric Dumazet1-0/+2
ipv6_specific should be declared in tcp include files, not mptcp. This removes the following warning : CHECK net/ipv6/tcp_ipv6.c net/ipv6/tcp_ipv6.c:78:42: warning: symbol 'ipv6_specific' was not declared. Should it be static? Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24tcp: add declarations to avoid warningsEric Dumazet1-0/+2
Remove these errors: net/ipv6/tcp_ipv6.c:1550:29: warning: symbol 'tcp_v6_rcv' was not declared. Should it be static? net/ipv6/tcp_ipv6.c:1770:30: warning: symbol 'tcp_v6_early_demux' was not declared. Should it be static? net/ipv6/tcp_ipv6.c:1550:29: warning: no previous prototype for 'tcp_v6_rcv' [-Wmissing-prototypes] 1550 | INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb) | ^~~~~~~~~~ net/ipv6/tcp_ipv6.c:1770:30: warning: no previous prototype for 'tcp_v6_early_demux' [-Wmissing-prototypes] 1770 | INDIRECT_CALLABLE_SCOPE void tcp_v6_early_demux(struct sk_buff *skb) | ^~~~~~~~~~~~~~~~~~ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24bonding/xfrm: use real_dev instead of slave_devJarod Wilson1-1/+1
Rather than requiring every hw crypto capable NIC driver to do a check for slave_dev being set, set real_dev in the xfrm layer and xso init time, and then override it in the bonding driver as needed. Then NIC drivers can always use real_dev, and at the same time, we eliminate the use of a variable name that probably shouldn't have been used in the first place, particularly given recent current events. CC: Boris Pismenny <borisp@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Leon Romanovsky <leon@kernel.org> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org Suggested-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24ipv6: fib6: avoid indirect calls from fib6_rule_lookupBrian Vazquez1-0/+36
It was reported that a considerable amount of cycles were spent on the expensive indirect calls on fib6_rule_lookup. This patch introduces an inline helper called pol_route_func that uses the indirect_call_wrappers to avoid the indirect calls. This patch saves around 50ns per call. Performance was measured on the receiver by checking the amount of syncookies that server was able to generate under a synflood load. Traffic was generated using trafgen[1] which was pushing around 1Mpps on a single queue. Receiver was using only one rx queue which help to create a bottle neck and make the experiment rx-bounded. These are the syncookies generated over 10s from the different runs: Whithout the patch: TcpExtSyncookiesSent 3553749 0.0 TcpExtSyncookiesSent 3550895 0.0 TcpExtSyncookiesSent 3553845 0.0 TcpExtSyncookiesSent 3541050 0.0 TcpExtSyncookiesSent 3539921 0.0 TcpExtSyncookiesSent 3557659 0.0 TcpExtSyncookiesSent 3526812 0.0 TcpExtSyncookiesSent 3536121 0.0 TcpExtSyncookiesSent 3529963 0.0 TcpExtSyncookiesSent 3536319 0.0 With the patch: TcpExtSyncookiesSent 3611786 0.0 TcpExtSyncookiesSent 3596682 0.0 TcpExtSyncookiesSent 3606878 0.0 TcpExtSyncookiesSent 3599564 0.0 TcpExtSyncookiesSent 3601304 0.0 TcpExtSyncookiesSent 3609249 0.0 TcpExtSyncookiesSent 3617437 0.0 TcpExtSyncookiesSent 3608765 0.0 TcpExtSyncookiesSent 3620205 0.0 TcpExtSyncookiesSent 3601895 0.0 Without the patch the average is 354263 pkt/s or 2822 ns/pkt and with the patch the average is 360738 pkt/s or 2772 ns/pkt which gives an estimate of 50 ns per packet. [1] http://netsniff-ng.org/ Changelog since v1: - Change ordering in the ICW (Paolo Abeni) Cc: Luigi Rizzo <lrizzo@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24indirect_call_wrapper: extend indirect wrapper to support up to 4 callsBrian Vazquez1-0/+12
There are many places where 2 annotations are not enough. This patch adds INDIRECT_CALL_3 and INDIRECT_CALL_4 to cover such cases. Signed-off-by: Brian Vazquez <brianvv@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24net: qed: fix left elements count calculationAlexander Lobakin1-10/+16
qed_chain_get_element_left{,_u32} returned 0 when the difference between producer and consumer page count was equal to the total page count. Fix this by conditional expanding of producer value (vs unconditional). This allowed to eliminate normalizaton against total page count, which was the cause of this bug. Misc: replace open-coded constants with common defines. Fixes: a91eb52abb50 ("qed: Revisit chain implementation") Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24bridge: uapi: mrp: Fix MRP_PORT_ROLEHoratiu Vultur1-1/+0
Currently the MRP_PORT_ROLE_NONE has the value 0x2 but this is in conflict with the IEC 62439-2 standard. The standard defines the following port roles: primary (0x0), secondary(0x1), interconnect(0x2). Therefore remove the port role none. Fixes: 4714d13791f831 ("bridge: uapi: mrp: Add mrp attributes.") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24rtnetlink: add keepalived rtm_protocolAlexandre Cassen1-22/+23
Keepalived can set global static ip routes or virtual ip routes dynamically following VRRP protocol states. Using a dedicated rtm_protocol will help keeping track of it. Changes in v2: - fix tab/space indenting Signed-off-by: Alexandre Cassen <acassen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23net/mlx5: Add support in query QP, CQ and MKEY segmentsMaor Gottlieb1-0/+3
Introduce new resource dump segments - PRM_QUERY_QP, PRM_QUERY_CQ and PRM_QUERY_MKEY. These segments contains the resource dump in PRM query format. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-06-23net/mlx5: Export resource dump interfaceMaor Gottlieb1-0/+48
Export some of the resource dump API. mlx5_ib driver will use it in downstream patches. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>