summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-11-30tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPINGFrancis Yan20-5/+90
This patch exports the sender chronograph stats via the socket SO_TIMESTAMPING channel. Currently we can instrument how long a particular application unit of data was queued in TCP by tracking SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having these sender chronograph stats exported simultaneously along with these timestamps allow further breaking down the various sender limitation. For example, a video server can tell if a particular chunk of video on a connection takes a long time to deliver because TCP was experiencing small receive window. It is not possible to tell before this patch without packet traces. To prepare these stats, the user needs to set SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags while requesting other SOF_TIMESTAMPING TX timestamps. When the timestamps are available in the error queue, the stats are returned in a separate control message of type SCM_TIMESTAMPING_OPT_STATS, in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME, TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond. Signed-off-by: Francis Yan <francisyyan@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30tcp: export sender limits chronographs to TCP_INFOFrancis Yan2-0/+24
This patch exports all the sender chronograph measurements collected in the previous patches to TCP_INFO interface. Note that busy time exported includes all the other sending limits (rwnd-limited, sndbuf-limited). Internally the time unit is jiffy but externally the measurements are in microseconds for future extensions. Signed-off-by: Francis Yan <francisyyan@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30tcp: instrument how long TCP is limited by insufficient send bufferFrancis Yan3-3/+24
This patch measures the amount of time when TCP runs out of new data to send to the network due to insufficient send buffer, while TCP is still busy delivering (i.e. write queue is not empty). The goal is to indicate either the send buffer autotuning or user SO_SNDBUF setting has resulted network under-utilization. The measurement starts conservatively by checking various conditions to minimize false claims (i.e. under-estimation is more likely). The measurement stops when the SOCK_NOSPACE flag is cleared. But it does not account the time elapsed till the next application write. Also the measurement only starts if the sender is still busy sending data, s.t. the limit accounted is part of the total busy time. Signed-off-by: Francis Yan <francisyyan@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30tcp: instrument how long TCP is limited by receive windowFrancis Yan1-2/+9
This patch measures the total time when the TCP stops sending because the receiver's advertised window is not large enough. Note that once the limit is lifted we are likely in the busy status if we have data pending. Signed-off-by: Francis Yan <francisyyan@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30tcp: instrument how long TCP is busy sendingFrancis Yan3-4/+24
This patch measures TCP busy time, which is defined as the period of time when sender has data (or FIN) to send. The time starts when data is buffered and stops when the write queue is flushed by ACKs or error events. Note the busy time does not include SYN time, unless data is included in SYN (i.e. Fast Open). It does include FIN time even if the FIN carries no payload. Excluding pure FIN is possible but would incur one additional test in the fast path, which may not be worth it. Signed-off-by: Francis Yan <francisyyan@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30tcp: instrument tcp sender limits chronographsFrancis Yan3-2/+49
This patch implements the skeleton of the TCP chronograph instrumentation on sender side limits: 1) idle (unspec) 2) busy sending data other than 3-4 below 3) rwnd-limited 4) sndbuf-limited The limits are enumerated 'tcp_chrono'. Since a connection in theory can idle forever, we do not track the actual length of this uninteresting idle period. For the rest we track how long the sender spends in each limit. At any point during the life time of a connection, the sender must be in one of the four states. If there are multiple conditions worthy of tracking in a chronograph then the highest priority enum takes precedence over the other conditions. So that if something "more interesting" starts happening, stop the previous chrono and start a new one. The time unit is jiffy(u32) in order to save space in tcp_sock. This implies application must sample the stats no longer than every 49 days of 1ms jiffy. Signed-off-by: Francis Yan <francisyyan@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30cpsw: ethtool: add support for getting/setting EEE registersYegor Yefremov1-0/+26
Add the ability to query and set Energy Efficient Ethernet parameters via ethtool for applicable devices. This patch doesn't activate full EEE support in cpsw driver, but it enables reading and writing EEE advertising settings. This way one can disable advertising EEE for certain speeds. Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com> Acked-by: Rami Rosen <roszenrami@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30hv_netvsc: remove excessive logging on MTU changeVitaly Kuznetsov2-7/+7
When we change MTU or the number of channels on a netvsc device we get the following logged: hv_netvsc bf5edba8...: net device safe to remove hv_netvsc: hv_netvsc channel opened successfully hv_netvsc bf5edba8...: Send section size: 6144, Section count:2560 hv_netvsc bf5edba8...: Device MAC 00:15:5d:1e:91:12 link state up This information is useful as debug at most. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30Merge branch 'mlxsw-enhancements'David S. Miller3-1/+7
Jiri Pirko says: ==================== mlxsw: couple of enhancements and fixes Couple of enhancements and fixes from Ido. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30mlxsw: core: Change order of operations in removal pathIdo Schimmel1-1/+1
We call bus->init() before allocating 'lag.mapping'. Change the order of operations in removal path to reflect that. This makes the error path of mlxsw_core_bus_device_register() symmetric with mlxsw_core_bus_device_unregister(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30mlxsw: core: Add missing rollback in error pathIdo Schimmel1-0/+1
Without this rollback, the thermal zone is still registered during the error path, whereas its private data is freed upon the destruction of the underlying bus device due to the use of devm_kzalloc(). This results in use after free. Fix this by calling mlxsw_thermal_fini() from the appropriate place in the error path. Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30mlxsw: spectrum_buffers: Limit size of poolsIdo Schimmel1-0/+3
The shared buffer pools are containers whose size is used to calculate the maximum usage for packets from / to a specific port / {port, PG/TC}, when dynamic threshold is employed. While it's perfectly fine for the sum of the pools to exceed the maximum size of the shared buffer, a single pool cannot. Add a check when the pool size is set and forbid sizes larger than the maximum size of the shared buffer. Without the patch: $ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype dynamic // No error is returned With the patch: $ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype dynamic devlink answers: Invalid argument Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30mlxsw: resources: Add maximum buffer sizeIdo Schimmel1-0/+2
We need to be able to limit the size of shared buffer pools, so query the maximum size from the device during init. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30mlxsw: switchib: add MLXSW_PCI dependencyArnd Bergmann1-1/+1
The newly added switchib driver fails to link if MLXSW_PCI=m: drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function^Cmlxsw_sib_module_exit': switchib.c:(.exit.text+0x8): undefined reference to `mlxsw_pci_driver_unregister' switchib.c:(.exit.text+0x10): undefined reference to `mlxsw_pci_driver_unregister' drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_init': switchib.c:(.init.text+0x28): undefined reference to `mlxsw_pci_driver_register' switchib.c:(.init.text+0x38): undefined reference to `mlxsw_pci_driver_register' switchib.c:(.init.text+0x48): undefined reference to `mlxsw_pci_driver_unregister' The other two such sub-drivers have a dependency, so add the same one here. In theory we could allow this driver if MLXSW_PCI is disabled, but it's probably not worth it. Fixes: d1ba52638456 ("mlxsw: switchib: Introduce SwitchIB and SwitchIB silicon driver") Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30net: phy: Fix use after free in phy_detach()Geert Uytterhoeven1-2/+2
If device_release_driver(&phydev->mdio.dev) is called, it releases all resources belonging to the PHY device. Hence the subsequent call to phy_led_triggers_unregister() will access already freed memory when unregistering the LEDs. Move the call to phy_led_triggers_unregister() before the possible call to device_release_driver() to fix this. Fixes: 2e0bc452f4721520 ("net: phy: leds: add support for led triggers on phy link state change") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Zach Brown <zach.brown@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30stmmac: fix comments, make debug output consistentPavel Machek2-7/+10
Fix comments, add some new, and make debugfs output consistent. Signed-off-by: Pavel Machek <pavel@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30bpf: cgroup: fix documentation of __cgroup_bpf_update()Daniel Mack1-2/+2
There's a 'not' missing in one paragraph. Add it. Fixes: 3007098494be ("cgroup: add support for eBPF programs") Signed-off-by: Daniel Mack <daniel@zonque.org> Reported-by: Rami Rosen <roszenrami@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30Merge branch 'eee-broken-modes'David S. Miller5-9/+98
Jerome Brunet says: ==================== Fix OdroidC2 Gigabit Tx link issue This patchset fixes an issue with the OdroidC2 board (DWMAC + RTL8211F). The platform seems to enter LPI on the Rx path too often while performing relatively high TX transfer. This eventually break the link (both Tx and Rx), and require to bring the interface down and up again to get the Rx path working again. The root cause of this issue is not fully understood yet but disabling EEE advertisement on the PHY prevent this feature to be negotiated. With this change, the link is stable and reliable, with the expected throughput performance. The patchset adds options in the generic phy driver to disable EEE advertisement, through device tree. The way it is done is very similar to the handling of the max-speed property. Changes since V2: [2] - Rename "eee-advert-disable" to "eee-broken-modes" to make the intended purpose of this option clear (flag broken configuration, not a configuration option) - Add DT bindings constants so the DT configuration is more user friendly - Submit to net-next instead of net. Changes since V1: [1] - Disable the advertisement of EEE in the generic code instead of the realtek driver. [1] : http://lkml.kernel.org/r/1479220154-25851-1-git-send-email-jbrunet@baylibre.com [2] : http://lkml.kernel.org/r/1479742524-30222-1-git-send-email-jbrunet@baylibre.com ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30dt: bindings: add ethernet phy eee-broken-modes option documentationjbrunet1-0/+2
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Reviewed-by: Andreas Färber <afaerber@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30dt-bindings: net: add EEE capability constantsjbrunet1-0/+19
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Tested-by: Yegor Yefremov <yegorslists@googlemail.com> Tested-by: Andreas Färber <afaerber@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30net: phy: add an option to disable EEE advertisementjbrunet3-9/+77
This patch adds an option to disable EEE advertisement in the generic PHY by providing a mask of prohibited modes corresponding to the value found in the MDIO_AN_EEE_ADV register. On some platforms, PHY Low power idle seems to be causing issues, even breaking the link some cases. The patch provides a convenient way for these platforms to disable EEE advertisement and work around the issue. Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Tested-by: Yegor Yefremov <yegorslists@googlemail.com> Tested-by: Andreas Färber <afaerber@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30net: stmmac: enable tx queue 0 for gmac4 IPs synthesized with multiple TX queuesNiklas Cassel2-1/+14
The dwmac4 IP can synthesized with 1-8 number of tx queues. On an IP synthesized with DWC_EQOS_NUM_TXQ > 1, all txqueues are disabled by default. For these IPs, the bitfield TXQEN is R/W. Always enable tx queue 0. The write will have no effect on IPs synthesized with DWC_EQOS_NUM_TXQ == 1. The driver does still not utilize more than one tx queue in the IP. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30net: arc_emac: add dependencies on associated arches and compile testPeter Robinson1-2/+3
Add dependencies on the architectures that support these devices and add compile test to ensure ongoing code build coverage. Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30MAINTAINERS: Add device tree bindings to mv88e6xx sectionAndreas Färber1-0/+2
Also include the netdev list for convenience, as done elsewhere. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Andreas Färber <afaerber@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29mlx4: give precise rx/tx bytes/packets countersEric Dumazet4-23/+58
mlx4 stats are chaotic because a deferred work queue is responsible to update them every 250 ms. Even sampling stats every one second with "sar -n DEV 1" gives variations like the following : lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65 07:39:22 eth0 146877.00 3265554.00 9467.15 4828168.50 07:39:23 eth0 146587.00 3260329.00 9448.15 4820445.98 07:39:24 eth0 146894.00 3259989.00 9468.55 4819943.26 07:39:25 eth0 110368.00 2454497.00 7113.95 3629012.17 <<>> 07:39:26 eth0 146563.00 3257502.00 9447.25 4816266.23 07:39:27 eth0 145678.00 3258292.00 9389.79 4817414.39 07:39:28 eth0 145268.00 3253171.00 9363.85 4809852.46 07:39:29 eth0 146439.00 3262185.00 9438.97 4823172.48 07:39:30 eth0 146758.00 3264175.00 9459.94 4826124.13 07:39:31 eth0 146843.00 3256903.00 9465.44 4815381.97 Average: eth0 142827.50 3179259.70 9206.30 4700578.16 This patch allows rx/tx bytes/packets counters being folded at the time we need stats. We now can fetch stats every 1 ms if we want to check NIC behavior on a small time window. It is also easier to detect anomalies. lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65 07:42:50 eth0 142915.00 3177696.00 9212.06 4698270.42 07:42:51 eth0 143741.00 3200232.00 9265.15 4731593.02 07:42:52 eth0 142781.00 3171600.00 9202.92 4689260.16 07:42:53 eth0 143835.00 3192932.00 9271.80 4720761.39 07:42:54 eth0 141922.00 3165174.00 9147.64 4679759.21 07:42:55 eth0 142993.00 3207038.00 9216.78 4741653.05 07:42:56 eth0 141394.06 3154335.64 9113.85 4663731.73 07:42:57 eth0 141850.00 3161202.00 9144.48 4673866.07 07:42:58 eth0 143439.00 3180736.00 9246.05 4702755.35 07:42:59 eth0 143501.00 3210992.00 9249.99 4747501.84 Average: eth0 142835.66 3182165.93 9206.98 4704874.08 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29geneve: fix ip_hdr_len reserved for geneve6 tunnel.Haishuang Yan1-1/+1
It shold reserved sizeof(ipv6hdr) for geneve in ipv6 tunnel. Fixes: c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()') Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29Merge branch 'phy-doc-improvements'David S. Miller1-35/+105
Florian Fainelli says: ==================== Documentation: net: phy: Improve documentation This patch series addresses discussions and feedback that was recently received on the mailing-list in the area of: flow control/pause frames, interpretation of phy_interface_t and finally add some links to useful standards documents. Changes in v3: - add Timur's feedback into patch 3 Changes in v2: - clarify a few things in the RGMII section, add a paragraph about common issues with RGMII delay mismatches ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29Documentation: net: phy: Add links to several standards documentsFlorian Fainelli1-0/+10
Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0 revisions of the standard. Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29Documentation: net: phy: Add blurb about RGMIIFlorian Fainelli1-0/+77
RGMII is a recurring source of pain for people with Gigabit Ethernet hardware since it may require PHY driver and MAC driver level configuration hints. Document what are the expectations from PHYLIB and what options exist. Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29Documentation: net: phy: Add a paragraph about pause frames/flow controlFlorian Fainelli1-2/+16
Describe that the Ethernet MAC controller is ultimately responsible for dealing with proper pause frames/flow control advertisement and enabling, and that it is therefore allowed to have it change phydev->supported/advertising with SUPPORTED_Pause and SUPPORTED_AsymPause. Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29Documentation: net: phy: remove description of function pointersFlorian Fainelli1-33/+2
Remove the function pointers documentation which duplicates information found in include/linux/phy.h. Maintaining documentation about two different locations just does not work, but the code is less likely to be outdated. Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt countAndreas Färber1-1/+1
mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings, so free the same amount. This will be 8 or 9 in practice, less than 16. Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.") Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Andreas Färber <afaerber@suse.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28Merge branch 'mlx5-DCBX-and-ethtool-updates'David S. Miller12-68/+985
Saeed Mahameed says: ==================== Mellanox 100G mlx5 DCBX and ethtool updates This series provides the following mlx5 updates: From Huy: DCBX CEE API and DCBX firmware/host modes support. - 1st patch ensures the dcbnl_rtnl_ops is published only when the qos capability bits is on. - 2nd patch adds the support for CEE interfaces into mlx5 dcbnl_rtnl_ops - 3rd patch refactors ETS query to read ETS configuration directly from firmware rather than having a software shadow to it. The existing IEEE interfaces stays the same. - 4th patch adds the support for MLX5_REG_DCBX_PARAM and MLX5_REG_DCBX_APP firmware commands to manipulate mlx5 DCBX mode. - 5th patch adds the driver support for the new DCBX firmware. This ensures the backward compatibility versus the old and new firmware. With the new DCBX firmware, qos settings can be controlled by either firmware or software depending on the DCBX mode. From Kamal and Saeed: - mlx5 self-test support. From Shaker: - Private flag to give the user the ability to enable/disable mlx5 CQE compression. V1->V2: - Check ETS capability where needed in: ("net/mlx5e: Read ETS settings directly from firmware") - Fix return value of mlx5e_dcbnl_switch_to_host_mode in: ("net/mlx5e: ConnectX-4 firmware support for DCBX") - Update commit message of: ("net/mlx5e: ConnectX-4 firmware support for DCBX") - Fix two sparse static check warnings in en_selftest.c This series was generated against commit: e5f12b3f5ebb ("Merge branch 'mlxsw-trap-groups-and-policers'") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5e: Add CQE compression user controlShaker Daibes5-13/+51
The user can now override the automatic driver decision using the rx_cqe_compress flag, which is the preference for CQE compression. The flag is initialized with the automatic driver decision. Signed-off-by: Shaker Daibes <shakerd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5e: Moves pflags to priv->paramsShaker Daibes3-12/+14
pflags is a configuration parameter for the netdev, naturally it belongs to priv->params. Also introduce MLX5E_GET_PFLAG Signed-off-by: Shaker Daibes <shakerd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5e: Add support for loopback selftestSaeed Mahameed4-3/+227
Extend the self diagnostic tests to support loopback test. The loopback test doesn't require the offline flag, it will use the generic dev_queue_xmit and a dedicated packet_type to capture and verify mlx5e selftest loopback packets. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5e: Add support for ethtool self diagnostics testKamal Heib4-2/+139
The self diagnostics test implementaion include the following features: 1. Link Test: Check that link is in up state. 2. Speed Test: Check that link was negotiated correctly. 3. Health Test: Check the device health. Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5e: Add DCBX control interfaceHuy Nguyen1-3/+24
Use setdcbx interface to set the DCBX mode to firmware or os. If setdcbx is called with mode value of zero, the DCBX mode is set to firmware. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5e: ConnectX-4 firmware support for DCBXHuy Nguyen3-28/+94
DBCX by default is controlled by firmware where dcbx capability bit is set. In this mode, firmware is responsible for reading/sending the TLV packets from/to the remote partner. This patch sets up the infrastructure to move between HOST/FW DCBX control mode. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5: Add DCBX firmware commands supportHuy Nguyen3-0/+29
Add set/query commands for DCBX_PARAM register Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5e: Read ETS settings directly from firmwareHuy Nguyen3-25/+56
Issue description: Current implementation saves the ETS settings from user in a temporal soft copy and returns this settings when user queries the ETS settings. With the new DCBX firmware, the ETS settings can be changed by firmware when the DCBX is in firmware controlled mode. Therefore, user will obtain wrong values from the temporal soft copy. Solution: 1. Read the ETS settings directly from firmware. 2. For tc_tsa: a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev creation. b. When reading ETS setting from FW, if the traffic class bandwidth is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS. This implementation solves the scenarios when the DCBX is in FW control and willing bit is on which means the ETS setting is dictated by remote switch. Also check ETS capability where needed. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5e: Support DCBX CEE APIHuy Nguyen4-2/+370
Add DCBX CEE API interface for ConnectX-4. Configurations are stored in a temporary structure and are applied to the card's firmware when the CEE's setall callback function is called. Note: priority group in CEE is equivalent to traffic class in ConnectX-4 hardware spec. bw allocation per priority in CEE is not supported because ConnectX-4 only supports bw allocation per traffic class. user priority in CEE does not have an equivalent term in ConnectX-4. Therefore, user priority to priority mapping in CEE is not supported. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net/mlx5e: Add qos capability checkHuy Nguyen1-1/+2
Make sure firmware supports qos before exposing the DCB API. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28virtio-net: enable multiqueue by defaultJason Wang1-2/+7
We use single queue even if multiqueue is enabled and let admin to enable it through ethtool later. This is used to avoid possible regression (small packet TCP stream transmission). But looks like an overkill since: - single queue user can disable multiqueue when launching qemu - brings extra troubles for the management since it needs extra admin tool in guest to enable multiqueue - multiqueue performs much better than single queue in most of the cases So this patch enables multiqueue by default: if #queues is less than or equal to #vcpu, enable as much as queue pairs; if #queues is greater than #vcpu, enable #vcpu queue pairs. Cc: Hannes Frederic Sowa <hannes@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Cc: Jeremy Eder <jeder@redhat.com> Cc: Marko Myllynen <myllynen@redhat.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28Merge branch 'MV88E6097-fixes'David S. Miller1-0/+2
Stefan Eichenberger says: ==================== Fix support for the MV88E6097 This patchset fixes the following two issues for the MV88E6097: - Add missing definition of g1_irqs - Add missing comment ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net: dsa: mv88e6xxx: add missing comment for MV88E6097Stefan Eichenberger1-0/+1
Add a missing comment for the MV88E6097 because of unification. Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28net: dsa: mv88e6xxx: add g1_irqs definition for MV88E6097Stefan Eichenberger1-0/+1
Add the missing definition of g1_irqs for MV88E6097. Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28Merge branch 'bpf-misc-next'David S. Miller20-59/+160
Daniel Borkmann says: ==================== BPF cleanups and misc updates This patch set adds couple of cleanups in first few patches, exposes owner_prog_type for array maps as well as mlocked mem for maps in fdinfo, allows for mount permissions in fs and fixes various outstanding issues in selftests and samples. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28bpf: fix multiple issues in selftest suite and samplesDaniel Borkmann9-11/+64
1) The test_lru_map and test_lru_dist fails building on my machine since the sys/resource.h header is not included. 2) test_verifier fails in one test case where we try to call an invalid function, since the verifier log output changed wrt printing function names. 3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for retrieving the number of possible CPUs. This is broken at least in our scenario and really just doesn't work. glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF. First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l, if that fails, depending on the config, it either tries to count CPUs in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead. If /proc/cpuinfo has some issue, it returns just 1 worst case. This oddity is nothing new [1], but semantics/behaviour seems to be settled. _SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if that fails it looks into /proc/stat for cpuX entries, and if also that fails for some reason, /proc/cpuinfo is consulted (and returning 1 if unlikely all breaks down). While that might match num_possible_cpus() from the kernel in some cases, it's really not guaranteed with CPU hotplugging, and can result in a buffer overflow since the array in user space could have too few number of slots, and on perpcu map lookup, the kernel will write beyond that memory of the value buffer. William Tu reported such mismatches: [...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu() happens when CPU hotadd is enabled. For example, in Fusion when setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64 -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf() will be 2 [2]. [...] Documentation/cputopology.txt says /sys/devices/system/cpu/possible outputs cpu_possible_mask. That is the same as in num_possible_cpus(), so first step would be to fix the _SC_NPROCESSORS_CONF calls with our own implementation. Later, we could add support to bpf(2) for passing a mask via CPU_SET(3), for example, to just select a subset of CPUs. BPF samples code needs this fix as well (at least so that people stop copying this). Thus, define bpf_num_possible_cpus() once in selftests and import it from there for the sample code to avoid duplicating it. The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated. After all three issues are fixed, the test suite runs fine again: # make run_tests | grep self selftests: test_verifier [PASS] selftests: test_maps [PASS] selftests: test_lru_map [PASS] selftests: test_kmod.sh [PASS] [1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html Fixes: 3059303f59cf ("samples/bpf: update tracex[23] examples to use per-cpu maps") Fixes: 86af8b4191d2 ("Add sample for adding simple drop program to link") Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY") Fixes: e15596717948 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH") Fixes: ebb676daa1a3 ("bpf: Print function name in addition to function id") Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: William Tu <u9012063@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28bpf: allow for mount options to specify permissionsDaniel Borkmann1-1/+53
Since we recently converted the BPF filesystem over to use mount_nodev(), we now have the possibility to also hold mount options in sb's s_fs_info. This work implements mount options support for specifying permissions on the sb's inode, which will be used by tc when it manually needs to mount the fs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>