summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)AuthorFilesLines
2017-11-21mlxsw: spectrum: Do not try to create non-existing ports during unsplitIdo Schimmel2-2/+8
On some systems, when we unsplit a port we need to re-create two ports instead. On other systems, only one needs to be re-created. Do not try to create a port if during driver initialization it was assigned a negative module number, which is invalid. This avoids the following error during unsplit: [ 941.012478] mlxsw_spectrum 0000:01:00.0: Port 43: Failed to map module The error is harmless and caused by the fact that a local port is already mapped to module 0. Fixes: be94535f9531 ("mlxsw: spectrum: Make split flow match firmware requirements") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-16Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-3/+2
Merge updates from Andrew Morton: - a few misc bits - ocfs2 updates - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits) memory hotplug: fix comments when adding section mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP mm: simplify nodemask printing mm,oom_reaper: remove pointless kthread_run() error check mm/page_ext.c: check if page_ext is not prepared writeback: remove unused function parameter mm: do not rely on preempt_count in print_vma_addr mm, sparse: do not swamp log with huge vmemmap allocation failures mm/hmm: remove redundant variable align_end mm/list_lru.c: mark expected switch fall-through mm/shmem.c: mark expected switch fall-through mm/page_alloc.c: broken deferred calculation mm: don't warn about allocations which stall for too long fs: fuse: account fuse_inode slab memory as reclaimable mm, page_alloc: fix potential false positive in __zone_watermark_ok mm: mlock: remove lru_add_drain_all() mm, sysctl: make NUMA stats configurable shmem: convert shmem_init_inodecache() to void Unify migrate_pages and move_pages access checks mm, pagevec: rename pagevec drained field ...
2017-11-16mm: remove __GFP_COLDMel Gorman1-3/+2
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Juding from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no sigificant difference which is not surprising given that the __GFP_COLD branches are a miniscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-16Merge tag 'for-linus' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is a fairly plain pull request. Lots of driver updates across the stack, a huge number of static analysis cleanups including a close to 50 patch series from Bart Van Assche, and a number of new features inside the stack such as general CQ moderation support. Nothing really stands out, but there might be a few conflicts as you take things in. In particular, the cleanups touched some of the same lines as the new timer_setup changes. Everything in this pull request has been through 0day and at least two days of linux-next (since Stephen doesn't necessarily flag new errors/warnings until day2). A few more items (about 30 patches) from Intel and Mellanox showed up on the list on Tuesday. I've excluded those from this pull request, and I'm sure some of them qualify as fixes suitable to send any time, but I still have to review them fully. If they contain mostly fixes and little or no new development, then I will probably send them through by the end of the week just to get them out of the way. There was a break in my acceptance of patches which coincides with the computer problems I had, and then when I got things mostly back under control I had a backlog of patches to process, which I did mostly last Friday and Monday. So there is a larger number of patches processed in that timeframe than I was striving for. Summary: - Add iWARP support to qedr driver - Lots of misc fixes across subsystem - Multiple update series to hns roce driver - Multiple update series to hfi1 driver - Updates to vnic driver - Add kref to wait struct in cxgb4 driver - Updates to i40iw driver - Mellanox shared pull request - timer_setup changes - massive cleanup series from Bart Van Assche - Two series of SRP/SRPT changes from Bart Van Assche - Core updates from Mellanox - i40iw updates - IPoIB updates - mlx5 updates - mlx4 updates - hns updates - bnxt_re fixes - PCI write padding support - Sparse/Smatch/warning cleanups/fixes - CQ moderation support - SRQ support in vmw_pvrdma" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits) RDMA/core: Rename kernel modify_cq to better describe its usage IB/mlx5: Add CQ moderation capability to query_device IB/mlx4: Add CQ moderation capability to query_device IB/uverbs: Add CQ moderation capability to query_device IB/mlx5: Exposing modify CQ callback to uverbs layer IB/mlx4: Exposing modify CQ callback to uverbs layer IB/uverbs: Allow CQ moderation with modify CQ iw_cxgb4: atomically flush the qp iw_cxgb4: only call the cq comp_handler when the cq is armed iw_cxgb4: Fix possible circular dependency locking warning RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion IB/core: Only maintain real QPs in the security lists IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey RDMA/core: Make function rdma_copy_addr return void RDMA/vmw_pvrdma: Add shared receive queue support RDMA/core: avoid uninitialized variable warning in create_udata RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP RDMA/bnxt_re: Set QP state in case of response completion errors RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries ...
2017-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds78-2503/+10193
Pull networking updates from David Miller: "Highlights: 1) Maintain the TCP retransmit queue using an rbtree, with 1GB windows at 100Gb this really has become necessary. From Eric Dumazet. 2) Multi-program support for cgroup+bpf, from Alexei Starovoitov. 3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew Lunn. 4) Add meter action support to openvswitch, from Andy Zhou. 5) Add a data meta pointer for BPF accessible packets, from Daniel Borkmann. 6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet. 7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli. 8) More work to move the RTNL mutex down, from Florian Westphal. 9) Add 'bpftool' utility, to help with bpf program introspection. From Jakub Kicinski. 10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper Dangaard Brouer. 11) Support 'blocks' of transformations in the packet scheduler which can span multiple network devices, from Jiri Pirko. 12) TC flower offload support in cxgb4, from Kumar Sanghvi. 13) Priority based stream scheduler for SCTP, from Marcelo Ricardo Leitner. 14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg. 15) Add RED qdisc offloadability, and use it in mlxsw driver. From Nogah Frankel. 16) eBPF based device controller for cgroup v2, from Roman Gushchin. 17) Add some fundamental tracepoints for TCP, from Song Liu. 18) Remove garbage collection from ipv6 route layer, this is a significant accomplishment. From Wei Wang. 19) Add multicast route offload support to mlxsw, from Yotam Gigi" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits) tcp: highest_sack fix geneve: fix fill_info when link down bpf: fix lockdep splat net: cdc_ncm: GetNtbFormat endian fix openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start netem: remove unnecessary 64 bit modulus netem: use 64 bit divide by rate tcp: Namespace-ify sysctl_tcp_default_congestion_control net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() ipv6: set all.accept_dad to 0 by default uapi: fix linux/tls.h userspace compilation error usbnet: ipheth: prevent TX queue timeouts when device not ready vhost_net: conditionally enable tx polling uapi: fix linux/rxrpc.h userspace compilation errors net: stmmac: fix LPI transitioning for dwmac4 atm: horizon: Fix irq release error net-sysfs: trigger netlink notification on ifalias change via sysfs openvswitch: Using kfree_rcu() to simplify the code openvswitch: Make local function ovs_nsh_key_attr_size() static openvswitch: Fix return value check in ovs_meter_cmd_features() ...
2017-11-14mlxsw: spectrum_router: Add batch neighbour deletionIdo Schimmel1-3/+12
In commit 4a3c67a6e7cd ("mlxsw: spectrum_router: Don't batch neighbour deletion") I removed the support for batch deletion of neighbours on a router interface (RIF) since at that time the firmware did not support it for IPv6 neighbours. This is now supported by the version enforced by the driver, so there is no reason to delete neighbours one by one anymore. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14mlxsw: spectrum: Update minimum firmware version to 13.1530.152Shalom Toledo1-2/+2
This new firmware contains: - Support Spectrum A1 revision - Batch deletion of IPv6 neighbours - Remove incorrect VPD capability Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-13Merge branch 'locking-core-for-linus' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The main changes in this cycle are: - Another attempt at enabling cross-release lockdep dependency tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time with better performance and fewer false positives. (Byungchul Park) - Introduce lockdep_assert_irqs_enabled()/disabled() and convert open-coded equivalents to lockdep variants. (Frederic Weisbecker) - Add down_read_killable() and use it in the VFS's iterate_dir() method. (Kirill Tkhai) - Convert remaining uses of ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle driven. (Mark Rutland, Paul E. McKenney) - Get rid of lockless_dereference(), by strengthening Alpha atomics, strengthening READ_ONCE() with smp_read_barrier_depends() and thus being able to convert users of lockless_dereference() to READ_ONCE(). (Will Deacon) - Various micro-optimizations: - better PV qspinlocks (Waiman Long), - better x86 barriers (Michael S. Tsirkin) - better x86 refcounts (Kees Cook) - ... plus other fixes and enhancements. (Borislav Petkov, Juergen Gross, Miguel Bernal Marin)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE rcu: Use lockdep to assert IRQs are disabled/enabled netpoll: Use lockdep to assert IRQs are disabled/enabled timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled irq_work: Use lockdep to assert IRQs are disabled/enabled irq/timings: Use lockdep to assert IRQs are disabled/enabled perf/core: Use lockdep to assert IRQs are disabled/enabled x86: Use lockdep to assert IRQs are disabled/enabled smp/core: Use lockdep to assert IRQs are disabled/enabled timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled timers/nohz: Use lockdep to assert IRQs are disabled/enabled workqueue: Use lockdep to assert IRQs are disabled/enabled irq/softirqs: Use lockdep to assert IRQs are disabled/enabled locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled() locking/pvqspinlock: Implement hybrid PV queued/unfair locks locking/rwlocks: Fix comments x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion() workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes ...
2017-11-13net/mlx4: Use Kconfig flag to remove support of old gen2 Mellanox devicesSlava Shwartsman2-0/+10
Since Mellanox focus is on newer adapters, we would like to have the ability to disable the support for old gen2 adapters. This can be done by turning off the MLX4_CORE_GEN2 Kconfig flag. We keep it turned on by default. Signed-off-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-13/+20
2017-11-10net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEsEugenia Emantayev1-1/+1
This is to prevent the case of working with a single MPWQE (1 WQE is always reserved as RQ is linked-list). When the WQE is fully consumed, HW should still have available buffer in order not to drop packets. Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10net/mlx5e: Set page to null in case dma mapping failsInbar Karmy1-7/+5
Currently, when dma mapping fails, put_page is called, but the page is not set to null. Later, in the page_reuse treatment in mlx5e_free_rx_descs(), mlx5e_page_release() is called for the second time, improperly doing dma_unmap (for a non-mapped address) and an extra put_page. Prevent this by nullifying the page pointer when dma_map fails. Fixes: accd58833237 ("net/mlx5e: Introduce RX Page-Reuse") Signed-off-by: Inbar Karmy <inbark@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10net/mlx5e: Fix napi poll with zero budgetSaeed Mahameed1-4/+6
napi->poll can be called with budget 0, e.g. in netpoll scenarios where the caller only wants to poll TX rings (poll_one_napi@net/core/netpoll.c). The below commit changed RX polling from "while" loop to "do {} while", which caused to ignore the initial budget and handle at least one RX packet. This fixes the following warning: [ 2852.049194] mlx5e_napi_poll+0x0/0x260 [mlx5_core] exceeded budget in poll [ 2852.049195] ------------[ cut here ]------------ [ 2852.049195] WARNING: CPU: 0 PID: 25691 at net/core/netpoll.c:171 netpoll_poll_dev+0x18a/0x1a0 Fixes: 4b7dfc992514 ("net/mlx5e: Early-return on empty completion queues") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Martin KaFai Lau <kafai@fb.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10net/mlx5: Cancel health poll before sending panic teardown commandHuy Nguyen1-0/+7
After the panic teardown firmware command, health_care detects the error in PCI bus and calls the mlx5_pci_err_detected. This health_care flow is no longer needed because the panic teardown firmware command will bring down the PCI bus communication with the HCA. The solution is to cancel the health care timer and its pending workqueue request before sending panic teardown firmware command. Kernel trace: mlx5_core 0033:01:00.0: Shutdown was called mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1 mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0 mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end Unable to handle kernel paging request for data at address 0x0000003f Faulting instruction address: 0xc0080000434b8c80 Fixes: 8812c24d28f4 ('net/mlx5: Add fast unload support in shutdown flow') Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10net/mlx5: Loop over temp list to release delay eventsHuy Nguyen1-1/+1
list_splice_init initializing waiting_events_list after splicing it to temp list, therefore we should loop over temp list to fire the events. Fixes: 4ca637a20a52 ("net/mlx5: Delay events till mlx5 interface's add complete for pci resume") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10Merge tag 'mlx5-updates-2017-11-09' of ↵David S. Miller7-52/+184
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-11-09 This series introduces vlan offloads related improvements for mlx5 ethernet netdev driver, from Gal Pressman. - Add support for 802.1ad vlan filter - Add support for 802.1ad vlan insertion - Add vlan offloads statistics to ethtool (inserted/stripped vlans) - CHECKSUM_COMPLETE support for vlan traffic when vlan stripping is off! (Finally) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-5/+8
Simple cases of overlapping changes in the packet scheduler. Must easier to resolve this time. Which probably means that I screwed it up somehow. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-09net/mlx5e: CHECKSUM_COMPLETE offload for VLAN/QinQ packetsGal Pressman1-3/+14
When the VLAN tag is present in the packet buffer (i.e VLAN stripping disabled, QinQ) the driver will currently report CHECKSUM_UNNECESSARY. Instead of using CHECKSUM_COMPLETE offload for packets with first ethertype of IPv4/6, use it for packets with last ethertype of IPv4/6 to cover the former cases as well. The checksum field present in the CQE is calculated from the IP header until the end of the packet. When the first ethertype is different than IPv4/6 (for ex. 802.1Q VLAN) a checksum of the VLAN header/s should be added. The small header/s checksum calculation will allow us to use CHECKSUM_COMPLETE instead of CHECKSUM_UNNECESSARY. Testing bandwidth of one and 8 TCP streams to a single RQ, LRO and VLAN stripping offloads disabled: CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz NIC: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] Before: +--------------+--------------------+---------------------+----------------------+ | Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] | Checksum offload | +--------------+--------------------+---------------------+----------------------+ | Untagged | 28,247.35 | 24,716.88 | CHECKSUM_COMPLETE | | VLAN | 27,516.69 | 23,752.26 | CHECKSUM_UNNECESSARY | | QinQ | 6,961.30 | 20,667.04 | CHECKSUM_UNNECESSARY | +--------------+--------------------+---------------------+----------------------+ Now: +--------------+--------------------+---------------------+-------------------+ | Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] | Checksum offload | +--------------+--------------------+---------------------+-------------------+ | Untagged | 28,521.28 | 24,926.32 | CHECKSUM_COMPLETE | | VLAN | 27,389.37 | 23,715.34 | CHECKSUM_COMPLETE | | QinQ | 6,901.77 | 20,845.73 | CHECKSUM_COMPLETE | +--------------+--------------------+---------------------+-------------------+ No performance degradation observed. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09net/mlx5e: Add VLAN offloads statisticsGal Pressman5-1/+15
The following counters are now exposed through ethtool -S: rx[i]_removed_vlan_packets (per channel) rx_removed_vlan_packets tx[i]_added_vlan_packets (per channel) tx_added_vlan_packets rx_removed_vlan_packets: The number of packets that had their outer VLAN header stripped to the CQE by the hardware. tx_added_vlan_packets: The number of packets that had their outer VLAN header inserted by the hardware. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09net/mlx5e: Add 802.1ad VLAN insertion supportGal Pressman2-0/+3
Report VLAN insertion support for S-tagged packets and add support by choosing the correct VLAN type in the WQE. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09net/mlx5e: Add 802.1ad VLAN filter steering rulesGal Pressman3-13/+112
When a user chooses to use 802.1ad VLAN the proper steering rules will be added to the VLAN flow table (matching the specific S-tag VID). Due to current hardware limitation, when using 802.1ad, we must disable C-tag VLAN stripping on the RQs. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09net/mlx5e: Declare bitmap using kernel macroGal Pressman1-1/+1
Replace explicit declaration of bitmap with DECLARE_BITMAP kernel macro. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09net/mlx5e: Add rollback on add VLAN failureGal Pressman1-1/+6
When add VLAN rule fails the active vlan bit should be cleared. Fixes: afb736e9330a ("net/mlx5: Ethernet resource handling files") Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09net/mlx5e: Rename VLAN related variables and functionsGal Pressman3-37/+37
Rename VLAN related symbols to better reflect the fact that they are associated to C-tag VLAN. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-08mlxsw: spectrum: Fix error return code in mlxsw_sp_port_create()Wei Yongjun1-0/+1
Fix to return a negative error code from the VID create error handling case instead of 0, as done elsewhere in this function. Fixes: c57529e1d5d8 ("mlxsw: spectrum: Replace vPorts with Port-VLAN") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08mlxsw: spectrum: Support general qdisc statsNogah Frankel2-0/+56
Add support for ndo_setup_tc with enum tc_setup_type value of TC_SETUP_QDISC_STATS. This call updates the generic qdisc stats from the cache if the handle ID that is asked for matching the root qdisc ID and fails otherwise. Currently doesn't support qlen and rqueues. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08mlxsw: spectrum: Support RED xstatsNogah Frankel2-0/+60
Add support for ndo_setup_tc with enum tc_setup_type value of TC_SETUP_RED_XSTATS. This call returns the RED qdisc xstats from the cache if the handle ID that is asked for matching the root qdisc ID and fails otherwise. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08mlxsw: spectrum: Collect tclass related stats periodicallyNogah Frankel2-0/+43
Add more statistics to be collected from the HW periodically. These stats are tclass based (beside ECN marked packet, that exist only port based). They are needed to expose RED qdisc stats and xstats correctly. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08mlxsw: reg: Add ext and tc-cong counter groupsYuval Mintz1-0/+19
This adds the counter group definitions for 2 new counter groups which are necessary for gaining ECN & wred counters. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08mlxsw: spectrum: Support RED qdisc offloadNogah Frankel4-1/+193
Add support for ndo_setup_tc with enum tc_setup_type value of TC_SETUP_RED. This call sets RED qdisc on a traffic class. This patch supports RED qdisc only as a root qdisc and set in on the default tclass. It can be set with or without ECN. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08mlxsw: reg: Add cwtp & cwtpm registersNogah Frankel1-0/+187
This patch adds 2 new registers: - Congestion WRED ECN TClass Profile Register [CWTP] - Congestion WRED ECN TClass and Pool Mapping Register [CWTPM] These registers would later be needed to offload RED-related functionality to the HW. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08net_sch: mqprio: Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIONogah Frankel2-2/+2
Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new convention. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08net/mlx5e/core/en_fs: fix pointer dereference after free in ↵Gustavo A. R. Silva1-5/+8
mlx5e_execute_l2_action hn is being kfree'd in mlx5e_del_l2_from_hash and then dereferenced by accessing hn->ai.addr Fix this by copying the MAC address into a local variable for its safe use in all possible execution paths within function mlx5e_execute_l2_action. Addresses-Coverity-ID: 1417789 Fixes: eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-07Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar11-97/+232
Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-05Merge tag 'mlx5-updates-2017-11-04' of ↵David S. Miller17-52/+591
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-11-04 This series includes: From Huy: dscp to priority mapping for Ethernet packet. =================================================== First six patches enable differentiated services code point (dscp) to priority mapping for Ethernet packet. Once this feature is enabled, the packet is routed to the corresponding priority based on its dscp. User can combine this feature with priority flow control (pfc) feature to have priority flow control based on the dscp. Firmware interface: Mellanox firmware provides two control knobs for this feature: QPTS register allow changing the trust state between dscp and pcp mode. The default is pcp mode. Once in dscp mode, firmware will route the packet based on its dscp value if the dscp field exists. QPDPM register allow mapping a specific dscp (0 to 63) to a specific priority (0 to 7). By default, all the dscps are mapped to priority zero. Software interface: This feature is controlled via application priority TLV. IEEE specification P802.1Qcd/D2.1 defines priority selector id 5 for application priority TLV. This APP TLV selector defines DSCP to priority map. This APP TLV can be sent by the switch or can be set locally using software such as lldptool. In mlx5 drivers, we add the support for net dcb's getapp and setapp call back. Mlx5 driver only handles the selector id 5 application entry (dscp application priority application entry). If user sends multiple dscp to priority APP TLV entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The firmware trust state (in QPTS register) is changed based on the number of dscp to priority application entries. When the first dscp to priority application entry is added by the user, the trust state is changed to dscp. When the last dscp to priority application entry is deleted by the user, the trust state is changed to pcp. When the port is in DSCP trust state, the transmit queue is selected based on the dscp of the skb. When the port is in DSCP trust state and vport inline mode is not NONE, firmware requires mlx5 driver to copy the IP header to the wqe ethernet segment inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. =================================================== Plus to the dscp series, some small misc changes are include as well: From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic From Or Gerlitz, Enlarge the NIC TC offload table size From Rabie, Initialize destination_flow struct to 0 From Feras, Add inner TTC table to IPoIB flow steering From Tal, Enable CQE based moderation on TX CQ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: bpf: rename ndo_xdp to ndo_bpfJakub Kicinski2-5/+5
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net/mlx5e: Enable CQE based moderation on TX CQTal Gilboa4-23/+71
By using CQE based moderation on TX CQ we can reduce the number of TX interrupt rate. Besides the benefit of less interrupts, this also allows the kernel to better utilize TSO. Since TSO has some CPU overhead, it might not aggregate when CPU is under high stress. By reducing the interrupt rate and the CPU utilization, we can get better aggregation and better overall throughput. The feature is enabled by default and has a private flag in ethtool for control. Throughput, interrupt rate and TSO utilization improvements: (ConnectX-4Lx 40GbE, unidirectional, 1/16 TCP streams, 64B packets) --------------------------------------------------------- Metric | Streams | CQE Based | EQE Based | improvement --------------------------------------------------------- BW | 1 | 2.4Gb/s | 2.15Gb/s | +11.6% IR | 1 | 27Kips | 50.6Kips | -46.7% TSO Util | 1 | 74.6% | 71% | +5% BW | 16 | 29Gb/s | 25.85Gb/s | +12.2% IR | 16 | 482Kips | 745Kips | -35.3% TSO Util | 16 | 69.1% | 49% | +41.1% *BW = Bandwidth, IR = Interrupt rate, ips = interrupt per second. TSO Util = bytes in TSO sessions / all bytes transferred Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steeringFeras Daoud3-3/+16
For supported platforms, add inner TTC flow table to enhanced IPoIB flow steering. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05net/mlx5: Initialize destination_flow struct to 0Rabie Loulou5-15/+15
This is needed in order to enlarge it with more members that will get value of 0 when not set. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05net/mlx5: Enlarge the NIC TC offload table sizeOr Gerlitz1-2/+13
The NIC TC offload table size was hard coded to 1k. Change it to be min(max NIC RX table size, min(max flow counters, 64k) * num flow groups) where the max values are read from the firmware and the number of flow groups is hard-coded as before this change. We don't know upfront the division of flows to groups (== different masks). This setup allows each group to be of size up to the where we want to go (when supported, all offloaded flows use counters). Thus, we don't expect multiple occurences for a group which in turn would add steering hops. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05net/mlx5e: DCBNL, Add debug messages logInbar Karmy1-1/+23
Add debug print when changing the configuration of QoS through dcbnl. Use ethtool -s <devname> msglvl hw on/off to toggle debug messages. Signed-off-by: Inbar Karmy <inbark@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05net/mlx5e: Add support for ethtool msglvl supportGal Pressman3-0/+25
Use ethtool -s <devname> msglvl <type> on/off to toggle debug messages. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQHuy Nguyen5-6/+73
If the port is in DSCP trust state, packets are placed in the right priority queue based on the dscp value. This is done by selecting the transmit queue based on the dscp of the skb. Until now select_queue honors priority only from the vlan header. However that is not sufficient in cases where port trust state is DSCP mode as packet might not even contain vlan header. Therefore if the port is in dscp trust state and vport's min inline mode is not NONE, copy the IP header to the eseg's inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05net/mlx5e: Add dcbnl dscp to priority supportHuy Nguyen3-2/+232
This patch implements dcbnl hooks to set and delete DSCP to priority map as defined by the DCB subsystem. Device maintains internal trust state which needs to be set to DSCP state for performing DSCP to priority mapping. When the first dscp to priority APP entry is added by the user, the trust state is changed to dscp. When the last dscp to priority APP entry is deleted by the user, the trust state is changed to pcp. If user sends multiple dscp to priority APP entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The dscp to priority APP entries are added and deleted in the net/dcb APP database using dcb_ieee_setapp/getapp. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05net/mlx5: QPTS and QPDPM register firmware command supportHuy Nguyen1-0/+99
The QPTS register allows changing the priority trust state between pcp and dscp. Add support to get/set trust state from device. When the port is in pcp/dscp trust state, packet is routed by hardware to matching priority based on its pcp/dscp value respectively. The QPDPM register allow channing the dscp to priority mapping. Add support to get/set dscp to priority mapping from device. Note that to change a dscp mapping, the "e" bit of this dscp structure must be set in the QPDPM firmware command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-05net/mlx5: QCAM register firmware command supportHuy Nguyen3-0/+24
The QCAM register provides capability bit for all the QoS registers using ACCESS_REG command. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-0/+4
Files removed in 'net-next' had their license header updated in 'net'. We take the remove from 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum_router: Handle down of tunnel underlayPetr Machata1-2/+55
When the bound device of a tunnel device is down, encapsulated packets are not egressed anymore, but tunnel decap still works. Extend mlxsw_sp_nexthop_rif_update() to take IFF_UP into consideration when deciding whether a given next hop should be offloaded. Because the new logic was added to mlxsw_sp_nexthop_rif_update(), this fixes the case where a newly-added tunnel has a down bound device, which would previously be fully offloaded. Now the down state of the bound device is noted and next hops forwarding to such tunnel are not offloaded. In addition to that, notice NETDEV_UP and NETDEV_DOWN of a bound device to force refresh of tunnel encap route offloads. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum_ipip: Handle underlay device changePetr Machata1-2/+3
When a bound device of an IP-in-IP tunnel changes, such as through 'ip tunnel change name $name dev $dev', the loopback backing the tunnel needs to be recreated. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04mlxsw: spectrum: Handle NETDEV_CHANGE on L3 tunnelsPetr Machata4-7/+114
Changes to L3 tunnel netdevices (through `ip tunnel change' as well as `ip link set') lead to NETDEV_CHANGE being generated on the tunnel device. Because what is relevant for the tunnel in question depends on the tunnel type, handling of the event is dispatched to the IPIP module through a newly-added interface mlxsw_sp_ipip_ops.ol_netdev_change(). IPIP tunnels now remember the last set of tunnel parameters in struct mlxsw_sp_ipip_entry.parms, and use it to figure out what exactly has changed. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>