summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-10-18net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't availablePaul Blakey1-22/+82
When adding a vxlan tc rule, and a neighbour isn't available, we don't insert any rule to hardware. Once we enable offloading flows with multiple priorities, a packet that should have matched this rule will continue in hardware pipeline and might match a wrong one. This is unlike in tc software path where it will be matched and forwarded to the vxlan device (which will cause a ARP lookup eventually) and stop processing further tc filters. To address that, when when a neighbour isn't available (EAGAIN from attach_encap), or gets deleted, change the original action to be a forward to slow path instead. Neighbour update will restore the original action once the neighbour becomes available. This will be done atomically so at any given time we will have a the correct match. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: E-Switch, Enable setting goto slow path chain actionPaul Blakey2-0/+8
A pre-step for the tc offloads code to use this when a neigh is not available for encap rules. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5e: Avoid duplicated code for tc offloads add/del fdb ruleOr Gerlitz1-41/+50
The code for adding/deleting fdb flow is repeated when user-space does flow add/del and when we add/del from the neigh update path - unify them to avoid the duplication. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5e: For TC offloads, always add new flow instead of appending the actionsPaul Blakey2-3/+3
When replacing a tc flower rule, flower first requests to add the new rule (new action), then deletes the old one. But currently when asked to add a new tc flower flow, we append the actions (and counters to it). This can result in a fte with two flow counters or conflicting actions (drop and encap action) which firmware complains/errs about and isn't achieving what the user aimed for. Instead, insert the flow using the new no-append flag which will add a new HW rule, the old flow and rule will be deleted later by flower Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: Add a no-append flow insertion modePaul Blakey5-9/+24
If no-append flag is set, we will add a new FTE, instead of appending the actions of the inserted rule when the same match already exists. While here, move the has_flow_tag boolean indicator to be a flag too. This patch doesn't change any functionality. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: E-Switch, Add chains and prioritiesPaul Blakey3-86/+339
A chain is a group of priorities, so use the fdb parallel sub namespaces to implement chains, and a flow table for each priority in them. Because these namespaces are parallel and in series to the slow path fdb, the chains aren't connected to one another (but to the slow path), and one must use a explicit goto action to reach a different chain. Flow tables for the priorities will be created on demand and destroyed once not used. The Firmware has four pools of tables for sizes S/XS/M/L (4k, 64k, 1m, 4m). We maintain ghost copies of the pools occupancy. When a new table is to be created, we scan the pools from large to small and find the 1st table size which can be now created. When a table is destroyed, we update the relevant pool. Multi chain/prio isn't enabled yet by this patch, for now all flows will use the default chain 0, and prio 1. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: E-Switch, Have explicit API to delete fwd rulesOr Gerlitz3-2/+14
Be symmetric with the e-switch API to add rules which has a specific function to add fwd rules which are used as part of vport mirroring. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: Split FDB fast path prio to multiple namespacesPaul Blakey5-11/+101
Towards supporting multi-chains and priorities, split the FDB fast path to multiple namespaces (sub namespaces), each with multiple priorities. This patch adds a new flow steering type, FS_TYPE_PRIO_CHAINS, which is like current FS_TYPE_PRIO, but may contain only namespaces, and those will be in parallel to one another in terms of managing of the flow tables connections inside them. Meaning, while searching for the next or previous flow table to connect for a new table inside such namespace we skip the parallel namespaces in the same level under the FS_TYPE_PRIO_CHAINS prio we originated from. We use this new type for splitting the fast path prio into multiple parallel namespaces, each containing normal prios. The prios inside them (and their tables) will be connected to one another, but not from one parallel namespace to another, instead the last prio in each namespace will be connected to the next prio in the containing FDB namespace, which is the slow path prio. Signed-off-by: Paul Blakey <paulb@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: Add cap bits for multi fdb encapPaul Blakey1-1/+3
If set, the firmware supports creating of flow tables with encap enabled while VFs are configured, if we already created one (restriction still applies on the first creation). Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5e: Split TC add rule path for nic vs e-switchRoi Dayan1-47/+138
Move to have clear separation on the code path to add nic vs e-switch flows. While here we break the code that deals with adding offloaded TC tool to few smaller stages, each on helper function. Besides getting us simpler and readable code, these are pre-steps for being able to have two HW flows serving one SW TC flow for some e-switch use cases. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5e: Change return type of tc add flow functionsRabie Loulou1-47/+39
Refactor the flow add utility functions to return err code instead of rule pointers. This will allow for simpler logic when one tc rule is duplicated to two HW rules in downstream patches. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: Use flow counter IDs and not the wrapping cache objectMark Bloch9-19/+23
Currently, when a flow rule is created using the FS core layer, the caller has to pass the entire flow counter object and not just the counter HW handle (ID). This requires both the FS core and the caller to have knowledge about the inner implementation of the FS layer flow counters cache and limits the possible users. Move to use the counter ID across the place when dealing with flows. Doing this decoupling, now can we privatize the inner implementation of the flow counters. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18net/mlx5: E-Switch, Get counters for offloaded flows from callersMark Bloch5-36/+33
There's no real reason for the e-switch logic to manage the creation of counters for offloaded flows. The API already has the directive for the caller to denote they want to attach a counter to the created flow. As such, we go and move the management of flow counters to the mlx5e tc offload logic. This also lets us remove an inelegant interface where the FS layer had to provide a way to retrieve a counter from a flow rule. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-18Merge branch 'mlx5-next' of ↵Saeed Mahameed21-230/+392
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next mlx5 updates for both net-next and rdma-next * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (21 commits) net/mlx5: Expose DC scatter to CQE capability bit net/mlx5: Update mlx5_ifc with DEVX UID bits net/mlx5: Set uid as part of DCT commands net/mlx5: Set uid as part of SRQ commands net/mlx5: Set uid as part of SQ commands net/mlx5: Set uid as part of RQ commands net/mlx5: Set uid as part of QP commands net/mlx5: Set uid as part of CQ commands net/mlx5: Rename incorrect naming in IFC file net/mlx5: Export packet reformat alloc/dealloc functions net/mlx5: Pass a namespace for packet reformat ID allocation net/mlx5: Expose new packet reformat capabilities {net, RDMA}/mlx5: Rename encap to reformat packet net/mlx5: Move header encap type to IFC header file net/mlx5: Break encap/decap into two separated flow table creation flags net/mlx5: Add support for more namespaces when allocating modify header net/mlx5: Export modify header alloc/dealloc functions net/mlx5: Add proper NIC TX steering flow tables support net/mlx5: Cleanup flow namespace getter switch logic net/mlx5: Add memic command opcode to command checker ... Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-16tcp, ulp: remove socket lock assertion on ULP cleanupDaniel Borkmann1-2/+4
Eric reported that syzkaller triggered a splat in tcp_cleanup_ulp() where assertion sock_owned_by_me() failed. This happened through inet_csk_prepare_forced_close() first releasing the socket lock, then calling into tcp_done(newsk) which is called after the inet_csk_prepare_forced_close() and therefore without the socket lock held. The sock_owned_by_me() assertion can generally be removed as the only place where tcp_cleanup_ulp() is called from now is out of inet_csk_destroy_sock() -> sk->sk_prot->destroy() where socket is in dead state and unreachable. Therefore, add a comment why the check is not needed instead. Fixes: 8b9088f806e1 ("tcp, ulp: enforce sock_owned_by_me upon ulp init and cleanup") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/mlx5: Expose DC scatter to CQE capability bitYonatan Cohen1-1/+2
dc_req_scat_data_cqe capability bit determines if requester scatter to cqe is available for 64 bytes CQE over DC transport type. Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-10-16Merge branch 'hns3-Some-cleanup-and-bugfix-for-desc-filling'David S. Miller2-78/+62
Yunsheng Lin says: ==================== Some cleanup and bugfix for desc filling When retransmiting packets, skb_cow_head which is called in hns3_set_tso may clone a new header. And driver will clear the checksum of the header after doing DMA map, so HW will read the old header whose L3 checksum is not cleared and calculate a wrong L3 checksum. Also When sending a big fragment using multiple buffer descriptor, hns3 does one maping, but do multiple unmapping when tx is done, which may cause unmapping problem. This patchset does some cleanup before fixing the above problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: hns3: fix for multiple unmapping DMA problemFuyun Liang1-5/+8
When sending a big fragment using multiple buffer descriptor, hns3 does one maping, but do multiple unmapping when tx is done, which may cause unmapping problem. To fix it, this patch makes sure the value of desc_cb.length of the non-first bd is zero. If desc_cb.length is zero, we do not unmap the buffer. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: hns3: rename hns_nic_dma_unmapFuyun Liang1-7/+7
To keep symmetrical, this patch renames hns_nic_dma_unmap to hns3_clear_desc. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: hns3: add handling for big TX fragmentFuyun Liang1-14/+31
This patch unifies big tx fragment handling for tso and non-tso case. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: hns3: move DMA map into hns3_fill_descPeng Li2-26/+24
To solve the L3 checksum error problem which happens when driver does not clear L3 checksum, DMA map should be done after calling skb_cow_head. This patch moves DMA map into hns3_fill_desc to ensure that DMA map is done after calling skb_cow_head. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: hns3: remove hns3_fill_desc_tsoPeng Li1-39/+5
This patch removes hns3_fill_desc_tso in preparation for fixing some desc filling bug, because for tso or non-tso case, we will use the unified hns3_fill_desc. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16Merge branch 'qed-Align-PTT-and-add-various-link-modes'David S. Miller8-103/+587
Rahul Verma says: ==================== Align PTT and add various link modes. This series aligns the ptt propagation as local ptt or global ptt. Adds new transceiver modes, speed capabilities and board config, which is utilized to display the enhanced link modes, media types and speed. Enhances the link with detailed information. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16qed: Prevent link getting down in case of autoneg-off.Rahul Verma1-7/+33
Newly added link modes are required to be added during setting link modes. If the new link mode is not available during qed_set_link, it may cause link getting down due to empty supported capability, being passed to MFW, after setting autoneg off/on with current/supported speed. Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16qede: Check available link modes before link set from ethtool.Rahul Verma1-19/+45
Set link mode after checking available "supported" link caps of the port. Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16qed: Add supported link and advertise link to display in ethtool.Rahul Verma5-58/+426
Added transceiver type, speed capability and board types in HSI, are utilizing to display the accurate link information in ethtool. Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16qed: Added supported transceiver modes, speed capability and board config to ↵Rahul Verma1-1/+53
HSI. Added transceiver modes with different speed and media type, speed capability and supported board types in HSI, which will be utilizing to display correct specification of link modes and speed type. Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16qed: Align local and global PTT to propagate through the APIs.Rahul Verma5-23/+35
Align the use of local PTT to propagate through the qed_mcp* API's. Global ptt should not be used. Register access should be done through layers. Register address is mapped into a PTT, PF translation table. Several interface functions require a PTT to direct read/write into register. There is a pool of PTT maintained, and several PTT are used simultaneously to access device registers in different flows. Same PTT should not be used in flows that can run concurrently. To avoid running out of PTT resources, too many PTT should not be acquired without releasing them. Every PF has a global PTT, which is used throughout the life of PF, in most important flows for register access. Generic functions acquire the PTT locally and release after the use. This patch aligns the use of Global PTT and Local PTT accordingly. Signed-off-by: Rahul Verma <rahul.verma@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: aquantia: make function aq_fw2x_update_stats staticYueHaibing1-1/+1
Fixes the following sparse warning: drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c:282:5: warning: symbol 'aq_fw2x_update_stats' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16Merge branch 'net-Kernel-side-filtering-for-route-dumps'David S. Miller13-95/+387
David Ahern says: ==================== net: Kernel side filtering for route dumps Implement kernel side filtering of route dumps by protocol (e.g., which routing daemon installed the route), route type (e.g., unicast), table id and nexthop device. iproute2 has been doing this filtering in userspace for years; pushing the filters to the kernel side reduces the amount of data the kernel sends and reduces wasted cycles on both sides processing unwanted data. These initial options provide a huge improvement for efficiently examining routes on large scale systems. v2 - better handling of requests for a specific table. Rather than walking the hash of all tables, lookup the specific table and dump it - refactor mr_rtm_dumproute moving the loop over the table into a helper that can be invoked directly - add hook to return NLM_F_DUMP_FILTERED in DONE message to ensure it is returned even when the dump returns nothing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/ipv4: Bail early if user only wants prefix entriesDavid Ahern1-2/+6
Unlike IPv6, IPv4 does not have routes marked with RTF_PREFIX_RT. If the flag is set in the dump request, just return. In the process of this change, move the CLONE check to use the new filter flags. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/ipv6: Bail early if user only wants cloned entriesDavid Ahern1-2/+5
Similar to IPv4, IPv6 fib no longer contains cloned routes. If a user requests a route dump for only cloned entries, no sense walking the FIB and returning everything. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/mpls: Handle kernel side filtering of route dumpsDavid Ahern1-5/+28
Update the dump request parsing in MPLS for the non-INET case to enable kernel side filtering. If INET is disabled the only filters that make sense for MPLS are protocol and nexthop device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: Enable kernel side filtering of route dumpsDavid Ahern6-15/+53
Update parsing of route dump request to enable kernel side filtering. Allow filtering results by protocol (e.g., which routing daemon installed the route), route type (e.g., unicast), table id and nexthop device. These amount to the low hanging fruit, yet a huge improvement, for dumping routes. ip_valid_fib_dump_req is called with RTNL held, so __dev_get_by_index can be used to look up the device index without taking a reference. From there filter->dev is only used during dump loops with the lock still held. Set NLM_F_DUMP_FILTERED in the answer_flags so the user knows the results have been filtered should no entries be returned. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: Plumb support for filtering ipv4 and ipv6 multicast route dumpsDavid Ahern4-12/+74
Implement kernel side filtering of routes by egress device index and table id. If the table id is given in the filter, lookup table and call mr_table_dump directly for it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16ipmr: Refactor mr_rtm_dumprouteDavid Ahern2-33/+61
Move per-table loops from mr_rtm_dumproute to mr_table_dump and export mr_table_dump for dumps by specific table id. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/mpls: Plumb support for filtering route dumpsDavid Ahern1-1/+41
Implement kernel side filtering of routes by egress device index and protocol. MPLS uses only a single table and route type. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/ipv6: Plumb support for filtering route dumpsDavid Ahern2-14/+54
Implement kernel side filtering of routes by table id, egress device index, protocol, and route type. If the table id is given in the filter, lookup the table and call fib6_dump_table directly for it. Move the existing route flags check for prefix only routes to the new filter. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/ipv4: Plumb support for filtering route dumpsDavid Ahern3-13/+39
Implement kernel side filtering of routes by table id, egress device index, protocol and route type. If the table id is given in the filter, lookup the table and call fib_table_dump directly for it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: Add struct for fib dump filterDavid Ahern7-11/+37
Add struct fib_dump_filter for options on limiting which routes are returned in a dump request. The current list is table id, protocol, route type, rtm_flags and nexthop device index. struct net is needed to lookup the net_device from the index. Declare the filter for each route dump handler and plumb the new arguments from dump handlers to ip_valid_fib_dump_req. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16netlink: Add answer_flags to netlink_callbackDavid Ahern2-1/+3
With dump filtering we need a way to ensure the NLM_F_DUMP_FILTERED flag is set on a message back to the user if the data returned is influenced by some input attributes. Normally this can be done as messages are added to the skb, but if the filter results in no data being returned, the user could be confused as to why. This patch adds answer_flags to the netlink_callback allowing dump handlers to set the NLM_F_DUMP_FILTERED at a minimum in the NLMSG_DONE message ensuring the flag gets back to the user. The netlink_callback space is initialized to 0 via a memset in __netlink_dump_start, so init of the new answer_flags is covered. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller54-3659/+4962
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-10-16 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Convert BPF sockmap and kTLS to both use a new sk_msg API and enable sk_msg BPF integration for the latter, from Daniel and John. 2) Enable BPF syscall side to indicate for maps that they do not support a map lookup operation as opposed to just missing key, from Prashant. 3) Add bpftool map create command which after map creation pins the map into bpf fs for further processing, from Jakub. 4) Add bpftool support for attaching programs to maps allowing sock_map and sock_hash to be used from bpftool, from John. 5) Improve syscall BPF map update/delete path for map-in-map types to wait a RCU grace period for pending references to complete, from Daniel. 6) Couple of follow-up fixes for the BPF socket lookup to get it enabled also when IPv6 is compiled as a module, from Joe. 7) Fix a generic-XDP bug to handle the case when the Ethernet header was mangled and thus update skb's protocol and data, from Jesper. 8) Add a missing BTF header length check between header copies from user space, from Wenwen. 9) Minor fixups in libbpf to use __u32 instead u32 types and include proper perf_event.h uapi header instead of perf internal one, from Yonghong. 10) Allow to pass user-defined flags through EXTRA_CFLAGS and EXTRA_LDFLAGS to bpftool's build, from Jiri. 11) BPF kselftest tweaks to add LWTUNNEL to config fragment and to install with_addr.sh script from flow dissector selftest, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: phy: merge phy_start_aneg and phy_start_aneg_privHeiner Kallweit1-18/+3
After commit 9f2959b6b52d ("net: phy: improve handling delayed work") the sync parameter isn't needed any longer in phy_start_aneg_priv(). This allows to merge phy_start_aneg() and phy_start_aneg_priv(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16hv_netvsc: fix vf serial matching with pci slot infoHaiyang Zhang1-4/+11
The VF device's serial number is saved as a string in PCI slot's kobj name, not the slot->number. This patch corrects the netvsc driver, so the VF device can be successfully paired with synthetic NIC. Fixes: 00d7ddba1143 ("hv_netvsc: pair VF based on serial number") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16Merge branch 'tcp-second-round-for-EDT-conversion'David S. Miller10-58/+78
Eric Dumazet says: ==================== tcp: second round for EDT conversion First round of EDT patches left TCP stack in a non optimal state. - High speed flows suffered from loss of performance, addressed by the first patch of this series. - Second patch brings pacing to the current state of networking, since we now reach ~100 Gbit on a single TCP flow. - Third patch implements a mitigation for scheduling delays, like the one we did in sch_fq in the past. - Fourth patch removes one special case in sch_fq for ACK packets. - Fifth patch removes a serious perfomance cost for TCP internal pacing. We should setup the high resolution timer only if really needed. - Sixth patch fixes a typo in BBR. - Last patch is one minor change in cdg congestion control. Neal Cardwell also has a patch series fixing BBR after EDT adoption. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16tcp: cdg: use tcp high resolution clock cacheEric Dumazet1-1/+1
We store in tcp socket a cache of most recent high resolution clock, there is no need to call local_clock() again, since this cache is good enough. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16tcp_bbr: fix typo in bbr_pacing_margin_percentNeal Cardwell1-2/+2
There was a typo in this parameter name. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16tcp: optimize tcp internal pacingEric Dumazet1-15/+16
When TCP implements its own pacing (when no fq packet scheduler is used), it is arming high resolution timer after a packet is sent. But in many cases (like TCP_RR kind of workloads), this high resolution timer expires before the application attempts to write the following packet. This overhead also happens when the flow is ACK clocked and cwnd limited instead of being limited by the pacing rate. This leads to extra overhead (high number of IRQ) Now tcp_wstamp_ns is reserved for the pacing timer only (after commit "tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh"), we can setup the timer only when a packet is about to be sent, and if tcp_wstamp_ns is in the future. This leads to a ~10% performance increase in TCP_RR workloads. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net_sched: sch_fq: no longer use skb_is_tcp_pure_ack()Eric Dumazet1-1/+1
With the new EDT model, sch_fq no longer has to special case TCP pure acks, since their skb->tstamp will allow them being sent without pacing delay. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16tcp: mitigate scheduling jitter in EDT pacing modelEric Dumazet1-6/+13
In commit fefa569a9d4b ("net_sched: sch_fq: account for schedule/timers drifts") we added a mitigation for scheduling jitter in fq packet scheduler. This patch does the same in TCP stack, now it is using EDT model. Note that this mitigation is valid for both external (fq packet scheduler) or internal TCP pacing. This uses the same strategy than the above commit, allowing a time credit of half the packet currently sent. Consider following case : An skb is sent, after an idle period of 300 usec. The air-time (skb->len/pacing_rate) is 500 usec Instead of setting the pacing timer to now+500 usec, it will use now+min(500/2, 300) -> now+250usec This is like having a token bucket with a depth of half an skb. Tested: tc qdisc replace dev eth0 root pfifo_fast Before netperf -P0 -H remote -- -q 1000000000 # 8000Mbit 540000 262144 262144 10.00 7710.43 After : netperf -P0 -H remote -- -q 1000000000 # 8000 Mbit 540000 262144 262144 10.00 7999.75 # Much closer to 8000Mbit target Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>