summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-07-18tools/bpftool: Add name mappings for SK_LOOKUP prog and attach typeJakub Sitnicki4-3/+5
Make bpftool show human-friendly identifiers for newly introduced program and attach type, BPF_PROG_TYPE_SK_LOOKUP and BPF_SK_LOOKUP, respectively. Also, add the new prog type bash-completion, man page and help message. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-14-jakub@cloudflare.com
2020-07-18libbpf: Add support for SK_LOOKUP program typeJakub Sitnicki4-0/+10
Make libbpf aware of the newly added program type, and assign it a section name. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-13-jakub@cloudflare.com
2020-07-18bpf: Sync linux/bpf.h to tools/Jakub Sitnicki1-0/+77
Newly added program, context type and helper is used by tests in a subsequent patch. Synchronize the header file. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-12-jakub@cloudflare.com
2020-07-18udp6: Run SK_LOOKUP BPF program on socket lookupJakub Sitnicki1-9/+51
Same as for udp4, let BPF program override the socket lookup result, by selecting a receiving socket of its choice or failing the lookup, if no connected UDP socket matched packet 4-tuple. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-11-jakub@cloudflare.com
2020-07-18udp6: Extract helper for selecting socket from reuseport groupJakub Sitnicki1-11/+26
Prepare for calling into reuseport from __udp6_lib_lookup as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-10-jakub@cloudflare.com
2020-07-18udp: Run SK_LOOKUP BPF program on socket lookupJakub Sitnicki1-9/+50
Following INET/TCP socket lookup changes, modify UDP socket lookup to let BPF program select a receiving socket before searching for a socket by destination address and port as usual. Lookup of connected sockets that match packet 4-tuple is unaffected by this change. BPF program runs, and potentially overrides the lookup result, only if a 4-tuple match was not found. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-9-jakub@cloudflare.com
2020-07-18udp: Extract helper for selecting socket from reuseport groupJakub Sitnicki1-10/+24
Prepare for calling into reuseport from __udp4_lib_lookup as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-8-jakub@cloudflare.com
2020-07-18inet6: Run SK_LOOKUP BPF program on socket lookupJakub Sitnicki2-0/+74
Following ipv4 stack changes, run a BPF program attached to netns before looking up a listening socket. Program can return a listening socket to use as result of socket lookup, fail the lookup, or take no action. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-7-jakub@cloudflare.com
2020-07-18inet6: Extract helper for selecting socket from reuseport groupJakub Sitnicki1-9/+22
Prepare for calling into reuseport from inet6_lookup_listener as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-6-jakub@cloudflare.com
2020-07-18inet: Run SK_LOOKUP BPF program on socket lookupJakub Sitnicki4-1/+156
Run a BPF program before looking up a listening socket on the receive path. Program selects a listening socket to yield as result of socket lookup by calling bpf_sk_assign() helper and returning SK_PASS code. Program can revert its decision by assigning a NULL socket with bpf_sk_assign(). Alternatively, BPF program can also fail the lookup by returning with SK_DROP, or let the lookup continue as usual with SK_PASS on return, when no socket has been selected with bpf_sk_assign(). This lets the user match packets with listening sockets freely at the last possible point on the receive path, where we know that packets are destined for local delivery after undergoing policing, filtering, and routing. With BPF code selecting the socket, directing packets destined to an IP range or to a port range to a single socket becomes possible. In case multiple programs are attached, they are run in series in the order in which they were attached. The end result is determined from return codes of all the programs according to following rules: 1. If any program returned SK_PASS and selected a valid socket, the socket is used as result of socket lookup. 2. If more than one program returned SK_PASS and selected a socket, last selection takes effect. 3. If any program returned SK_DROP, and no program returned SK_PASS and selected a socket, socket lookup fails with -ECONNREFUSED. 4. If all programs returned SK_PASS and none of them selected a socket, socket lookup continues to htable-based lookup. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com
2020-07-18inet: Extract helper for selecting socket from reuseport groupJakub Sitnicki1-9/+20
Prepare for calling into reuseport from __inet_lookup_listener as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-4-jakub@cloudflare.com
2020-07-18bpf: Introduce SK_LOOKUP program type with a dedicated attach pointJakub Sitnicki10-4/+312
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer when looking up a listening socket for a new connection request for connection oriented protocols, or when looking up an unconnected socket for a packet for connection-less protocols. When called, SK_LOOKUP BPF program can select a socket that will receive the packet. This serves as a mechanism to overcome the limits of what bind() API allows to express. Two use-cases driving this work are: (1) steer packets destined to an IP range, on fixed port to a socket 192.0.2.0/24, port 80 -> NGINX socket (2) steer packets destined to an IP address, on any port to a socket 198.51.100.1, any port -> L7 proxy socket In its run-time context program receives information about the packet that triggered the socket lookup. Namely IP version, L4 protocol identifier, and address 4-tuple. Context can be further extended to include ingress interface identifier. To select a socket BPF program fetches it from a map holding socket references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...) helper to record the selection. Transport layer then uses the selected socket as a result of socket lookup. In its basic form, SK_LOOKUP acts as a filter and hence must return either SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should look for a socket to receive the packet, or use the one selected by the program if available, while SK_DROP informs the transport layer that the lookup should fail. This patch only enables the user to attach an SK_LOOKUP program to a network namespace. Subsequent patches hook it up to run on local delivery path in ipv4 and ipv6 stacks. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
2020-07-18bpf, netns: Handle multiple link attachmentsJakub Sitnicki3-9/+139
Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the prog_array at given position when link gets attached/updated/released. This let's us lift the limit of having just one link attached for the new attach type introduced by subsequent patch. No functional changes intended. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-2-jakub@cloudflare.com
2020-07-18ne2k-pci: Use netif_msg_init to initialize msg_enable bitsArmin Wolf1-3/+6
Use netif_msg_enable() to process param settings. Signed-off-by: Armin Wolf <W_Armin@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-18Merge branch 'net-atlantic-add-support-for-FW-4-x'David S. Miller6-31/+70
Mark Starovoytov says: ==================== net: atlantic: add support for FW 4.x This patch set adds support for FW 4.x, which is about to get into the production for some products. 4.x is mostly compatible with 3.x, save for soft reset, which requires the acquisition of 2 additional semaphores. Other differences (e.g. absence of PTP support) are handled via capabilities. Note: 4.x targets specific products only. 3.x is still the main firmware branch, which should be used by most users (at least for now). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-18net: atlantic: add support for FW 4.xDmitry Bogdanov4-15/+58
This patch adds support for FW 4.x, which is about to get into the production for some products. 4.x is mostly compatible with 3.x, save for soft reset, which requires the acquisition of 2 additional semaphores. Other differences (e.g. absence of PTP support) are handled via capabilities. Note: 4.x targets specific products only. 3.x is still the main firmware branch, which should be used by most users (at least for now). Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-18net: atlantic: align return value of ver_match function with function nameMark Starovoytov3-19/+15
This patch aligns the return value of hw_atl_utils_ver_match function with its name. Change the return type to bool, because it's better aligned with the actual usage. Return true when the version matches, false otherwise. Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-18net: ethernet: et131x: Remove redundant register readMark Einon1-3/+0
Following the removal of an unused variable assignment (remove unused variable 'pm_csr') the associated register read can also go, as the read also occurs in the subsequent et1310_in_phy_coma() call. Signed-off-by: Mark Einon <mark.einon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-18net: ethernet: et131x: Remove unused variable 'pm_csr'Zhang Changzhong1-7/+3
Gcc report warning as follows: drivers/net/ethernet/agere/et131x.c:953:6: warning: variable 'pm_csr' set but not used [-Wunused-but-set-variable] 953 | u32 pm_csr; | ^~~~~~ drivers/net/ethernet/agere/et131x.c:1002:6:warning: variable 'pm_csr' set but not used [-Wunused-but-set-variable] 1002 | u32 pm_csr; | ^~~~~~ drivers/net/ethernet/agere/et131x.c:3446:8: warning: variable 'pm_csr' set but not used [-Wunused-but-set-variable] 3446 | u32 pm_csr; | ^~~~~~ After commit 38df6492eb51 ("et131x: Add PCIe gigabit ethernet driver et131x to drivers/net"), 'pm_csr' is never used in these functions, so removing it to avoid build warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Mark Einon <mark.einon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-18net: bna: Remove unused variable 't'Zhang Changzhong1-2/+1
Gcc report warning as follows: drivers/net/ethernet/brocade/bna/bfa_ioc.c:1538:6: warning: variable 't' set but not used [-Wunused-but-set-variable] 1538 | u32 t; | ^ After commit c107ba171f3d ("bna: Firmware Patch Simplification"), 't' is never used, so removing it to avoid build warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-18net: bnxt: don't complain if TC flower can't be supportedJakub Kicinski2-6/+6
The fact that NETIF_F_HW_TC is not set should be a sufficient indication to the user that TC offloads are not supported. No need to bother users of older firmware versions with pointless warnings on every boot. Also, since the support is optional, bnxt_init_tc() should not return an error in case FW is old, similarly to how error is not returned when CONFIG_BNXT_FLOWER_OFFLOAD is not set. With that we can add an error message to the caller, to warn about actual unexpected failures. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17Merge tag 'mlx5-updates-2020-07-16' of ↵David S. Miller40-211/+1473
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-07-16 Fixes: 1) Fix build break when CONFIG_XPS is not set 2) Fix missing switch_id for representors Updates: 1) IPsec XFRM RX offloads from Raed and Huy. - Added IPSec RX steering flow tables to NIC RX - Refactoring of the existing FPGA IPSec, to add support for ConnectX IPsec. - RX data path handling for IPSec traffic - Synchronize offloading device ESN with xfrm received SN 2) Parav allows E-Switch to siwtch to switchdev mode directly without the need to go through legacy mode first. 3) From Tariq, Misc updates including: 3.1) indirect calls for RX and XDP handlers 3.2) Make MLX5_EN_TLS non-prompt as it should always be enabled when TLS and MLX5_EN are selected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: alteon: Avoid some useless memsetChristophe JAILLET1-4/+1
Avoid a memset after a call to 'dma_alloc_coherent()'. This is useless since commit 518a2f1925c3 ("dma-mapping: zero memory returned from dma_alloc_*") Replace a kmalloc+memset with a corresponding kzalloc. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: alteon: switch from 'pci_' to 'dma_' APIChristophe JAILLET1-58/+56
The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested. When memory is allocated in 'ace_allocate_descriptors()' and 'ace_init()' GFP_KERNEL can be used because both functions are called from the probe function and no lock is acquired. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: sungem: switch from 'pci_' to 'dma_' APIChristophe JAILLET1-26/+27
The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested. When memory is allocated in 'gem_init_one()', GFP_KERNEL can be used because it is a probe function and no lock is acquired. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: decnet: af_decnet: Simplify goto loop.Suraj Upadhyay1-8/+5
Replace goto loop with while loop. Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17Merge branch 'tcp-dsack-multi-seg'David S. Miller3-36/+47
Priyaranjan Jha says: ==================== tcp: improve handling of DSACK covering multiple segments Currently, while processing DSACK, we assume DSACK covers only one segment. This leads to significant underestimation of no. of duplicate segments with LRO/GRO. Also, the existing SNMP counters, TCPDSACKRecv and TCPDSACKOfoRecv, make similar assumption for DSACK, which makes them unusable for estimating spurious retransmit rates. This patch series fixes the segment accounting with DSACK, by estimating number of duplicate segments based on: (DSACKed sequence range) / MSS. It also introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks the estimated number of duplicate segments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17tcp: add SNMP counter for no. of duplicate segments reported by DSACKPriyaranjan Jha3-0/+3
There are two existing SNMP counters, TCPDSACKRecv and TCPDSACKOfoRecv, which are incremented depending on whether the DSACKed range is below the cumulative ACK sequence number or not. Unfortunately, these both implicitly assume each DSACK covers only one segment. This makes these counters unusable for estimating spurious retransmit rates, or real/non-spurious loss rate. This patch introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks the estimated number of duplicate segments based on: (DSACKed sequence range) / MSS. This counter is usable for estimating spurious retransmit rates, or real/non-spurious loss rate. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17tcp: fix segment accounting when DSACK range covers multiple segmentsPriyaranjan Jha1-36/+44
Currently, while processing DSACK, we assume DSACK covers only one segment. This leads to significant underestimation of DSACKs with LRO/GRO. This patch fixes segment accounting with DSACK by estimating segment count from DSACK sequence range / MSS. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: sun: cassini: switch from 'pci_' to 'dma_' APIChristophe JAILLET1-50/+54
The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested. When memory is allocated in 'cas_tx_tiny_alloc()', GFP_KERNEL can be used because a few lines below in its only caller, 'cas_alloc_rxds()', is also called. This function makes an explicit use of GFP_KERNEL. When memory is allocated in 'cas_init_one()', GFP_KERNEL can be used because it is a probe function and no lock is acquired. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17mptcp: silence warning in subflow_data_ready()Davide Caratti1-2/+3
since commit d47a72152097 ("mptcp: fix race in subflow_data_ready()"), it is possible to observe a regression in MP_JOIN kselftests. For sockets in TCP_CLOSE state, it's not sufficient to just wake up the main socket: we also need to ensure that received data are made available to the reader. Silence the WARN_ON_ONCE() in these cases: it preserves the syzkaller fix and restores kselftests when they are ran as follows: # while true; do > make KBUILD_OUTPUT=/tmp/kselftest TARGETS=net/mptcp kselftest > done Reported-by: Florian Westphal <fw@strlen.de> Fixes: d47a72152097 ("mptcp: fix race in subflow_data_ready()") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/47 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17Merge branch 'usbnet-multicast-filter-support-for-cdc-ncm-devices'David S. Miller4-5/+11
Bjørn Mork says: ==================== usbnet: multicast filter support for cdc ncm devices This revives a 2 year old patch set from Miguel Rodríguez Pérez, which appears to have been lost somewhere along the way. I've based it on the last version I found (v4), and added one patch which I believe must have been missing in the original. I kept Oliver's ack on one of the patches, since both the patch and the motivation still is the same. Hope this is OK.. Thanks to the anonymous user <wxcafe@wxcafe.net> for bringing up this problem in https://bugs.debian.org/965074 This is only build and load tested by me. I don't have any device where I can test the actual functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: cdc_ncm: hook into set_rx_mode to admit multicast trafficMiguel Rodríguez Pérez1-0/+3
We set set_rx_mode to usbnet_cdc_update_filter provided by cdc_ether that simply admits all multicast traffic if there is more than one multicast filter configured. Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: cdc_ncm: add .ndo_set_rx_mode to cdc_ncm_netdev_opsMiguel Rodríguez Pérez1-0/+1
The cdc_ncm driver overrides the net_device_ops structure used by usbnet to be able to hook into .ndo_change_mtu. However, the structure was missing the .ndo_set_rx_mode field, preventing the driver from hooking into usbnet's set_rx_mode. This patch adds the missing callback to usbnet_set_rx_mode in net_device_ops. Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: usbnet: export usbnet_set_rx_mode()Bjørn Mork2-1/+3
This function can be reused by other usbnet minidrivers. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: cdc_ether: export usbnet_cdc_update_filterMiguel Rodríguez Pérez2-1/+3
This makes the function available to other drivers, like cdc_ncm. Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: cdc_ether: use dev->intf to get interface informationMiguel Rodríguez Pérez1-3/+1
usbnet_cdc_update_filter was getting the interface number from the usb_interface struct in cdc_state->control. However, cdc_ncm does not initialize that structure in its bind function, but uses cdc_ncm_ctx instead. Getting intf directly from struct usbnet solves the problem. Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: openvswitch: reorder masks array based on usageEelco Chaudron4-7/+207
This patch reorders the masks array every 4 seconds based on their usage count. This greatly reduces the masks per packet hit, and hence the overall performance. Especially in the OVS/OVN case for OpenShift. Here are some results from the OVS/OVN OpenShift test, which use 8 pods, each pod having 512 uperf connections, each connection sends a 64-byte request and gets a 1024-byte response (TCP). All uperf clients are on 1 worker node while all uperf servers are on the other worker node. Kernel without this patch : 7.71 Gbps Kernel with this patch applied: 14.52 Gbps We also run some tests to verify the rebalance activity does not lower the flow insertion rate, which does not. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Andrew Theurer <atheurer@redhat.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: phy: sfp: Cotsworks SFF module EEPROM fixupChris Healy1-0/+44
Some Cotsworks SFF have invalid data in the first few bytes of the module EEPROM. This results in these modules not being detected as valid modules. Address this by poking the correct EEPROM values into the module EEPROM when the model/PN match and the existing module EEPROM contents are not correct. Signed-off-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: phy: continue searching for C45 MMDs even if first returned ffff:ffffVladimir Oltean1-1/+2
At the time of introduction, in commit bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK"), support for the Lynx PCS inside Felix was relying, for USXGMII support, on the fact that get_phy_device() is able to parse the Lynx PCS "device-in-package" registers for this C45 MDIO device and identify it correctly. However, this was actually working somewhat by mistake (in the sense that, even though it was detected, it was detected for the wrong reasons). The get_phy_c45_ids() function works by iterating through all MMDs starting from 1 (MDIO_MMD_PMAPMD) and stops at the first one which returns a non-zero value in the "device-in-package" register pair, proceeding to see what that non-zero value is. For the Felix PCS, the first MMD (1, for the PMA/PMD) returns a non-zero value of 0xffffffff in the "device-in-package" registers. There is a code branch which is supposed to treat this case and flag it as wrong, and normally, this would have caught my attention when adding initial support for this PCS: if ((devs_in_pkg & 0x1fffffff) == 0x1fffffff) { /* If mostly Fs, there is no device there, then let's probe * MMD 0, as some 10G PHYs have zero Devices In package, * e.g. Cortina CS4315/CS4340 PHY. */ However, this code never actually kicked in, it seems, because this snippet from get_phy_c45_devs_in_pkg() was basically sabotaging itself, by returning 0xfffffffe instead of 0xffffffff: /* Bit 0 doesn't represent a device, it indicates c22 regs presence */ *devices_in_package &= ~BIT(0); Then the rest of the code just carried on thinking "ok, MMD 1 (PMA/PMD) says that there are 31 devices in that package, each having a device id of ffff:ffff, that's perfectly fine, let's go ahead and probe this PHY device". But after cleanup commit 320ed3bf9000 ("net: phy: split devices_in_package"), this got "fixed", and now devs_in_pkg is no longer 0xfffffffe, but 0xffffffff. So now, get_phy_device is returning -ENODEV for the Lynx PCS, because the semantics have remained mostly unchanged: the loop stops at the first MMD that returns a non-zero value, and that is MMD 1. But the Lynx PCS is simply a clause 37 PCS which implements the required MAC-side functionality for USXGMII (when operated in C45 mode, which is where C45 devices-in-package detection is relevant to). Of course it will fail the PMD/PMA test (MMD 1), since it is not a PHY. But it does implement detection for MDIO_MMD_PCS (3): - MDIO_DEVS1=0x008a, MDIO_DEVS2=0x0000, - MDIO_DEVID1=0x0083, MDIO_DEVID2=0xe400 Let get_phy_c45_ids() continue searching for valid MMDs, and don't assume that every phy_device has a PMA/PMD MMD implemented. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17Merge branch 'net-sched-do-not-drop-root-lock-in-tcf_qevent_handle'Jakub Kicinski37-85/+77
Petr Machata says: ==================== net: sched: Do not drop root lock in tcf_qevent_handle() Mirred currently does not mix well with blocks executed after the qdisc root lock is taken. This includes classification blocks (such as in PRIO, ETS, DRR qdiscs) and qevents. The locking caused by the packet mirrored by mirred can cause deadlocks: either when the thread of execution attempts to take the lock a second time, or when two threads end up waiting on each other's locks. The qevent patchset attempted to not introduce further badness of this sort, and dropped the lock before executing the qevent block. However this lead to too little locking and races between qdisc configuration and packet enqueue in the RED qdisc. Before the deadlock issues are solved in a way that can be applied across many qdiscs reasonably easily, do for qevents what is done for the classification blocks and just keep holding the root lock. That is done in patch #1. Patch #2 then drops the now unnecessary root_lock argument from Qdisc_ops.enqueue. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-07-17Revert "net: sched: Pass root lock to Qdisc_ops.enqueue"Petr Machata35-73/+71
This reverts commit aebe4426ccaa4838f36ea805cdf7d76503e65117. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-07-17net: sched: Do not drop root lock in tcf_qevent_handle()Petr Machata3-12/+6
Mirred currently does not mix well with blocks executed after the qdisc root lock is taken. This includes classification blocks (such as in PRIO, ETS, DRR qdiscs) and qevents. The locking caused by the packet mirrored by mirred can cause deadlocks: either when the thread of execution attempts to take the lock a second time, or when two threads end up waiting on each other's locks. The qevent patchset attempted to not introduce further badness of this sort, and dropped the lock before executing the qevent block. However this lead to too little locking and races between qdisc configuration and packet enqueue in the RED qdisc. Before the deadlock issues are solved in a way that can be applied across many qdiscs reasonably easily, do for qevents what is done for the classification blocks and just keep holding the root lock. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-07-17net/mlx5e: CT: Map 128 bits labels to 32 bit map IDEli Britstein3-23/+42
The 128 bits ct_label field is matched using a 32 bit hardware register. As such, only the lower 32 bits of ct_label field are offloaded. Change this logic to support setting and matching higher bits too. Map the 128 bits data to a unique 32 bits ID. Matching is done as exact match of the mapping ID of key & mask. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Maor Dickman <maord@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-17net/mlx5e: Do not request completion on every single UMR WQETariq Toukan1-1/+0
UMR WQEs are posted in bulks, and HW is notified once per a bulk. Reduce the number of completions by requesting such only for the last WQE of the bulk. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-17net/mlx5e: RX, Avoid indirect call in representor CQE handlingTariq Toukan1-1/+4
Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call when/if CONFIG_RETPOLINE=y. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-17net/mlx5e: XDP, Avoid indirect call in TX flowTariq Toukan3-14/+37
Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call when/if CONFIG_RETPOLINE=y. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-17net/mlx5e: IPsec: Add Connect-X IPsec ESN update offload supportRaed Salem1-0/+88
Synchronize offloading device ESN with xfrm received SN by updating an existing IPsec HW context with the new SN. Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-17net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offloadRaed Salem4-4/+88
On receive flow inspect received packets for IPsec offload indication using the cqe, for IPsec offloaded packets propagate offload status and stack handle to stack for further processing. Supported statuses: - Offload ok. - Authentication failure. - Bad trailer indication. Connect-X IPsec does not use mlx5e_ipsec_handle_rx_cqe. For RX only offload, we see the BW gain. Below is the iperf3 performance report on two server of 24 cores Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with ConnectX6-DX. We use one thread per IPsec tunnel. --------------------------------------------------------------------- Mode | Num tunnel | BW | Send CPU util | Recv CPU util | | (Gbps) | (Average %) | (Average %) --------------------------------------------------------------------- Cryto offload | 1 | 4.6 | 4.2 | 14.5 --------------------------------------------------------------------- Cryto offload | 24 | 38 | 73 | 63 --------------------------------------------------------------------- Non-offload | 1 | 4 | 4 | 13 --------------------------------------------------------------------- Non-offload | 24 | 23 | 52 | 67 Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-17net/mlx5e: IPsec: Add IPsec steering in local NIC RXHuy Nguyen9-5/+630
Introduce decrypt FT, the RX error FT and the default rules. The IPsec RX decrypt flow table is pointed by the TTC (Traffic Type Classifier) ESP steering rules. The decrypt flow table has two flow groups. The first flow group keeps the decrypt steering rule programmed via the "ip xfrm s" interface. The second flow group has a default rule to forward all non-offloaded ESP packet to the TTC ESP default RSS TIR. The RX error flow table is the destination of the decrypt steering rules in the IPsec RX decrypt flow table. It has a fixed rule with single copy action that copies ipsec_syndrome to metadata_regB[0:6]. The IPsec syndrome is used to filter out non-ipsec packet and to return the IPsec crypto offload status in Rx flow. The destination of RX error flow table is the TTC ESP default RSS TIR. All the FTs (decrypt FT and error FT) are created only when IPsec SAs are added. If there is no IPsec SAs, the FTs are removed. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>