summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-12-16net: hns: Fixed bug that netdev was opened twiceYonglong Liu1-0/+3
After resetting dsaf to try to repair chip error such as ecc error, the net device will be open if net interface is up. But at this time if there is the users set the net device up with the command ifconfig, the net device will be opened twice consecutively. Function napi_enable was called when open device. And Kernel panic will be occurred if it was called twice consecutively. Such as follow: static inline void napi_enable(struct napi_struct *n) { BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state)); smp_mb__before_clear_bit(); clear_bit(NAPI_STATE_SCHED, &n->state); } [37255.571996] Kernel panic - not syncing: BUG! [37255.595234] Call trace: [37255.597694] [<ffff80000008ab48>] dump_backtrace+0x0/0x1a0 [37255.603114] [<ffff80000008ad08>] show_stack+0x20/0x28 [37255.608187] [<ffff8000009c4944>] dump_stack+0x98/0xb8 [37255.613258] [<ffff8000009c149c>] panic+0x10c/0x26c [37255.618070] [<ffff80000070f134>] hns_nic_net_up+0x30c/0x4e0 [37255.623664] [<ffff80000070f39c>] hns_nic_net_open+0x94/0x12c [37255.629346] [<ffff80000084be78>] __dev_open+0xf4/0x168 [37255.634504] [<ffff80000084c1ac>] __dev_change_flags+0x98/0x15c [37255.640359] [<ffff80000084c29c>] dev_change_flags+0x2c/0x68 [37255.769580] [<ffff8000008dc400>] devinet_ioctl+0x650/0x704 [37255.775086] [<ffff8000008ddc38>] inet_ioctl+0x98/0xb4 [37255.780159] [<ffff800000827b7c>] sock_do_ioctl+0x44/0x84 [37255.785490] [<ffff800000828e04>] sock_ioctl+0x248/0x30c [37255.790737] [<ffff80000026dc6c>] do_vfs_ioctl+0x480/0x618 [37255.796156] [<ffff80000026de94>] SyS_ioctl+0x90/0xa4 [37255.801139] SMP: stopping secondary CPUs [37255.805079] kbox: catch panic event. [37255.809586] collected_len = 128928, LOG_BUF_LEN_LOCAL = 131072 [37255.816103] flush cache 0xffff80003f000000 size 0x800000 [37255.822192] flush cache 0xffff80003f000000 size 0x800000 [37255.828289] flush cache 0xffff80003f000000 size 0x800000 [37255.834378] kbox: no notify die func register. no need to notify [37255.840413] ---[ end Kernel panic - not syncing: BUG! This patchset fix this bug according to the flag NIC_STATE_DOWN. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: hns: Some registers use wrong address according to the datasheet.Yonglong Liu2-127/+127
According to the hip06 datasheet: 1.Six registers use wrong address: RCB_COM_SF_CFG_INTMASK_RING RCB_COM_SF_CFG_RING_STS RCB_COM_SF_CFG_RING RCB_COM_SF_CFG_INTMASK_BD RCB_COM_SF_CFG_BD_RINT_STS DSAF_INODE_VC1_IN_PKT_NUM_0_REG 2.The offset of DSAF_INODE_VC1_IN_PKT_NUM_0_REG should be 0x103C + 0x80 * all_chn_num 3.The offset to show the value of DSAF_INODE_IN_DATA_STP_DISC_0_REG is wrong, so the value of DSAF_INODE_SW_VLAN_TAG_DISC_0_REG will be overwrite These registers are only used in "ethtool -d", so that did not cause ndev to misfunction. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: hns: All ports can not work when insmod hns ko after rmmod.Yonglong Liu2-0/+18
There are two test cases: 1. Remove the 4 modules:hns_enet_drv/hns_dsaf/hnae/hns_mdio, and install them again, must use "ifconfig down/ifconfig up" command pair to bring port to work. This patch calls phy_stop function when init phy to fix this bug. 2. Remove the 2 modules:hns_enet_drv/hns_dsaf, and install them again, all ports can not use anymore, because of the phy devices register failed(phy devices already exists). Phy devices are registered when hns_dsaf installed, this patch removes them when hns_dsaf removed. The two cases are sometimes related, fixing the second case also requires fixing the first case, so fix them together. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: hns: Incorrect offset address used for some registers.Yonglong Liu1-2/+2
According to the hip06 Datasheet: 1. The offset of INGRESS_SW_VLAN_TAG_DISC should be 0x1A00+4*all_chn_num 2. The offset of INGRESS_IN_DATA_STP_DISC should be 0x1A50+4*all_chn_num Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16l2tp: Add protocol field decompressionSam Protsenko1-0/+4
When Protocol Field Compression (PFC) is enabled, the "Protocol" field in PPP packet will be received without leading 0x00. See section 6.5 in RFC 1661 for details. So let's decompress protocol field if needed, the same way it's done in drivers/net/ppp/pptp.c. In case when "nopcomp" pppd option is not enabled, PFC (pcomp) can be negotiated during LCP handshake, and L2TP driver in kernel will receive PPP packets with compressed Protocol field, which in turn leads to next error: Protocol Rejected (unsupported protocol 0x2145) because instead of Protocol=0x0021 in PPP packet there will be Protocol=0x21. This patch unwraps it back to 0x0021, which fixes the issue. Sending the compressed Protocol field will be implemented in subsequent patch, this one is self-sufficient. Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16Merge tag 'mlx5e-updates-2018-12-14' of ↵David S. Miller19-174/+1048
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-12-14 (VF Lag) From Aviv Heller, Subsequent patches introduce VF LAG, which provdies load-balancing and high-availability capabilities for VFs associated with different physical ports of the same Connect-X card. This series consists of the following: - mlx5 devcom, driver infrastructure that facilitates operations that involve both core devices (physical functions) of the same card, to synchronize and communicate between two driver instances of the same card. - Infrastructure for TC rule duplication. - Changes to LAG logic to enable its use when SR-IOV is enabled - PFs in switchdev mode is the only mode currently supported. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: clear skb->tstamp in forwarding pathsEric Dumazet2-0/+2
Sergey reported that forwarding was no longer working if fq packet scheduler was used. This is caused by the recent switch to EDT model, since incoming packets might have been timestamped by __net_timestamp() __net_timestamp() uses ktime_get_real(), while fq expects packets using CLOCK_MONOTONIC base. The fix is to clear skb->tstamp in forwarding paths. Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sergey Matyukevich <geomatsi@gmail.com> Tested-by: Sergey Matyukevich <geomatsi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16Merge branch 'net-mitigate-retpoline-overhead'David S. Miller9-22/+136
Paolo Abeni says: ==================== net: mitigate retpoline overhead The spectre v2 counter-measures, aka retpolines, are a source of measurable overhead[1]. We can partially address that when the function pointer refers to a builtin symbol resorting to a list of tests vs well-known builtin function and direct calls. Experimental results show that replacing a single indirect call via retpoline with several branches and a direct call gives performance gains even when multiple branches are added - 5 or more, as reported in [2]. This may lead to some uglification around the indirect calls. In netconf 2018 Eric Dumazet described a technique to hide the most relevant part of the needed boilerplate with some macro help. This series is a [re-]implementation of such idea, exposing the introduced helpers in a new header file. They are later leveraged to avoid the indirect call overhead in the GRO path, when possible. Overall this gives > 10% performance improvement for UDP GRO benchmark and smaller but measurable for TCP syn flood. The added infra can be used in follow-up patches to cope with retpoline overhead in other points of the networking stack (e.g. at the qdisc layer) and possibly even in other subsystems. v2 -> v3: - fix build error with CONFIG_IPV6=m v1 -> v2: - list explicitly the builtin function names in INDIRECT_CALL_*(), as suggested by Ed Cree - expand the recipients list rfc -> v1: - use branch prediction hints, as suggested by Eric [1] http://vger.kernel.org/netconf2018_files/PaoloAbeni_netconf2018.pdf [2] https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16udp: use indirect call wrappers for GRO socket lookupPaolo Abeni1-2/+6
This avoids another indirect call for UDP GRO. Again, the test for the IPv6 variant is performed first. v1 -> v2: - adapted to INDIRECT_CALL_ changes Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: use indirect call wrappers at GRO transport layerPaolo Abeni7-15/+61
This avoids an indirect call in the receive path for TCP and UDP packets. TCP takes precedence on UDP, so that we have a single additional conditional in the common case. When IPV6 is build as module, all gro symbols except UDPv6 are builtin, while the latter belong to the ipv6 module, so we need some special care. v1 -> v2: - adapted to INDIRECT_CALL_ changes v2 -> v3: - fix build issue with CONFIG_IPV6=m Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: use indirect call wrappers at GRO network layerPaolo Abeni3-5/+18
This avoids an indirect calls for L3 GRO receive path, both for ipv4 and ipv6, if the latter is not compiled as a module. Note that when IPv6 is compiled as builtin, it will be checked first, so we have a single additional compare for the more common path. v1 -> v2: - adapted to INDIRECT_CALL_ changes Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16indirect call wrappers: helpers to speed-up indirect calls of builtinPaolo Abeni1-0/+51
This header define a bunch of helpers that allow avoiding the retpoline overhead when calling builtin functions via function pointers. It boils down to explicitly comparing the function pointers to known builtin functions and eventually invoke directly the latter. The macros defined here implement the boilerplate for the above schema and will be used by the next patches. rfc -> v1: - use branch prediction hint, as suggested by Eric v1 -> v2: - list explicitly the builtin function names in INDIRECT_CALL_*(), as suggested by Ed Cree Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: socionext: remove mmio reads on TxIlias Apalodimas1-43/+54
Currently the driver issues 2 mmio reads to figure out the number of transmitted packets and clean them. We can get rid of the expensive reads since BIT 31 of the Tx descriptor can be used for that. We can also remove the budget counting of Tx completions since all of the descriptors are not deliberately processed. Performance numbers using pktgen are: size pre-patch(pps) post-patch(pps) 64 362483 427916 128 358315 411686 256 352725 389683 512 215675 216464 1024 113812 114442 Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16net: socionext: correctly recover txq after being fullIlias Apalodimas1-11/+45
Running pktgen with packets sizes > 512b ends up in the interface Txq getting stuck. "netsec 522d0000.ethernet eth0: netsec_netdev_start_xmit: TxQFull!" appears on dmesg but the interface never recovers. It requires an ifconfig down/up to make the interface usable again. The reason that triggers this, is a race condition between .ndo_start_xmit and the napi completion. The available budget is calculated first and indicates the queue is full. Due to a costly netif_err() the queue is not stopped in time while the napi completion runs, clears the irq and frees up descriptors, thus the queue never wakes up again. Fix this by moving the print after stopping the queue, make the print ratelimited, add barriers and check for cleaned descriptors.. Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16dt-bindings: net: ravb: Add support for r8a774c0 SoCFabrizio Castro1-0/+1
Document RZ/G2E (R8A774C0) SoC bindings. Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15mod_devicetable.h: correct kerneldoc typo, "PHYSID2" -> "MII_PHYSID2"Robert P. J. Day1-1/+1
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: ipv4: do not handle duplicate fragments as overlappingMichal Kubecek1-6/+12
Since commit 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") IPv4 reassembly code drops the whole queue whenever an overlapping fragment is received. However, the test is written in a way which detects duplicate fragments as overlapping so that in environments with many duplicate packets, fragmented packets may be undeliverable. Add an extra test and for (potentially) duplicate fragment, only drop the new fragment rather than the whole queue. Only starting offset and length are checked, not the contents of the fragments as that would be too expensive. For similar reason, linear list ("run") of a rbtree node is not iterated, we only check if the new fragment is a subset of the interval covered by existing consecutive fragments. v2: instead of an exact check iterating through linear list of an rbtree node, only check if the new fragment is subset of the "run" (suggested by Eric Dumazet) Fixes: 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15neighbor: Improve neighbour struct layoutDavid Ahern1-2/+2
Move arp_queue_len_bytes ahead of arp_queue to remove two 4-byte holes. Ensure ha element is always 8-byte aligned. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15qmi_wwan: Added support for Telit LN940 seriesJörgen Storvist1-0/+1
Added support for the Telit LN940 series cellular modules QMI interface. QMI_QUIRK_SET_DTR quirk requied for Qualcomm MDM9x40 chipset. Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: sched: simplify the qdisc_leaf codeTonghao Zhang1-3/+1
Except for returning, the var leaf is not used in the qdisc_leaf(). For simplicity, remove it. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15ipv6: Fix handling of LLA with VRF and sockets bound to VRFDavid Ahern1-1/+2
A recent commit allows sockets bound to a VRF to receive ipv6 link local packets. However, it only works for UDP and worse TCP connection attempts to the LLA with the only listener bound to the VRF just hang where as before the client gets a reset and connection refused. Fix by adjusting ir_iif for LL addresses and packets received through a device enslaved to a VRF. Fixes: 6f12fa775530 ("vrf: mark skb for multicast or link-local as enslaved to VRF") Reported-by: Donald Sharp <sharpd@cumulusnetworks.com> Cc: Mike Manning <mmanning@vyatta.att-mail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15qmi_wwan: Added support for Fibocom NL668 seriesJörgen Storvist1-0/+1
Added support for Fibocom NL668 series QMI interface. Using QMI_QUIRK_SET_DTR required for Qualcomm MDM9x07 chipsets. Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15r8169: improve spurious interrupt detectionHeiner Kallweit1-1/+2
Improve detection of spurious interrupts by checking against the interrupt mask as currently set in the chip. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15cxgb4: remove DEFINE_SIMPLE_DEBUGFS_FILE()Yangtao Li3-117/+25
We already have the DEFINE_SHOW_ATTRIBUTE. There is no need to define such a macro, so remove DEFINE_SIMPLE_DEBUGFS_FILE. Also use the DEFINE_SHOW_ATTRIBUTE macro to simplify some code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15ipconfig: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li1-12/+1
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller10-39/+109
Alexei Starovoitov says: ==================== pull-request: bpf 2018-12-15 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix liveness propagation of callee saved registers, from Jakub. 2) fix overflow in bpf_jit_limit knob, from Daniel. 3) bpf_flow_dissector api fix, from Stanislav. 4) bpf_perf_event api fix on powerpc, from Sandipan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15Merge branch 'hns3-Add-more-commands-to-Debugfs-in-HNS3-driver'David S. Miller11-5/+1366
Salil Mehta says: ==================== net: hns3: Add more commands to Debugfs in HNS3 driver This patch-set adds few more debugfs commands to HNS3 Ethernet Driver. Support has been added to query info related to below items: 1. Packet buffer descriptor ("echo bd info [queue no] [bd index] > cmd") 2. Manager table("echo dump mng tbl > cmd") 3. Dfx status register("echo dump reg ssu [prt id] > cmd") 4. Dcb status register("echo dump reg dcb [port id] > cmd") 5. Queue map ("echo queue map [queue no] > cmd") 6. Tm map ("echo tm map [queue no] > cmd") NOTE: Above commands are *read-only* and are only intended to query the information from the SoC(and dump inside the kernel, for now) and in no way tries to perform write operations for the purpose of configuration etc. Change Log: V1-->V2: 1. Addressed the GCC-8.2 compiler issue reported by David S. Miller. Link: https://lkml.org/lkml/2018/12/14/1298 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "tm map" status information query functionliuzhongzhu4-0/+93
This patch prints dcb register status information by module. debugfs command: root@(none)# echo dump tm map 100 > cmd queue_id | qset_id | pri_id | tc_id 0100 | 0065 | 08 | 00 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "queue map" information query functionliuzhongzhu7-2/+76
This patch prints queue map information. debugfs command: echo dump queue map > cmd Sample Command: root@(none)# echo queue map > cmd local queue id | global queue id | vector id 0 32 769 1 33 770 2 34 771 3 35 772 4 36 773 5 37 774 6 38 775 7 39 776 8 40 777 9 41 778 10 42 779 11 43 780 12 44 781 13 45 782 14 46 783 15 47 784 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "dcb register" status information query functionliuzhongzhu4-0/+134
This patch prints dcb register status information by module. debugfs command: root@(none)# echo dump reg dcb > cmd roce_qset_mask: 0x0 nic_qs_mask: 0x0 qs_shaping_pass: 0x0 qs_bp_sts: 0x0 pri_mask: 0x0 pri_cshaping_pass: 0x0 pri_pshaping_pass: 0x0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "status register" information query functionliuzhongzhu4-0/+885
This patch prints status register information by module. debugfs command: echo dump reg [mode name] > cmd Sample Command: root@(none)# echo dump reg bios common > cmd BP_CPU_STATE: 0x0 DFX_MSIX_INFO_NIC_0: 0xc000 DFX_MSIX_INFO_NIC_1: 0xf DFX_MSIX_INFO_NIC_2: 0x2 DFX_MSIX_INFO_NIC_3: 0x2 DFX_MSIX_INFO_ROC_0: 0xc000 DFX_MSIX_INFO_ROC_1: 0x0 DFX_MSIX_INFO_ROC_2: 0x0 DFX_MSIX_INFO_ROC_3: 0x0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "manager table" information query functionliuzhongzhu4-2/+100
This patch prints manager table information. debugfs command: echo dump mng tbl > cmd Sample Command: root@(none)# echo dump mng tbl > cmd entry|mac_addr |mask|ether|mask|vlan|mask|i_map|i_dir|e_type 00 |01:00:5e:00:00:01|0 |00000|0 |0000|0 |00 |00 |0 01 |c2:f1:c5:82:68:17|0 |00000|0 |0000|0 |00 |00 |0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "bd info" query functionliuzhongzhu1-1/+78
This patch prints Sending and receiving package descriptor information. debugfs command: echo dump bd info 1 > cmd Sample Command: root@(none)# echo bd info 1 > cmd hns3 0000:7d:00.0: TX Queue Num: 0, BD Index: 0 hns3 0000:7d:00.0: (TX) addr: 0x0 hns3 0000:7d:00.0: (TX)vlan_tag: 0 hns3 0000:7d:00.0: (TX)send_size: 0 hns3 0000:7d:00.0: (TX)vlan_tso: 0 hns3 0000:7d:00.0: (TX)l2_len: 0 hns3 0000:7d:00.0: (TX)l3_len: 0 hns3 0000:7d:00.0: (TX)l4_len: 0 hns3 0000:7d:00.0: (TX)vlan_tag: 0 hns3 0000:7d:00.0: (TX)tv: 0 hns3 0000:7d:00.0: (TX)vlan_msec: 0 hns3 0000:7d:00.0: (TX)ol2_len: 0 hns3 0000:7d:00.0: (TX)ol3_len: 0 hns3 0000:7d:00.0: (TX)ol4_len: 0 hns3 0000:7d:00.0: (TX)paylen: 0 hns3 0000:7d:00.0: (TX)vld_ra_ri: 0 hns3 0000:7d:00.0: (TX)mss: 0 hns3 0000:7d:00.0: RX Queue Num: 0, BD Index: 120 hns3 0000:7d:00.0: (RX)addr: 0xffee7000 hns3 0000:7d:00.0: (RX)pkt_len: 0 hns3 0000:7d:00.0: (RX)size: 0 hns3 0000:7d:00.0: (RX)rss_hash: 0 hns3 0000:7d:00.0: (RX)fd_id: 0 hns3 0000:7d:00.0: (RX)vlan_tag: 0 hns3 0000:7d:00.0: (RX)o_dm_vlan_id_fb: 0 hns3 0000:7d:00.0: (RX)ot_vlan_tag: 0 hns3 0000:7d:00.0: (RX)bd_base_info: 0 Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15Merge branch 'net-prefer-listeners-bound-to-an-address'David S. Miller8-217/+325
Peter Oskolkov says: ==================== net: prefer listeners bound to an address A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patchset eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In a future patchset I plan to explore whether it is possible to remove port-only hashtables completely: additional refactoring will be required, as some non-lookup code uses the hashtables. ==================== Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15selftests: net: test that listening sockets match on address properlyPeter Oskolkov4-2/+271
This patch adds a selftest that verifies that a socket listening on a specific address is chosen in preference over sockets that listen on any address. The test covers UDP/UDP6/TCP/TCP6. It is based on, and similar to, reuseport_dualstack.c selftest. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: tcp6: prefer listeners bound to an addressPeter Oskolkov1-48/+6
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: tcp: prefer listeners bound to an addressPeter Oskolkov1-52/+8
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: udp6: prefer listeners bound to an addressPeter Oskolkov1-58/+21
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: udp: prefer listeners bound to an addressPeter Oskolkov1-57/+19
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15add snmp counters documentyupeng1-1/+244
Add explainations for some general IP counters, SACK and DSACK related counters Signed-off-by: yupeng <yupeng0921@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15tipc: check tsk->group in tipc_wait_for_cond()Cong Wang1-11/+14
tipc_wait_for_cond() drops socket lock before going to sleep, but tsk->group could be freed right after that release_sock(). So we have to re-check and reload tsk->group after it wakes up. After this patch, tipc_wait_for_cond() returns -ERESTARTSYS when tsk->group is NULL, instead of continuing with the assumption of a non-NULL tsk->group. (It looks like 'dsts' should be re-checked and reloaded too, but it is a different bug.) Similar for tipc_send_group_unicast() and tipc_send_group_anycast(). Reported-by: syzbot+10a9db47c3a0e13eb31c@syzkaller.appspotmail.com Fixes: b7d42635517f ("tipc: introduce flow control for group broadcast messages") Fixes: ee106d7f942d ("tipc: introduce group anycast messaging") Fixes: 27bd9ec027f3 ("tipc: introduce group unicast messaging") Cc: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15Merge branch 'neighbor-More-gc_list-changes'David S. Miller2-52/+60
David Ahern says: ==================== neighbor: More gc_list changes More gc_list changes and cleanups. The first 2 patches are bug fixes from the first gc_list change. Specifically, fix the locking order to be consistent - table lock followed by neighbor lock, and then entries in the FAILED state should always be candidates for forced_gc without waiting for any time span (return to the eviction logic prior to the separate gc_list). Patch 3 removes 2 now unnecessary arguments to neigh_del. Patch 4 moves a helper from a header file to core code in preparation for Patch 5 which removes NTF_EXT_LEARNED entries from the gc_list. These entries are already exempt from forced_gc; patch 5 removes them from consideration and makes them on par with PERMANENT entries given that they are also managed by userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15neighbor: Remove externally learned entries from gc_listDavid Ahern1-21/+28
Externally learned entries are similar to PERMANENT entries in the sense they are managed by userspace and can not be garbage collected. As such remove them from the gc_list, remove the flags check from neigh_forced_gc and skip threshold checks in neigh_alloc. As with PERMANENT entries, this allows unlimited number of NTF_EXT_LEARNED entries. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15neighbor: Move neigh_update_ext_learned to core fileDavid Ahern2-18/+18
neigh_update_ext_learned has one caller in neighbour.c so does not need to be defined in the header. Move it and in the process remove the intialization of ndm_flags and just set it based on the flags check. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15neighbor: Remove state and flags arguments to neigh_delDavid Ahern1-5/+4
neigh_del now only has 1 caller, and the state and flags arguments are both 0. Remove them and simplify neigh_del. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15neighbor: Fix state check in neigh_forced_gcDavid Ahern1-3/+2
PERMANENT entries are not on the gc_list so the state check is now redundant. Also, the move to not purge entries until after 5 seconds should not apply to FAILED entries; those can be removed immediately to make way for newer ones. This restores the previous logic prior to the gc_list. Fixes: 58956317c8de ("neighbor: Improve garbage collection") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15neighbor: Fix locking order for gc_list changesDavid Ahern1-12/+15
Lock checker noted an inverted lock order between neigh_change_state (neighbor lock then table lock) and neigh_periodic_work (table lock and then neighbor lock) resulting in: [ 121.057652] ====================================================== [ 121.058740] WARNING: possible circular locking dependency detected [ 121.059861] 4.20.0-rc6+ #43 Not tainted [ 121.060546] ------------------------------------------------------ [ 121.061630] kworker/0:2/65 is trying to acquire lock: [ 121.062519] (____ptrval____) (&n->lock){++--}, at: neigh_periodic_work+0x237/0x324 [ 121.063894] [ 121.063894] but task is already holding lock: [ 121.064920] (____ptrval____) (&tbl->lock){+.-.}, at: neigh_periodic_work+0x194/0x324 [ 121.066274] [ 121.066274] which lock already depends on the new lock. [ 121.066274] [ 121.067693] [ 121.067693] the existing dependency chain (in reverse order) is: ... Fix by renaming neigh_change_state to neigh_update_gc_list, changing it to only manage whether an entry should be on the gc_list and taking locks in the same order as neigh_periodic_work. Invoke at the end of neigh_update only if diff between old or new states has the PERMANENT flag set. Fixes: 8cc196d6ef86 ("neighbor: gc_list changes should be protected by table lock") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: Allow class-e address assignment via ifconfig ioctlDave Taht3-5/+12
While most distributions long ago switched to the iproute2 suite of utilities, which allow class-e (240.0.0.0/4) address assignment, distributions relying on busybox, toybox and other forms of ifconfig cannot assign class-e addresses without this kernel patch. While CIDR has been obsolete for 2 decades, and a survey of all the open source code in the world shows the IN_whatever macros are also obsolete... rather than obsolete CIDR from this ioctl entirely, this patch merely enables class-e assignment, sanely. Signed-off-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds16-30/+54
Merge misc fixes from Andrew Morton: "11 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: scripts/spdxcheck.py: always open files in binary mode checkstack.pl: fix for aarch64 userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered fs/iomap.c: get/put the page in iomap_page_create/release() hugetlbfs: call VM_BUG_ON_PAGE earlier in free_huge_page() memblock: annotate memblock_is_reserved() with __init_memblock psi: fix reference to kernel commandline enable arch/sh/include/asm/io.h: provide prototypes for PCI I/O mapping in asm/io.h mm/sparse: add common helper to mark all memblocks present mm: introduce common STRUCT_PAGE_MAX_SHIFT define alpha: fix hang caused by the bootmem removal
2018-12-15ip6mr: Fix potential Spectre v1 vulnerabilityGustavo A. R. Silva1-0/+4
vr.mifi is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/ipv6/ip6mr.c:1845 ip6mr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) net/ipv6/ip6mr.c:1919 ip6mr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) Fix this by sanitizing vr.mifi before using it to index mrt->vif_table' Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>