summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-02-20Merge tag 'pidfd.v5.17-rc4' of ↵Linus Torvalds1-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd fix from Christian Brauner: "This fixes a problem reported by lockdep when installing a pidfd via fd_install() with siglock and the tasklisk write lock held in copy_process() when calling clone()/clone3() with CLONE_PIDFD. Originally a pidfd was created prior to holding any of these locks but this required a call to ksys_close(). So quite some time ago in 6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups") we switched to a get_unused_fd_flags() + fd_install() model. As part of that we moved fd_install() as late as possible. This was done for two main reasons. First, because we needed to ensure that we call fd_install() past the point of no return as once that's called the fd is live in the task's file table. Second, because we tried to ensure that the fd is visible in /proc/<pid>/fd/<pidfd> right when the task is visible. This fix moves the fd_install() to an even later point which means that a task will be visible in proc while the pidfd isn't yet under /proc/<pid>/fd/<pidfd>. While this is a user visible change it's very unlikely that this will have any impact. Nobody should be relying on that and if they do we need to come up with something better but again, it's doubtful this is relevant" * tag 'pidfd.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: copy_process(): Move fd_install() out of sighand->siglock critical section
2022-02-20Merge branch 'ucount-rlimit-fixes-for-v5.17' of ↵Linus Torvalds4-19/+23
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucounts fixes from Eric Biederman: "Michal Koutný recently found some bugs in the enforcement of RLIMIT_NPROC in the recent ucount rlimit implementation. In this set of patches I have developed a very conservative approach changing only what is necessary to fix the bugs that I can see clearly. Cleanups and anything that is making the code more consistent can follow after we have the code working as it has historically. The problem is not so much inconsistencies (although those exist) but that it is very difficult to figure out what the code should be doing in the case of RLIMIT_NPROC. All other rlimits are only enforced where the resource is acquired (allocated). RLIMIT_NPROC by necessity needs to be enforced in an additional location, and our current implementation stumbled it's way into that implementation" * 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Handle wrapping in is_ucounts_overlimit ucounts: Move RLIMIT_NPROC handling after set_user ucounts: Base set_cred_ucounts changes on the real user ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1 rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
2022-02-20Merge branch 'tcp_drop_reason'David S. Miller6-28/+132
Menglong Dong says: ==================== net: add skb drop reasons to TCP packet receive In the commit c504e5c2f964 ("net: skb: introduce kfree_skb_reason()"), we added the support of reporting the reasons of skb drops to kfree_skb tracepoint. And in this series patches, reasons for skb drops are added to TCP layer (both TCPv4 and TCPv6 are considered). Following functions are processed: tcp_v4_rcv() tcp_v6_rcv() tcp_v4_inbound_md5_hash() tcp_v6_inbound_md5_hash() tcp_add_backlog() tcp_v4_do_rcv() tcp_v6_do_rcv() tcp_rcv_established() tcp_data_queue() tcp_data_queue_ofo() The functions we handled are mostly for packet ingress, as skb drops hardly happens in the egress path of TCP layer. However, it's a little complex for TCP state processing, as I find that it's hard to report skb drop reasons to where it is freed. For example, when skb is dropped in tcp_rcv_state_process(), the reason can be caused by the call of tcp_v4_conn_request(), and it's hard to return a drop reason from tcp_v4_conn_request(). So such cases are skipped for this moment. Following new drop reasons are introduced (what they mean can be see in the document for them): /* SKB_DROP_REASON_TCP_MD5* corresponding to LINUX_MIB_TCPMD5* */ SKB_DROP_REASON_TCP_MD5NOTFOUND SKB_DROP_REASON_TCP_MD5UNEXPECTED SKB_DROP_REASON_TCP_MD5FAILURE SKB_DROP_REASON_SOCKET_BACKLOG SKB_DROP_REASON_TCP_FLAGS SKB_DROP_REASON_TCP_ZEROWINDOW SKB_DROP_REASON_TCP_OLD_DATA SKB_DROP_REASON_TCP_OVERWINDOW /* corresponding to LINUX_MIB_TCPOFOMERGE */ SKB_DROP_REASON_TCP_OFOMERGE Here is a example to get TCP packet drop reasons from ftrace: $ echo 1 > /sys/kernel/debug/tracing/events/skb/kfree_skb/enable $ cat /sys/kernel/debug/tracing/trace $ <idle>-0 [036] ..s1. 647.428165: kfree_skb: skbaddr=000000004d037db6 protocol=2048 location=0000000074cd1243 reason: NO_SOCKET $ <idle>-0 [020] ..s2. 639.676674: kfree_skb: skbaddr=00000000bcbfa42d protocol=2048 location=00000000bfe89d35 reason: PROTO_MEM From the reason 'PROTO_MEM' we can know that the skb is dropped because the memory configured in net.ipv4.tcp_mem is up to the limition. Changes since v2: - remove the 'inline' of tcp_drop() in the 1th patch, as Jakub suggested Changes since v1: - enrich the document for this series patches in the cover letter, as Eric suggested - fix compile warning report by Jakub in the 6th patch - let NO_SOCKET trump the XFRM failure in the 2th and 3th patches ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20net: tcp: use tcp_drop_reason() for tcp_data_queue_ofo()Menglong Dong3-4/+11
Replace tcp_drop() used in tcp_data_queue_ofo with tcp_drop_reason(). Following drop reasons are introduced: SKB_DROP_REASON_TCP_OFOMERGE Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20net: tcp: use tcp_drop_reason() for tcp_data_queue()Menglong Dong3-2/+27
Replace tcp_drop() used in tcp_data_queue() with tcp_drop_reason(). Following drop reasons are introduced: SKB_DROP_REASON_TCP_ZEROWINDOW SKB_DROP_REASON_TCP_OLD_DATA SKB_DROP_REASON_TCP_OVERWINDOW SKB_DROP_REASON_TCP_OLD_DATA is used for the case that end_seq of skb less than the left edges of receive window. (Maybe there is a better name?) Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20net: tcp: use tcp_drop_reason() for tcp_rcv_established()Menglong Dong3-2/+9
Replace tcp_drop() used in tcp_rcv_established() with tcp_drop_reason(). Following drop reasons are added: SKB_DROP_REASON_TCP_FLAGS Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20net: tcp: use kfree_skb_reason() for tcp_v{4,6}_do_rcv()Menglong Dong2-2/+8
Replace kfree_skb() used in tcp_v4_do_rcv() and tcp_v6_do_rcv() with kfree_skb_reason(). Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20net: tcp: add skb drop reasons to tcp_add_backlog()Menglong Dong5-4/+13
Pass the address of drop_reason to tcp_add_backlog() to store the reasons for skb drops when fails. Following drop reasons are introduced: SKB_DROP_REASON_SOCKET_BACKLOG Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20net: tcp: add skb drop reasons to tcp_v{4,6}_inbound_md5_hash()Menglong Dong4-7/+33
Pass the address of drop reason to tcp_v4_inbound_md5_hash() and tcp_v6_inbound_md5_hash() to store the reasons for skb drops when this function fails. Therefore, the drop reason can be passed to kfree_skb_reason() when the skb needs to be freed. Following drop reasons are added: SKB_DROP_REASON_TCP_MD5NOTFOUND SKB_DROP_REASON_TCP_MD5UNEXPECTED SKB_DROP_REASON_TCP_MD5FAILURE SKB_DROP_REASON_TCP_MD5* above correspond to LINUX_MIB_TCPMD5* Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20net: tcp: use kfree_skb_reason() for tcp_v6_rcv()Menglong Dong1-4/+17
Replace kfree_skb() used in tcp_v6_rcv() with kfree_skb_reason(). Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20net: tcp: add skb drop reasons to tcp_v4_rcv()Menglong Dong1-1/+6
Use kfree_skb_reason() for some path in tcp_v4_rcv() that missed before, including: SKB_DROP_REASON_SOCKET_FILTER SKB_DROP_REASON_XFRM_POLICY Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20net: tcp: introduce tcp_drop_reason()Menglong Dong1-2/+8
For TCP protocol, tcp_drop() is used to free the skb when it needs to be dropped. To make use of kfree_skb_reason() and pass the drop reason to it, introduce the function tcp_drop_reason(). Meanwhile, make tcp_drop() an inline call to tcp_drop_reason(). Reviewed-by: Mengen Sun <mengensun@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20Merge branch 'bnxt_en-fixes'David S. Miller6-28/+90
Michael Chan says: ==================== bnxt_en: Bug fixes This series contains bug fixes for FEC reporting, ethtool self test, multicast setup, devlink health reporting and live patching, and a firmware response timeout. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20bnxt_en: Fix devlink fw_activateKalesh AP1-8/+31
To install a livepatch, first flash the package to NVM, and then activate the patch through the "HWRM_FW_LIVEPATCH" fw command. To uninstall a patch from NVM, flash the removal package and then activate it through the "HWRM_FW_LIVEPATCH" fw command. The "HWRM_FW_LIVEPATCH" fw command has to consider following scenarios: 1. no patch in NVM and no patch active. Do nothing. 2. patch in NVM, but not active. Activate the patch currently in NVM. 3. patch is not in NVM, but active. Deactivate the patch. 4. patch in NVM and the patch active. Do nothing. Fix the code to handle these scenarios during devlink "fw_activate". To install and activate a live patch: devlink dev flash pci/0000:c1:00.0 file thor_patch.pkg devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset To remove and deactivate a live patch: devlink dev flash pci/0000:c1:00.0 file thor_patch_rem.pkg devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset Fixes: 3c4153394e2c ("bnxt_en: implement firmware live patching") Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20bnxt_en: Increase firmware message response DMA wait timeMichael Chan2-4/+10
When polling for the firmware message response, we first poll for the response message header. Once the valid length is detected in the header, we poll for the valid bit at the end of the message which signals DMA completion. Normally, this poll time for DMA completion is extremely short (0 to a few usec). But on some devices under some rare conditions, it can be up to about 20 msec. Increase this delay to 50 msec and use udelay() for the first 10 usec for the common case, and usleep_range() beyond that. Also, change the error message to include the above delay time when printing the timeout value. Fixes: 3c8c20db769c ("bnxt_en: move HWRM API implementation into separate file") Reviewed-by: Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20bnxt_en: Restore the resets_reliable flag in bnxt_open()Kalesh AP1-2/+15
During ifdown, we call bnxt_inv_fw_health_reg() which will clear both the status_reliable and resets_reliable flags if these registers are mapped. This is correct because a FW reset during ifdown will clear these register mappings. If we detect that FW has gone through reset during the next ifup, we will remap these registers. But during normal ifup with no FW reset, we need to restore the resets_reliable flag otherwise we will not show the reset counter during devlink diagnose. Fixes: 8cc95ceb7087 ("bnxt_en: improve fw diagnose devlink health messages") Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20bnxt_en: Fix incorrect multicast rx mask setting when not requestedPavan Chebbi1-5/+8
We should setup multicast only when net_device flags explicitly has IFF_MULTICAST set. Otherwise we will incorrectly turn it on even when not asked. Fix it by only passing the multicast table to the firmware if IFF_MULTICAST is set. Fixes: 7d2837dd7a32 ("bnxt_en: Setup multicast properly after resetting device.") Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20bnxt_en: Fix occasional ethtool -t loopback test failuresMichael Chan3-1/+9
In the current code, we setup the port to PHY or MAC loopback mode and then transmit a test broadcast packet for the loopback test. This scheme fails sometime if the port is shared with management firmware that can also send packets. The driver may receive the management firmware's packet and the test will fail when the contents don't match the test packet. Change the test packet to use it's own MAC address as the destination and setup the port to only receive it's own MAC address. This should filter out other packets sent by management firmware. Fixes: 91725d89b97a ("bnxt_en: Add PHY loopback to ethtool self-test.") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20bnxt_en: Fix offline ethtool selftest with RDMA enabledMichael Chan2-8/+14
For offline (destructive) self tests, we need to stop the RDMA driver first. Otherwise, the RDMA driver will run into unrecoverable errors when destructive firmware tests are being performed. The irq_re_init parameter used in the half close and half open sequence when preparing the NIC for offline tests should be set to true because the RDMA driver will free all IRQs before the offline tests begin. Fixes: 55fd0cf320c3 ("bnxt_en: Add external loopback test to ethtool selftest.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Ben Li <ben.li@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20bnxt_en: Fix active FEC reporting to ethtoolSomnath Kotur1-0/+3
ethtool --show-fec <interface> does not show anything when the Active FEC setting in the chip is set to None. Fix it to properly return ETHTOOL_FEC_OFF in that case. Fixes: 8b2775890ad8 ("bnxt_en: Report FEC settings to ethtool.") Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-20hwmon: (ntc_thermistor) Underscore Samsung thermistorLinus Walleij1-1/+1
The sysfs does not like that we name the thermistor something that contains a dash: ntc-thermistor thermistor: hwmon: 'ssg1404-001221' is not a valid name attribute, please fix Fix it up by switching to an underscore. Fixes: e13e979b2b3d ("hwmon: (ntc_thermistor) Add Samsung 1404-001221 NTC") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20220205005804.123245-1-linus.walleij@linaro.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-02-20netfilter: nf_tables_offload: incorrect flow offload action array sizePablo Neira Ayuso6-5/+26
immediate verdict expression needs to allocate one slot in the flow offload action array, however, immediate data expression does not need to do so. fwd and dup expression need to allocate one slot, this is missing. Add a new offload_action interface to report if this expression needs to allocate one slot in the flow offload action array. Fixes: be2861dc36d7 ("netfilter: nft_{fwd,dup}_netdev: add offload support") Reported-and-tested-by: Nick Gregory <Nick.Gregory@Sophos.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-20net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't heldVladimir Oltean2-7/+20
If the DSA master doesn't support IFF_UNICAST_FLT, then the following call path is possible: dsa_slave_switchdev_event_work -> dsa_port_host_fdb_add -> dev_uc_add -> __dev_set_rx_mode -> __dev_set_promiscuity Since the blamed commit, dsa_slave_switchdev_event_work() no longer holds rtnl_lock(), which triggers the ASSERT_RTNL() from __dev_set_promiscuity(). Taking rtnl_lock() around dev_uc_add() is impossible, because all the code paths that call dsa_flush_workqueue() do so from contexts where the rtnl_mutex is already held - so this would lead to an instant deadlock. dev_uc_add() in itself doesn't require the rtnl_mutex for protection. There is this comment in __dev_set_rx_mode() which assumes so: /* Unicast addresses changes may only happen under the rtnl, * therefore calling __dev_set_promiscuity here is safe. */ but it is from commit 4417da668c00 ("[NET]: dev: secondary unicast address support") dated June 2007, and in the meantime, commit f1f28aa3510d ("netdev: Add addr_list_lock to struct net_device."), dated July 2008, has added &dev->addr_list_lock to protect this instead of the global rtnl_mutex. Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection, but it is the uncommon path of what we typically expect dev_uc_add() to do. So since only the uncommon path requires rtnl_lock(), just check ahead of time whether dev_uc_add() would result into a call to __dev_set_promiscuity(), and handle that condition separately. DSA already configures the master interface to be promiscuous if the tagger requires this. We can extend this to also cover the case where the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT), and on the premise that we'd end up making it promiscuous during operation anyway, either if a DSA slave has a non-inherited MAC address, or if the bridge notifies local FDB entries for its own MAC address, the address of a station learned on a foreign port, etc. Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: prestera: acl: fix 'client_map' buff overflowVolodymyr Mytnyk1-1/+1
smatch warnings: drivers/net/ethernet/marvell/prestera/prestera_acl.c:103 prestera_acl_chain_to_client() error: buffer overflow 'client_map' 3 <= 3 prestera_acl_chain_to_client(u32 chain_index, ...) ... u32 client_map[] = { PRESTERA_HW_COUNTER_CLIENT_LOOKUP_0, PRESTERA_HW_COUNTER_CLIENT_LOOKUP_1, PRESTERA_HW_COUNTER_CLIENT_LOOKUP_2 }; if (chain_index > ARRAY_SIZE(client_map)) ... Fixes: fa5d824ce5dd ("net: prestera: acl: add multi-chain support offload") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: dsa: microchip: add ksz8563 to ksz9477 I2C driverAhmad Fatoum1-0/+1
The KSZ9477 SPI driver already has support for the KSZ8563. The same switch chip can also be managed via i2c and we have an KSZ9477 I2C driver, but that one lacks the relevant compatible entry. Add it. DT bindings already describe this compatible. Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net/smc: unlock on error paths in __smc_setsockopt()Dan Carpenter1-4/+8
These two error paths need to release_sock(sk) before returning. Fixes: a6a6fe27bab4 ("net/smc: Dynamic control handshake limitation by socket options") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: dsa: microchip: ksz9477: export HW stats over stats64 interfaceOleksij Rempel3-0/+104
Provide access to HW offloaded packets over stats64 interface. The rx/tx_bytes values needed some fixing since HW is accounting size of the Ethernet frame together with FCS. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19Merge branch 'phylink-remove-pcs_poll'David S. Miller4-10/+1
Russell King says: ==================== net: phylink: remove pcs_poll This small series removes the now unused pcs_poll members from DSA and phylink. "git grep pcs_poll drivers/net/ net/" on net-next confirms that the only places that reference this are in DSA core code and phylink code: drivers/net/phy/phylink.c: if (pl->config->pcs_poll || pcs->poll) drivers/net/phy/phylink.c: poll |= pl->config->pcs_poll; net/dsa/port.c: dp->pl_config.pcs_poll = ds->pcs_poll; ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: phylink: remove phylink_config's pcs_pollRussell King (Oracle)2-4/+1
phylink_config's pcs_poll is no longer used, let's get rid of it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: dsa: remove pcs_pollRussell King (Oracle)2-6/+0
With drivers converted over to using phylink PCS, there is no need for the struct dsa_switch member "pcs_poll" to exist anymore - there is a flag in the struct phylink_pcs which indicates whether this PCS needs to be polled which supersedes this. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: hsr: fix suspicious RCU usage warning in hsr_node_get_first()Juhee Kang2-7/+11
When hsr_create_self_node() calls hsr_node_get_first(), the suspicious RCU usage warning is occurred. The reason why this warning is raised is the callers of hsr_node_get_first() use rcu_read_lock_bh() and other different synchronization mechanisms. Thus, this patch solved by replacing rcu_dereference() with rcu_dereference_bh_check(). The kernel test robot reports: [ 50.083470][ T3596] ============================= [ 50.088648][ T3596] WARNING: suspicious RCU usage [ 50.093785][ T3596] 5.17.0-rc3-next-20220208-syzkaller #0 Not tainted [ 50.100669][ T3596] ----------------------------- [ 50.105513][ T3596] net/hsr/hsr_framereg.c:34 suspicious rcu_dereference_check() usage! [ 50.113799][ T3596] [ 50.113799][ T3596] other info that might help us debug this: [ 50.113799][ T3596] [ 50.124257][ T3596] [ 50.124257][ T3596] rcu_scheduler_active = 2, debug_locks = 1 [ 50.132368][ T3596] 2 locks held by syz-executor.0/3596: [ 50.137863][ T3596] #0: ffffffff8d3357e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x3be/0xb80 [ 50.147470][ T3596] #1: ffff88807ec9d5f0 (&hsr->list_lock){+...}-{2:2}, at: hsr_create_self_node+0x225/0x650 [ 50.157623][ T3596] [ 50.157623][ T3596] stack backtrace: [ 50.163510][ T3596] CPU: 1 PID: 3596 Comm: syz-executor.0 Not tainted 5.17.0-rc3-next-20220208-syzkaller #0 [ 50.173381][ T3596] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ 50.183623][ T3596] Call Trace: [ 50.186904][ T3596] <TASK> [ 50.189844][ T3596] dump_stack_lvl+0xcd/0x134 [ 50.194640][ T3596] hsr_node_get_first+0x9b/0xb0 [ 50.199499][ T3596] hsr_create_self_node+0x22d/0x650 [ 50.204688][ T3596] hsr_dev_finalize+0x2c1/0x7d0 [ 50.209669][ T3596] hsr_newlink+0x315/0x730 [ 50.214113][ T3596] ? hsr_dellink+0x130/0x130 [ 50.218789][ T3596] ? rtnl_create_link+0x7e8/0xc00 [ 50.223803][ T3596] ? hsr_dellink+0x130/0x130 [ 50.228397][ T3596] __rtnl_newlink+0x107c/0x1760 [ 50.233249][ T3596] ? rtnl_setlink+0x3c0/0x3c0 [ 50.238043][ T3596] ? is_bpf_text_address+0x77/0x170 [ 50.243362][ T3596] ? lock_downgrade+0x6e0/0x6e0 [ 50.248219][ T3596] ? unwind_next_frame+0xee1/0x1ce0 [ 50.253605][ T3596] ? entry_SYSCALL_64_after_hwframe+0x44/0xae [ 50.259669][ T3596] ? __sanitizer_cov_trace_cmp4+0x1c/0x70 [ 50.265423][ T3596] ? is_bpf_text_address+0x99/0x170 [ 50.270819][ T3596] ? kernel_text_address+0x39/0x80 [ 50.275950][ T3596] ? __kernel_text_address+0x9/0x30 [ 50.281336][ T3596] ? unwind_get_return_address+0x51/0x90 [ 50.286975][ T3596] ? create_prof_cpu_mask+0x20/0x20 [ 50.292178][ T3596] ? arch_stack_walk+0x93/0xe0 [ 50.297172][ T3596] ? kmem_cache_alloc_trace+0x42/0x2c0 [ 50.302637][ T3596] ? rcu_read_lock_sched_held+0x3a/0x70 [ 50.308194][ T3596] rtnl_newlink+0x64/0xa0 [ 50.312524][ T3596] ? __rtnl_newlink+0x1760/0x1760 [ 50.317545][ T3596] rtnetlink_rcv_msg+0x413/0xb80 [ 50.322631][ T3596] ? rtnl_newlink+0xa0/0xa0 [ 50.327159][ T3596] netlink_rcv_skb+0x153/0x420 [ 50.331931][ T3596] ? rtnl_newlink+0xa0/0xa0 [ 50.336436][ T3596] ? netlink_ack+0xa80/0xa80 [ 50.341095][ T3596] ? netlink_deliver_tap+0x1a2/0xc40 [ 50.346532][ T3596] ? netlink_deliver_tap+0x1b1/0xc40 [ 50.351839][ T3596] netlink_unicast+0x539/0x7e0 [ 50.356633][ T3596] ? netlink_attachskb+0x880/0x880 [ 50.361750][ T3596] ? __sanitizer_cov_trace_const_cmp8+0x1d/0x70 [ 50.368003][ T3596] ? __sanitizer_cov_trace_const_cmp8+0x1d/0x70 [ 50.374707][ T3596] ? __phys_addr_symbol+0x2c/0x70 [ 50.379753][ T3596] ? __sanitizer_cov_trace_cmp8+0x1d/0x70 [ 50.385568][ T3596] ? __check_object_size+0x16c/0x4f0 [ 50.390859][ T3596] netlink_sendmsg+0x904/0xe00 [ 50.395715][ T3596] ? netlink_unicast+0x7e0/0x7e0 [ 50.400722][ T3596] ? __sanitizer_cov_trace_const_cmp4+0x1c/0x70 [ 50.407003][ T3596] ? netlink_unicast+0x7e0/0x7e0 [ 50.412119][ T3596] sock_sendmsg+0xcf/0x120 [ 50.416548][ T3596] __sys_sendto+0x21c/0x320 [ 50.421052][ T3596] ? __ia32_sys_getpeername+0xb0/0xb0 [ 50.426427][ T3596] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 50.432721][ T3596] ? __context_tracking_exit+0xb8/0xe0 [ 50.438188][ T3596] ? lock_downgrade+0x6e0/0x6e0 [ 50.443041][ T3596] ? lock_downgrade+0x6e0/0x6e0 [ 50.447902][ T3596] __x64_sys_sendto+0xdd/0x1b0 [ 50.452759][ T3596] ? lockdep_hardirqs_on+0x79/0x100 [ 50.457964][ T3596] ? syscall_enter_from_user_mode+0x21/0x70 [ 50.464150][ T3596] do_syscall_64+0x35/0xb0 [ 50.468565][ T3596] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 50.474452][ T3596] RIP: 0033:0x7f3148504e1c [ 50.479052][ T3596] Code: fa fa ff ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 20 fb ff ff 48 8b [ 50.498926][ T3596] RSP: 002b:00007ffeab5f2ab0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c [ 50.507342][ T3596] RAX: ffffffffffffffda RBX: 00007f314959d320 RCX: 00007f3148504e1c [ 50.515393][ T3596] RDX: 0000000000000048 RSI: 00007f314959d370 RDI: 0000000000000003 [ 50.523444][ T3596] RBP: 0000000000000000 R08: 00007ffeab5f2b04 R09: 000000000000000c [ 50.531492][ T3596] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 [ 50.539455][ T3596] R13: 00007f314959d370 R14: 0000000000000003 R15: 0000000000000000 Fixes: 4acc45db7115 ("net: hsr: use hlist_head instead of list_head for mac addresses") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-and-tested-by: syzbot+f0eb4f3876de066b128c@syzkaller.appspotmail.com Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19atm: nicstar: Use kcalloc() to simplify codeChristophe JAILLET1-8/+2
Use kcalloc() instead of kmalloc_array() and a loop to set all the values of the array to NULL. While at it, remove a duplicated assignment to 'scq->num_entries'. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19Merge branch 'dpaa2-eth-one-step-register'David S. Miller5-11/+106
Radu Bulie says: ==================== Provide direct access to 1588 one step register DPAA2 MAC supports 1588 one step timestamping. If this option is enabled then for each transmitted PTP event packet, the 1588 SINGLE_STEP register is accessed to modify the following fields: -offset of the correction field inside the PTP packet -UDP checksum update bit, in case the PTP event packet has UDP encapsulation These values can change any time, because there may be multiple PTP clients connected, that receive various 1588 frame types: - L2 only frame - UDP / Ipv4 - UDP / Ipv6 - other The current implementation uses dpni_set_single_step_cfg to update the SINLGE_STEP register. Using an MC command on the Tx datapath for each transmitted 1588 message introduces high delays, leading to low throughput and consequently to a small number of supported PTP clients. Besides these, the nanosecond correction field from the PTP packet will contain the high delay from the driver which together with the originTimestamp will render timestamp values that are unacceptable in a GM clock implementation. This patch series replaces the dpni_set_single_step_cfg function call from the Tx datapath for 1588 messages (when one step timestamping is enabled) with a callback that either implements direct access to the SINGLE_STEP register, eliminating the overhead caused by the MC command that will need to be dispatched by the MC firmware through the MC command portal interface or falls back to the dpni_set_single_step_cfg in case the MC version does not have support for returning the single step register base address. In other words all the delay introduced by dpni_set_single_step_cfg function will be eliminated (if MC version has support for returning the base address of the single step register), improving the egress driver performance for PTP packets when single step timestamping is enabled. The first patch adds a new attribute that contains the base address of the SINGLE_STEP register. It will be used to directly update the register on the Tx datapath. The second patch updates the driver such that the SINGLE_STEP register is either accessed directly if MC version >= 10.32 or is accessed through dpni_set_single_step_cfg command when 1588 messages are transmitted. Changes in v2: - move global function pointer into the driver's private structure in 2/2 - move repetitive code outside the body of the callback functions in 2/2 - update function dpaa2_ptp_onestep_reg_update_method and remove goto statement from non error path in 2/2 Changes in v3: - remove static storage class specifier from within the structure in 2/2 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19dpaa2-eth: Update SINGLE_STEP register accessRadu Bulie2-10/+93
DPAA2 MAC supports 1588 one step timestamping. If this option is enabled then for each transmitted PTP event packet, the 1588 SINGLE_STEP register is accessed to modify the following fields: -offset of the correction field inside the PTP packet -UDP checksum update bit, in case the PTP event packet has UDP encapsulation These values can change any time, because there may be multiple PTP clients connected, that receive various 1588 frame types: - L2 only frame - UDP / Ipv4 - UDP / Ipv6 - other The current implementation uses dpni_set_single_step_cfg to update the SINLGE_STEP register. Using an MC command on the Tx datapath for each transmitted 1588 message introduces high delays, leading to low throughput and consequently to a small number of supported PTP clients. Besides these, the nanosecond correction field from the PTP packet will contain the high delay from the driver which together with the originTimestamp will render timestamp values that are unacceptable in a GM clock implementation. This patch updates the Tx datapath for 1588 messages when single step timestamp is enabled and provides direct access to SINGLE_STEP register, eliminating the overhead caused by the dpni_set_single_step_cfg MC command. MC version >= 10.32 implements this functionality. If the MC version does not have support for returning the single step register base address, the driver will use dpni_set_single_step_cfg command for updates operations. All the delay introduced by dpni_set_single_step_cfg function will be eliminated (if MC version has support for returning the base address of the single step register), improving the egress driver performance for PTP packets when single step timestamping is enabled. Before these changes the maximum throughput for 1588 messages with single step hardware timestamp enabled was around 2000pps. After the updates the throughput increased up to 32.82 Mbps / 46631.02 pps. Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19dpaa2-eth: Update dpni_get_single_step_cfg commandRadu Bulie3-1/+13
dpni_get_single_step_cfg is an MC firmware command used for retrieving the contents of SINGLE_STEP 1588 register available in a DPMAC. This patch adds a new version of this command that returns as an extra argument the physical base address of the aforementioned register. The address will be used to directly modify the contents of the SINGLE_STEP register instead of invoking the MC command dpni_set_single_step_cgf. The former approach introduced huge delays on the TX datapath when one step PTP events were transmitted. This led to low throughput and high latencies observed in the PTP correction field. Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: get rid of rtnl_lock_unregistering()Eric Dumazet1-42/+0
After recent patches, and in particular commits faab39f63c1f ("net: allow out-of-order netdev unregistration") and e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev") we no longer need the barrier implemented in rtnl_lock_unregistering(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: dsa: microchip: fix bridging with more than two member portsSvenning Sørensen1-3/+23
Commit b3612ccdf284 ("net: dsa: microchip: implement multi-bridge support") plugged a packet leak between ports that were members of different bridges. Unfortunately, this broke another use case, namely that of more than two ports that are members of the same bridge. After that commit, when a port is added to a bridge, hardware bridging between other member ports of that bridge will be cleared, preventing packet exchange between them. Fix by ensuring that the Port VLAN Membership bitmap includes any existing ports in the bridge, not just the port being added. Fixes: b3612ccdf284 ("net: dsa: microchip: implement multi-bridge support") Signed-off-by: Svenning Sørensen <sss@secomea.com> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: prestera: flower: fix destroy tmpl in chainVolodymyr Mytnyk1-9/+19
Fix flower destroy template callback to release template only for specific tc chain instead of all chain tempaltes. The issue was intruduced by previous commit that introduced multi-chain support. Fixes: fa5d824ce5dd ("net: prestera: acl: add multi-chain support offload") Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19bridge: switch br_net_exit to batch modeEric Dumazet1-6/+9
cleanup_net() is competing with other rtnl users. Instead of calling br_net_exit() for each netns, call br_net_exit_batch() once. This gives cleanup_net() ability to group more devices and call unregister_netdevice_many() only once for all bridge devices. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19Merge branch 'mctp-i2c'David S. Miller5-0/+1190
Matt Johnston says: ==================== MCTP I2C driver This patch series adds a netdev driver providing MCTP transport over I2C. I think I've addressed all the points raised in v5. It now has mctp_i2c_unregister() to run things in the correct order, waiting for the worker thread and I2C rx to complete. Cheers, Matt -- v6: - Changed netdev register/unregister/free to avoid races. Ensure that netif functions are not used by irq handler/threads after unregister. - Fix incoming I2C hwaddr that was previously incorrect (left shifted 1 bit) - Add a check that byte_count wire header matches the length received - Renamed I2C driver to mctp-i2c-interface - Removed __func__ from print messages, added missing newlines - Removed sysfs mctp_current_mux file which was used for debug - Renamed curr_lock to sel_lock - Tidied comment formatting - Fix newline in Kconfig v5: - Fix incorrect format string v4: - Switch to __i2c_transfer() rather than __i2c_smbus_xfer(), drop 255 byte smbus patches - Use wait_event_idle() for the sleeping TX thread - Use dev_addr_set() v3: - Added Reviewed-bys for npcm7xx - Resend with net-next open v2: - Simpler Kconfig condition for i2c-mux dependency, from Randy Dunlap ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19mctp i2c: MCTP I2C binding driverMatt Johnston3-0/+1094
Provides MCTP network transport over an I2C bus, as specified in DMTF DSP0237. All messages between nodes are sent as SMBus Block Writes. Each I2C bus to be used for MCTP is flagged in devicetree by a 'mctp-controller' property on the bus node. Each flagged bus gets a mctpi2cX net device created based on the bus number. A 'mctp-i2c-controller' I2C client needs to be added under the adapter. In an I2C mux situation the mctp-i2c-controller node must be attached only to the root I2C bus. The I2C client will handle incoming I2C slave block write data for subordinate busses as well as its own bus. In configurations without devicetree a driver instance can be attached to a bus using the I2C slave new_device mechanism. The MCTP core will hold/release the MCTP I2C device while responses are pending (a 6 second timeout or once a socket is closed, response received etc). While held the MCTP I2C driver will lock the I2C bus so that the correct I2C mux remains selected while responses are received. (Ideally we would just lock the mux to keep the current bus selected for the response rather than a full I2C bus lock, but that isn't exposed in the I2C mux API) Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Wolfram Sang <wsa@kernel.org> # I2C transport parts Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19dt-bindings: net: New binding mctp-i2c-controllerMatt Johnston2-0/+96
Used to define a local endpoint to communicate with MCTP peripherals attached to an I2C bus. This I2C endpoint can communicate with remote MCTP devices on the I2C bus. In the example I2C topology below (matching the second yaml example) we have MCTP devices on busses i2c1 and i2c6. MCTP-supporting busses are indicated by the 'mctp-controller' DT property on an I2C bus node. A mctp-i2c-controller I2C client DT node is placed at the top of the mux topology, since only the root I2C adapter will support I2C slave functionality. .-------. |eeprom | .------------. .------. /'-------' | adapter | | mux --@0,i2c5------' | i2c1 ----.*| --@1,i2c6--.--. |............| \'------' \ \ ......... | mctp-i2c- | \ \ \ .mctpB . | controller | \ \ '.0x30 . | | \ ......... \ '.......' | 0x50 | \ .mctpA . \ ......... '------------' '.0x1d . '.mctpC . '.......' '.0x31 . '.......' (mctpX boxes above are remote MCTP devices not included in the DT at present, they can be hotplugged/probed at runtime. A DT binding for specific fixed MCTP devices could be added later if required) Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: Force inlining of checksum functions in net/checksum.hChristophe Leroy1-23/+24
All functions defined as static inline in net/checksum.h are meant to be inlined for performance reason. But since commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to uninline functions when it wants. Fair enough in the general case, but for tiny performance critical checksum helpers that's counter-productive. The problem mainly arises when selecting CONFIG_CC_OPTIMISE_FOR_SIZE, Those helpers being 'static inline' in header files you suddenly find them duplicated many times in the resulting vmlinux. Here is a typical exemple when building powerpc pmac32_defconfig with CONFIG_CC_OPTIMISE_FOR_SIZE. csum_sub() appears 4 times: c04a23cc <csum_sub>: c04a23cc: 7c 84 20 f8 not r4,r4 c04a23d0: 7c 63 20 14 addc r3,r3,r4 c04a23d4: 7c 63 01 94 addze r3,r3 c04a23d8: 4e 80 00 20 blr ... c04a2ce8: 4b ff f6 e5 bl c04a23cc <csum_sub> ... c04a2d2c: 4b ff f6 a1 bl c04a23cc <csum_sub> ... c04a2d54: 4b ff f6 79 bl c04a23cc <csum_sub> ... c04a754c <csum_sub>: c04a754c: 7c 84 20 f8 not r4,r4 c04a7550: 7c 63 20 14 addc r3,r3,r4 c04a7554: 7c 63 01 94 addze r3,r3 c04a7558: 4e 80 00 20 blr ... c04ac930: 4b ff ac 1d bl c04a754c <csum_sub> ... c04ad264: 4b ff a2 e9 bl c04a754c <csum_sub> ... c04e3b08 <csum_sub>: c04e3b08: 7c 84 20 f8 not r4,r4 c04e3b0c: 7c 63 20 14 addc r3,r3,r4 c04e3b10: 7c 63 01 94 addze r3,r3 c04e3b14: 4e 80 00 20 blr ... c04e5788: 4b ff e3 81 bl c04e3b08 <csum_sub> ... c04e65c8: 4b ff d5 41 bl c04e3b08 <csum_sub> ... c0512d34 <csum_sub>: c0512d34: 7c 84 20 f8 not r4,r4 c0512d38: 7c 63 20 14 addc r3,r3,r4 c0512d3c: 7c 63 01 94 addze r3,r3 c0512d40: 4e 80 00 20 blr ... c0512dfc: 4b ff ff 39 bl c0512d34 <csum_sub> ... c05138bc: 4b ff f4 79 bl c0512d34 <csum_sub> ... Restore the expected behaviour by using __always_inline for all functions defined in net/checksum.h vmlinux size is even reduced by 256 bytes with this patch: text data bss dec hex filename 6980022 2515362 194384 9689768 93daa8 vmlinux.before 6979862 2515266 194384 9689512 93d9a8 vmlinux.now Fixes: ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19net: ip6mr: add support for passing full packet on wrong mifMobashshera Rasool2-4/+15
This patch adds support for MRT6MSG_WRMIFWHOLE which is used to pass full packet and real vif id when the incoming interface is wrong. While the RP and FHR are setting up state we need to be sending the registers encapsulated with all the data inside otherwise we lose it. The RP then decapsulates it and forwards it to the interested parties. Currently with WRONGMIF we can only be sending empty register packets and will lose that data. This behaviour can be enabled by using MRT_PIM with val == MRT6MSG_WRMIFWHOLE. This doesn't prevent MRT6MSG_WRONGMIF from happening, it happens in addition to it, also it is controlled by the same throttling parameters as WRONGMIF (i.e. 1 packet per 3 seconds currently). Both messages are generated to keep backwards compatibily and avoid breaking someone who was enabling MRT_PIM with val == 4, since any positive val is accepted and treated the same. Signed-off-by: Mobashshera Rasool <mobash.rasool.linux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19Merge branch '100GbE' of ↵David S. Miller9-23/+39
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-02-18 This series contains updates to ice driver only. Wojciech fixes protocol matching for slow-path switchdev so that all packets are correctly redirected. Michal removes accidental unconditional setting of l4 port filtering flag. Jake adds locking to protect VF reset and removal to fix various issues that can be encountered when they race with each other. Tom Rix propagates an error and initializes a struct to resolve reported Clang issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19i40e: remove dead stores on XSK hotpathAlexander Lobakin1-2/+1
The 'if (ntu == rx_ring->count)' block in i40e_alloc_rx_buffers_zc() was previously residing in the loop, but after introducing the batched interface it is used only to wrap-around the NTU descriptor, thus no more need to assign 'xdp'. 'cleaned_count' in i40e_clean_rx_irq_zc() was previously being incremented in the loop, but after commit f12738b6ec06 ("i40e: remove unnecessary cleaned_count updates") it gets assigned only once after it, so the initialization can be dropped. Fixes: 6aab0bb0c5cd ("i40e: Use the xsk batched rx allocation interface") Fixes: f12738b6ec06 ("i40e: remove unnecessary cleaned_count updates") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19Merge branch 'mptcp-fixes'David S. Miller6-21/+96
Mat Martineau says: ==================== mptcp: Fix address advertisement races and stabilize tests Patches 1, 2, and 7 modify two self tests to give consistent, accurate results by fixing timing issues and accounting for syncookie behavior. Paches 3-6 fix two races in overlapping address advertisement send and receive. Associated self tests are updated, including addition of two MIBs to enable testing and tracking dropped address events. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19selftests: mptcp: be more conservative with cookie MPJ limitsPaolo Abeni1-3/+12
Since commit 2843ff6f36db ("mptcp: remote addresses fullmesh"), an MPTCP client can attempt creating multiple MPJ subflow simultaneusly. In such scenario the server, when syncookies are enabled, could end-up accepting incoming MPJ syn even above the configured subflow limit, as the such limit can be enforced in a reliable way only after the subflow creation. In case of syncookie, only after the 3rd ack reception. As a consequence the related self-tests case sporadically fails, as it verify that the server always accept the expected number of MPJ syn. Address the issues relaxing the MPJ syn number constrain. Note that the check on the accepted number of MPJ 3rd ack still remains intact. Fixes: 2843ff6f36db ("mptcp: remote addresses fullmesh") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19selftests: mptcp: more robust signal race testPaolo Abeni1-3/+12
The in kernel MPTCP PM implementation can process a single incoming add address option at any given time. In the mentioned test the server can surpass such limit. Let the setup cope with that allowing a faster add_addr retransmission. Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors") Fixes: f7efc7771eac ("mptcp: drop argument port from mptcp_pm_announce_addr") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/254 Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19mptcp: add mibs counter for ignored incoming optionsPaolo Abeni3-2/+10
The MPTCP in kernel path manager has some constraints on incoming addresses announce processing, so that in edge scenarios it can end-up dropping (ignoring) some of such announces. The above is not very limiting in practice since such scenarios are very uncommon and MPTCP will recover due to ADD_ADDR retransmissions. This patch adds a few MIB counters to account for such drop events to allow easier introspection of the critical scenarios. Fixes: f7efc7771eac ("mptcp: drop argument port from mptcp_pm_announce_addr") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>