summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2025-02-11mptcp: pm: add id parameter for get_addrGeliang Tang4-30/+26
The address id is parsed both in mptcp_pm_nl_get_addr() and mptcp_userspace_pm_get_addr(), this makes the code somewhat repetitive. So this patch adds a new parameter 'id' for all get_addr() interfaces. The address id is only parsed in mptcp_pm_nl_get_addr_doit(), then pass it to both mptcp_pm_nl_get_addr() and mptcp_userspace_pm_get_addr(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: drop skb parameter of get_addrGeliang Tang4-10/+8
The first parameters 'skb' of get_addr() interfaces are now useless since mptcp_userspace_pm_get_sock() helper is used. This patch drops these useless parameters of them. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: make three pm wrappers staticGeliang Tang3-22/+20
Three netlink functions: mptcp_pm_nl_get_addr_doit() mptcp_pm_nl_get_addr_dumpit() mptcp_pm_nl_set_flags_doit() are generic, implemented for each PM, in-kernel PM and userspace PM. It's clearer to move them from pm_netlink.c to pm.c. And the linked three path manager wrappers mptcp_pm_get_addr() mptcp_pm_dump_addr() mptcp_pm_set_flags() can be changed as static functions, no need to export them in protocol.h. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: use NL_SET_ERR_MSG_ATTR when possibleMatthieu Baerts (NGI0)2-22/+31
Instead of only returning a text message with GENL_SET_ERR_MSG(), NL_SET_ERR_MSG_ATTR() can help the userspace developers by also reporting which attribute is faulty. When the error is specific to an attribute, NL_SET_ERR_MSG_ATTR() is now used. The error messages have not been modified in this commit. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: mark missing address attributesMatthieu Baerts (NGI0)2-7/+32
mptcp_pm_parse_entry() will check if the given attribute is defined. If not, it will return a generic error: "missing address info". It might then not be clear for the userspace developer which attribute is missing, especially when the command takes multiple addresses. By using GENL_REQ_ATTR_CHECK(), the userspace will get a hint about which attribute is missing, making thing clearer. Note that this is what was already done for most of the other MPTCP NL commands, this patch simply adds the missing ones. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: remove duplicated error messagesMatthieu Baerts (NGI0)1-15/+5
mptcp_pm_parse_entry() and mptcp_pm_parse_addr() will already set a error message in case of parsing issue. Then, no need to override this error message with another less precise one: "error parsing address". Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: userspace: use GENL_REQ_ATTR_CHECKGeliang Tang1-22/+19
A more general way to check if MPTCP_PM_ATTR_* exists in 'info' is to use GENL_REQ_ATTR_CHECK(info, MPTCP_PM_ATTR_*) instead of directly reading info->attrs[MPTCP_PM_ATTR_*] and then checking if it's NULL. So this patch uses GENL_REQ_ATTR_CHECK() for userspace PM in mptcp_pm_nl_announce_doit(), mptcp_pm_nl_remove_doit(), mptcp_pm_nl_subflow_create_doit(), mptcp_pm_nl_subflow_destroy_doit() and mptcp_userspace_pm_get_sock(). Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: improve error messagesMatthieu Baerts (NGI0)2-3/+13
Some error messages were: - too generic: "missing input", "invalid request" - not precise enough: "limit greater than maximum" but what's the max? - missing: subflow not found, or connect error. This can be easily improved by being more precise, or adding new error messages. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: more precise error messagesMatthieu Baerts (NGI0)1-7/+24
Some errors reported by the userspace PM were vague: "this or that is invalid". It is easier for the userspace to know which part is wrong, instead of having to guess that. While at it, in mptcp_userspace_pm_set_flags() move the parsing after the check linked to the local attribute. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: userspace: flags: clearer msg if no remote addrMatthieu Baerts (NGI0)1-5/+3
Since its introduction in commit 892f396c8e68 ("mptcp: netlink: issue MP_PRIO signals from userspace PMs"), it was mandatory to specify the remote address, because of the 'if (rem->addr.family == AF_UNSPEC)' check done later one. In theory, this attribute can be optional, but it sounds better to be precise to avoid sending the MP_PRIO on the wrong subflow, e.g. if there are multiple subflows attached to the same local ID. This can be relaxed later on if there is a need to act on multiple subflows with one command. For the moment, the check to see if attr_rem is NULL can be removed, because mptcp_pm_parse_entry() will do this check as well, no need to do that differently here. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11mptcp: pm: drop info of userspace_pm_remove_id_zero_addressGeliang Tang1-7/+8
The only use of 'info' parameter of userspace_pm_remove_id_zero_address() is to set an error message into it. Plus, this helper will only fail when it cannot find any subflows with a local address ID 0. This patch drops this parameter and sets the error message where this function is called in mptcp_pm_nl_remove_doit(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11selftests/net: Add selftest for IPv4 RTM_GETMULTICAST supportYuyang Huang5-1/+59
This change introduces a new selftest case to verify the functionality of dumping IPv4 multicast addresses using the RTM_GETMULTICAST netlink message. The test utilizes the ynl library to interact with the netlink interface and validate that the kernel correctly reports the joined IPv4 multicast addresses. To run the test, execute the following command: $ vng -v --user root --cpus 16 -- \ make -C tools/testing/selftests TARGETS=net \ TEST_PROGS=rtnetlink.py TEST_GEN_PROGS="" run_tests Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Yuyang Huang <yuyanghuang@google.com> Link: https://patch.msgid.link/20250207110836.2407224-2-yuyanghuang@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11netlink: support dumping IPv4 multicast addressesYuyang Huang3-18/+90
Extended RTM_GETMULTICAST to support dumping joined IPv4 multicast addresses, in addition to the existing IPv6 functionality. This allows userspace applications to retrieve both IPv4 and IPv6 multicast addresses through similar netlink command and then monitor future changes by registering to RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuyang Huang <yuyanghuang@google.com> Link: https://patch.msgid.link/20250207110836.2407224-1-yuyanghuang@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11net: fec: Refactor MAC reset to functionCsókás, Bence1-27/+25
The core is reset both in `fec_restart()` (called on link-up) and `fec_stop()` (going to sleep, driver remove etc.). These two functions had their separate implementations, which was at first only a register write and a `udelay()` (and the accompanying block comment). However, since then we got soft-reset (MAC disable) and Wake-on-LAN support, which meant that these implementations diverged, often causing bugs. For instance, as of now, `fec_stop()` does not check for `FEC_QUIRK_NO_HARD_RESET`, meaning the MII/RMII mode is cleared on eg. a PM power-down event; and `fec_restart()` missed the refactor renaming the "magic" constant `1` to `FEC_ECR_RESET`. To harmonize current implementations, and eliminate this source of potential future bugs, refactor implementation to a common function. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> Link: https://patch.msgid.link/20250207121255.161146-2-csokas.bence@prolan.hu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11Merge tag 'batadv-net-pullrequest-20250207' of ↵Paolo Abeni5-50/+91
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Fix panic during interface removal in BATMAN V, by Andy Strohman - Cleanup BATMAN V/ELP metric handling, by Sven Eckelmann (2 patches) - Fix incorrect offset in batadv_tt_tvlv_ogm_handler_v1(), by Remi Pommarel * tag 'batadv-net-pullrequest-20250207' of git://git.open-mesh.org/linux-merge: batman-adv: Fix incorrect offset in batadv_tt_tvlv_ogm_handler_v1() batman-adv: Drop unmanaged ELP metric worker batman-adv: Ignore neighbor throughput metrics in error case batman-adv: fix panic during interface removal ==================== Link: https://patch.msgid.link/20250207095823.26043-1-sw@simonwunderlich.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11Merge branch 'ptp-vmclock-bugfixes-and-cleanups-for-error-handling'Paolo Abeni1-26/+21
says: ==================== ptp: vmclock: bugfixes and cleanups for error handling Some error handling issues I noticed while looking at the code. Only compile-tested. v1: https://lore.kernel.org/r/20250206-vmclock-probe-v1-0-17a3ea07be34@linutronix.de Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> ==================== Link: https://patch.msgid.link/20250207-vmclock-probe-v2-0-bc2fce0bdf07@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Remove goto-based cleanup logicThomas Weißschuh1-19/+13
vmclock_probe() uses an "out:" label to return from the function on error. This indicates that some cleanup operation is necessary. However the label does not do anything as all resources are managed through devres, making the code slightly harder to read. Remove the label and just return directly. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Clean up miscdev and ptp clock through devresThomas Weißschuh1-7/+6
Most resources owned by the vmclock device are managed through devres. Only the miscdev and ptp clock are managed manually. This makes the code slightly harder to understand than necessary. Switch them over to devres and remove the now unnecessary drvdata. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Don't unregister misc device if it was not registeredThomas Weißschuh1-1/+2
vmclock_remove() tries to detect the successful registration of the misc device based on the value of its minor value. However that check is incorrect if the misc device registration was not attempted in the first place. Always initialize the minor number, so the check works properly. Fixes: 205032724226 ("ptp: Add support for the AMZNC10C 'vmclock' device") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Set driver data before its usageThomas Weißschuh1-2/+2
If vmclock_ptp_register() fails during probing, vmclock_remove() is called to clean up the ptp clock and misc device. It uses dev_get_drvdata() to access the vmclock state. However the driver data is not yet set at this point. Assign the driver data earlier. Fixes: 205032724226 ("ptp: Add support for the AMZNC10C 'vmclock' device") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Add .owner to vmclock_miscdev_fopsDavid Woodhouse1-0/+1
Without the .owner field, the module can be unloaded while /dev/vmclock0 is open, leading to an oops. Fixes: 205032724226 ("ptp: Add support for the AMZNC10C 'vmclock' device") Cc: stable@vger.kernel.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11Merge tag 'linux-can-fixes-for-6.14-20250208' of ↵Jakub Kicinski7-14/+22
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-02-08 The first patch is by Reyders Morales and fixes a code example in the CAN ISO15765-2 documentation. The next patch is contributed by Alexander Hölzl and fixes sending of J1939 messages with zero data length. Fedor Pchelkin's patch for the ctucanfd driver adds a missing handling for an skb allocation error. Krzysztof Kozlowski contributes a patch for the c_can driver to fix unbalanced runtime PM disable in error path. The next patch is by Vincent Mailhol and fixes a NULL pointer dereference on udev->serial in the etas_es58x driver. The patch is by Robin van der Gracht and fixes the handling for an skb allocation error. * tag 'linux-can-fixes-for-6.14-20250208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: rockchip: rkcanfd_handle_rx_fifo_overflow_int(): bail out if skb cannot be allocated can: etas_es58x: fix potential NULL pointer dereference on udev->serial can: c_can: fix unbalanced runtime PM disable in error path can: ctucanfd: handle skb allocation failure can: j1939: j1939_sk_send_loop(): fix unable to send messages with data length zero Documentation/networking: fix basic node example document ISO 15765-2 ==================== Link: https://patch.msgid.link/20250208115120.237274-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11mlxsw: Enable Tx checksum offloadIdo Schimmel3-2/+11
The device is able to checksum plain TCP / UDP packets over IPv4 / IPv6 when the 'ipcs' bit in the send descriptor is set. Advertise support for the 'NETIF_F_IP{,6}_CSUM' features in net devices registered by the driver and VLAN uppers and set the 'ipcs' bit when the stack requests Tx checksum offload. Note that the device also calculates the IPv4 checksum, but it first zeroes the current checksum so there should not be any difference compared to the checksum calculated by the kernel. On SN5600 (Spectrum-4) there is about 10% improvement in Tx packet rate with 1400 byte packets when using pktgen. Tested on Spectrum-{1,2,3,4} with all the combinations of IPv4 / IPv6, TCP / UDP, with and without VLAN. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/8dc86c95474ce10572a0fa83b8adb0259558e982.1738950446.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11selftests: drv-net: add helper for path resolutionJakub Kicinski2-1/+13
Refering to C binaries from Python code is going to be a common need. Add a helper to convert from path in relation to the test. Meaning, if the test is in the same directory as the binary, the call would be simply: cfg.rpath("binary"). The helper name "rpath" is not great. I can't think of a better name that would be accurate yet concise. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250207184140.1730466-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11selftests: drv-net: factor out a DrvEnv base classJakub Kicinski1-28/+35
We have separate Env classes for local tests and tests with a remote endpoint. Make it easier to share the code by creating a base class. Make env loading a method of this class. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250207184140.1730466-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11selftests: drv-net: remove an unnecessary libmnl includeJakub Kicinski1-1/+0
ncdevmem doesn't need libmnl, remove the unnecessary include. Since YNL doesn't depend on libmnl either, any more, it's actually possible to build selftests without having libmnl installed. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250207183119.1721424-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11Merge branch 'fib-rules-convert-rtm_newrule-and-rtm_delrule-to-per-netns-rtnl'Jakub Kicinski5-63/+110
Kuniyuki Iwashima says: ==================== fib: rules: Convert RTM_NEWRULE and RTM_DELRULE to per-netns RTNL. Patch 1 ~ 2 are small cleanup, and patch 3 ~ 8 make fib_nl_newrule() and fib_nl_delrule() hold per-netns RTNL. v1: https://lore.kernel.org/20250206084629.16602-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250207072502.87775-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: fib_rules: Convert RTM_DELRULE to per-netns RTNL.Kuniyuki Iwashima1-7/+16
fib_nl_delrule() is the doit() handler for RTM_DELRULE but also called from vrf_newlink() in case something fails in vrf_add_fib_rules(). In the latter case, RTNL is already held and the 4th arg is true. Let's hold per-netns RTNL in fib_delrule() if rtnl_held is false. Now we can place ASSERT_RTNL_NET() in call_fib_rule_notifiers(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: fib_rules: Add error_free label in fib_delrule().Kuniyuki Iwashima1-5/+6
We will hold RTNL just before calling fib_nl2rule_rtnl() in fib_delrule() and release it before kfree(nlrule). Let's add a new rule to make the following change cleaner. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: fib_rules: Convert RTM_NEWRULE to per-netns RTNL.Kuniyuki Iwashima1-2/+14
fib_nl_newrule() is the doit() handler for RTM_NEWRULE but also called from vrf_newlink(). In the latter case, RTNL is already held and the 4th arg is true. Let's hold per-netns RTNL in fib_newrule() if rtnl_held is false. Note that we call fib_rule_get() before releasing per-netns RTNL to call notify_rule_change() without RTNL and prevent freeing the new rule. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: fib_rules: Factorise fib_newrule() and fib_delrule().Kuniyuki Iwashima3-21/+29
fib_nl_newrule() / fib_nl_delrule() is the doit() handler for RTM_NEWRULE / RTM_DELRULE but also called from vrf_newlink(). Currently, we hold RTNL on both paths but will not on the former. Also, we set dev_net(dev)->rtnl to skb->sk in vrf_fib_rule() because fib_nl_newrule() / fib_nl_delrule() fetch net as sock_net(skb->sk). Let's Factorise the two functions and pass net and rtnl_held flag. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure().Kuniyuki Iwashima2-4/+4
The following patch will not set skb->sk from VRF path. Let's fetch net from fib_rule->fr_net instead of sock_net(skb->sk) in fib[46]_rule_configure(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: fib_rules: Split fib_nl2rule().Kuniyuki Iwashima1-17/+41
We will move RTNL down to fib_nl_newrule() and fib_nl_delrule(). Some operations in fib_nl2rule() require RTNL: fib_default_rule_pref() and __dev_get_by_name(). Let's split the RTNL parts as fib_nl2rule_rtnl(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: fib_rules: Pass net to fib_nl2rule() instead of skb.Kuniyuki Iwashima1-4/+3
skb is not used in fib_nl2rule() other than sock_net(skb->sk), which is already available in callers, fib_nl_newrule() and fib_nl_delrule(). Let's pass net directly to fib_nl2rule(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: fib_rules: Don't check net in rule_exists() and rule_find().Kuniyuki Iwashima1-6/+0
fib_nl_newrule() / fib_nl_delrule() looks up struct fib_rules_ops in sock_net(skb->sk) and calls rule_exists() / rule_find() respectively. fib_nl_newrule() creates a new rule and links it to the found ops, so struct fib_rule never belongs to a different netns's ops->rules_list. Let's remove redundant netns check in rule_exists() and rule_find(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11Merge branch 'tun-unify-vnet-implementation'Jakub Kicinski4-317/+229
Akihiko Odaki says: ==================== tun: Unify vnet implementation When I implemented virtio's hash-related features to tun/tap [1], I found tun/tap does not fill the entire region reserved for the virtio header, leaving some uninitialized hole in the middle of the buffer after read()/recvmesg(). This series fills the uninitialized hole. More concretely, the num_buffers field will be initialized with 1, and the other fields will be inialized with 0. Setting the num_buffers field to 1 is mandated by virtio 1.0 [2]. The change to virtio header is preceded by another change that refactors tun and tap to unify their virtio-related code. [1]: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com [2]: https://lore.kernel.org/r/20241227084256-mutt-send-email-mst@kernel.org/ ==================== Link: https://patch.msgid.link/20250207-tun-v6-0-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11tap: Use tun's vnet-related codeAkihiko Odaki1-136/+16
tun and tap implements the same vnet-related features so reuse the code. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-7-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11tap: Keep hdr_len in tap_get_user()Akihiko Odaki1-18/+10
hdr_len is repeatedly used so keep it in a local variable. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-6-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11tun: Extract the vnet handling codeAkihiko Odaki3-179/+188
The vnet handling code will be reused by tap. Functions are renamed to ensure that their names contain "vnet" to clarify that they are part of the decoupled vnet handling code. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-5-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11tun: Decouple vnet handlingAkihiko Odaki1-98/+139
Decouple the vnet handling code so that we can reuse it for tap. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-4-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11tun: Decouple vnet from tun_structAkihiko Odaki1-25/+26
Decouple vnet-related functions from tun_struct so that we can reuse them for tap in the future. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-3-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11tun: Keep hdr_len in tun_get_user()Akihiko Odaki1-13/+11
hdr_len is repeatedly used so keep it in a local variable. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-2-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11tun: Refactor CONFIG_TUN_VNET_CROSS_LEAkihiko Odaki1-19/+10
Check IS_ENABLED(CONFIG_TUN_VNET_CROSS_LE) to save some lines and make future changes easier. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-1-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11Merge branch 'net-xilinx-axienet-enable-adaptive-irq-coalescing-with-dim'Jakub Kicinski3-77/+263
Sean Anderson says: ==================== net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM To improve performance without sacrificing latency under low load, enable DIM. While I appreciate not having to write the library myself, I do think there are many unusual aspects to DIM, as detailed in the last patch. ==================== Link: https://patch.msgid.link/20250206201036.1516800-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: xilinx: axienet: Enable adaptive IRQ coalescing with DIMSean Anderson3-9/+82
The default RX IRQ coalescing settings of one IRQ per packet can represent a significant CPU load. However, increasing the coalescing unilaterally can result in undesirable latency under low load. Adaptive IRQ coalescing with DIM offers a way to adjust the coalescing settings based on load. This device only supports "CQE" mode [1], where each packet resets the timer. Therefore, an interrupt is fired either when we receive coalesce_count_rx packets or when the interface is idle for coalesce_usec_rx. With this in mind, consider the following scenarios: Link saturated Here we want to set coalesce_count_rx to a large value, in order to coalesce more packets and reduce CPU load. coalesce_usec_rx should be set to at least the time for one packet. Otherwise the link will be "idle" and we will get an interrupt for each packet anyway. Bursts of packets Each burst should be coalesced into a single interrupt, although it may be prudent to reduce coalesce_count_rx for better latency. coalesce_usec_rx should be set to at least the time for one packet so bursts are coalesced. However, additional time beyond the packet time will just increase latency at the end of a burst. Sporadic packets Due to low load, we can set coalesce_count_rx to 1 in order to reduce latency to the minimum. coalesce_usec_rx does not matter in this case. Based on this analysis, I expected the CQE profiles to look something like usec = 0, pkts = 1 // Low load usec = 16, pkts = 4 usec = 16, pkts = 16 usec = 16, pkts = 64 usec = 16, pkts = 256 // High load Where usec is set to 16 to be a few us greater than the 12.3 us packet time of a 1500 MTU packet at 1 GBit/s. However, the CQE profile is instead usec = 2, pkts = 256 // Low load usec = 8, pkts = 128 usec = 16, pkts = 64 usec = 32, pkts = 64 usec = 64, pkts = 64 // High load I found this very surprising. The number of coalesced packets *decreases* as load increases. But as load increases we have more opportunities to coalesce packets without affecting latency as much. Additionally, the profile *increases* the usec as the load increases. But as load increases, the gaps between packets will tend to become smaller, making it possible to *decrease* usec for better latency at the end of a "burst". I consider the default CQE profile unsuitable for this NIC. Therefore, we use the first profile outlined in this commit instead. coalesce_usec_rx is set to 16 by default, but the user can customize it. This may be necessary if they are using jumbo frames. I think adjusting the profile times based on the link speed/mtu would be good improvement for generic DIM. In addition to the above profile problems, I noticed the following additional issues with DIM while testing: - DIM tends to "wander" when at low load, since the performance gradient is pretty flat. If you only have 10p/ms anyway then adjusting the coalescing settings will not affect throughput very much. - DIM takes a long time to adjust back to low indices when load is decreased following a period of high load. This is because it only re-evaluates its settings once every 64 interrupts. However, at low load 64 interrupts can be several seconds. Finally: performance. This patch increases receive throughput with iperf3 from 840 Mbits/sec to 938 Mbits/sec, decreases interrupts from 69920/sec to 316/sec, and decreases CPU utilization (4x Cortex-A53) from 43% to 9%. [1] Who names this stuff? Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-5-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: xilinx: axienet: Get coalesce parameters from driver stateSean Anderson2-31/+47
The cr variables now contain the same values as the control registers themselves. Extract/calculate the values from the variables instead of saving the user-specified values. This allows us to remove some bookeeping, and also lets the user know what the actual coalesce settings are. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-4-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: xilinx: axienet: Support adjusting coalesce settings while runningSean Anderson2-23/+119
In preparation for adaptive IRQ coalescing, we first need to support adjusting the settings at runtime. The existing code doesn't require any locking because - dma_start is the only function that modifies rx/tx_dma_cr. It is always called with IRQs and NAPI disabled, so nothing else is touching the hardware. - The IRQs don't race with poll, since the latter is a softirq. - The IRQs don't race with dma_stop since they both just clear the control registers. - dma_stop doesn't race with poll since the former is called with NAPI disabled. However, once we introduce another function that modifies rx/tx_dma_cr, we need to have some locking to prevent races. Introduce two locks to protect these variables and their registers. The control register values are now generated where the coalescing settings are set. Converting coalescing settings to control register values may require sleeping because of clk_get_rate. However, the read/modify/write of the control registers themselves can't sleep because it needs to happen in IRQ context. By pre-calculating the control register values, we avoid introducing an additional mutex. Since axienet_dma_start writes the control settings when it runs, we don't bother updating the CR registers when rx/tx_dma_started is false. This prevents any issues from writing to the control registers in the middle of a reset sequence. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: xilinx: axienet: Combine CR calculationSean Anderson2-33/+34
Combine the common parts of the CR calculations for better code reuse. While we're at it, simplify the code a bit. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11Merge tag 'wireless-2025-02-07' of ↵Jakub Kicinski10-29/+49
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.14-rc3 We have only one fix for ath12k and one fix for brcmfmac. Also this will be my last pull request as I'm stepping down as wireless driver maintainer. * tag 'wireless-2025-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: MAINTAINERS: wifi: remove Kalle MAINTAINERS: wifi: ath: remove Kalle wifi: brcmfmac: use random seed flag for BCM4355 and BCM4364 firmware wifi: ath12k: fix handling of 6 GHz rules ==================== Link: https://patch.msgid.link/20250207182957.23315C4CED1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11Merge branch 'net-second-round-to-use-dev_net_rcu'Jakub Kicinski6-33/+48
Eric Dumazet says: ==================== net: second round to use dev_net_rcu() dev_net(dev) should either be protected by RTNL or RCU. There is no LOCKDEP support yet for this helper. Adding it would trigger too many splats. This second series fixes some of them. ==================== Link: https://patch.msgid.link/20250207135841.1948589-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>