summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-10-12ipv6: use READ_ONCE()/WRITE_ONCE() on fib6_table->fib_seqEric Dumazet3-12/+12
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer. Writes are protected by RTNL. We can use READ_ONCE() when reading it. Constify 'struct net' argument of fib6_tables_seq_read() and fib6_rules_seq_read(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241009184405.3752829-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12ipv4: use READ_ONCE()/WRITE_ONCE() on net->ipv4.fib_seqEric Dumazet4-8/+8
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer. Writes are protected by RTNL. We can use READ_ONCE() when reading it. Constify 'struct net' argument of fib4_rules_seq_read() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241009184405.3752829-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-12fib: rules: use READ_ONCE()/WRITE_ONCE() on ops->fib_rules_seqEric Dumazet2-7/+9
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer. Writes are protected by RTNL. We can use READ_ONCE() on readers. Constify 'struct net' argument of fib_rules_seq_read() and lookup_rules_ops(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241009184405.3752829-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11tcp: move sysctl_tcp_l3mdev_accept to netns_ipv4_read_rxEric Dumazet3-5/+6
sysctl_tcp_l3mdev_accept is read from TCP receive fast path from tcp_v6_early_demux(), __inet6_lookup_established, inet_request_bound_dev_if(). Move it to netns_ipv4_read_rx. Remove the '#ifdef CONFIG_NET_L3_MASTER_DEV' that was guarding its definition. Note this adds a hole of three bytes that could be filled later. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Cc: Wei Wang <weiwan@google.com> Cc: Coco Li <lixiaoyan@google.com> Link: https://patch.msgid.link/20241010034100.320832-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11net: phy: aquantia: poll status registerAryan Srivastava1-3/+16
The system interface connection status register is not immediately correct upon line side link up. This results in the status being read as OFF and then transitioning to the correct host side link mode with a short delay. This causes the phylink framework passing the OFF status down to all MAC config drivers, resulting in the host side link being misconfigured, which in turn can lead to link flapping or complete packet loss in some cases. Mitigate this by periodically polling the register until it not showing the OFF state. This will be done every 1ms for 10ms, using the same poll/timeout as the processor intensive operation reads. If the phy is still expressing the OFF state after the timeout, then set the link to false and pass the NA interface mode onto the phylink framework. Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241010004935.1774601-1-aryan.srivastava@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11eth: remove the DLink/Sundance (ST201) driverJakub Kicinski6-2014/+0
Konstantin reports the maintainer's address bounces. There is no other maintainer and the driver is quite old. There is a good chance nobody is using this driver any more. Let's try to remove it completely, we can revert it back in if someone complains. Link: https://lore.kernel.org/20240925-bizarre-earwig-from-pluto-1484aa@lemu/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Denis Kirjanov <dkirjanov@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-11Merge branch 'tg3-link-irqs-napis-and-queues'Jakub Kicinski1-7/+40
Joe Damato says: ==================== tg3: Link IRQs, NAPIs, and queues This follows from a previous RFC (wherein I botched the subject lines of all the messages) [1]. I've taken Michael Chan's suggestion on modifying patch 2 and I've updated the commit messages of both patches to test and show the output for the default 1 TX 4 RX queues and the 4 TX and 4 RX queues cases. Reviewers: please check the commit messages carefully to ensure the output is correct (or on your own systems to verify, if you like). I am not a tg3 expert and it's possible that I got something wrong. [1]: https://lore.kernel.org/all/20241005145717.302575-3-jdamato@fastly.com/T/ ==================== Link: https://patch.msgid.link/20241009175509.31753-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11tg3: Link queues to NAPIsJoe Damato1-4/+35
Link queues to NAPIs using the netdev-genl API so this information is queryable. First, test with the default setting on my tg3 NIC at boot with 1 TX queue: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}] Now, adjust the number of TX queues to be 4 via ethtool: $ sudo ethtool -L eth0 tx 4 $ sudo ethtool -l eth0 | tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Despite "Combined: n/a" in the ethtool output, /proc/interrupts shows the tg3 has renamed the IRQs to be combined: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] eth0-txrx-4 Now query this via netlink to ensure the queues are linked properly to their NAPIs: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'tx'}] As you can see above, id 0 for both TX and RX share a NAPI, NAPI ID 8960, and so on for each queue index up to 3. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11tg3: Link IRQs to NAPI instancesJoe Damato1-3/+5
Link IRQs to NAPI instances with netif_napi_set_irq. This information can be queried with the netdev-genl API. Begin by testing my tg3 device in its default state: 1 TX queue and 4 RX queues. Compare the output of /proc/interrupts for my tg3 device with the output of netdev-genl after applying this patch: $ cat /proc/interrupts | grep eth0 343: [...] eth0-tx-0 344: [...] eth0-rx-1 345: [...] eth0-rx-2 346: [...] eth0-rx-3 347: [...] eth0-rx-4 As you can see above, tg3 has named the IRQs such that there is a dedicated tx IRQ and 4 dedicated rx IRQs, for a total of 5 IRQs. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8197, 'ifindex': 2, 'irq': 347}, {'id': 8196, 'ifindex': 2, 'irq': 346}, {'id': 8195, 'ifindex': 2, 'irq': 345}, {'id': 8194, 'ifindex': 2, 'irq': 344}, {'id': 8193, 'ifindex': 2, 'irq': 343}] Netlink displays the same IRQs as above, noting that each is mapped to a unique NAPI instance. Now, reconfigure the NIC to have 4 TX queues and 4 RX queues: $ sudo ethtool -L eth0 rx 4 tx 4 $ sudo ethtool -l eth0 | tail -5 Current hardware settings: RX: 4 TX: 4 Other: n/a Combined: n/a Examine /proc/interrupts once again, noting that tg3 will now rename the IRQs to suggest that they are combined tx and rx without allocating additional IRQs, so the total IRQ count in /proc/interrupts is unchanged: 343: [...] eth0-0 344: [...] eth0-txrx-1 345: [...] eth0-txrx-2 346: [...] eth0-txrx-3 347: [...] eth0-txrx-4 Check the output from netlink again: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8973, 'ifindex': 2, 'irq': 347}, {'id': 8972, 'ifindex': 2, 'irq': 346}, {'id': 8971, 'ifindex': 2, 'irq': 345}, {'id': 8970, 'ifindex': 2, 'irq': 344}, {'id': 8969, 'ifindex': 2, 'irq': 343}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20241009175509.31753-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11r8169: remove original workaround for RTL8125 broken rx issueHeiner Kallweit1-4/+0
Now that we have b9c7ac4fe22c ("r8169: disable ALDPS per default for RTL8125"), the first attempt to fix the issue shouldn't be needed any longer. So let's effectively revert 621735f59064 ("r8169: fix rare issue with broken rx after link-down on RTL8125") and see whether anybody complains. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/382d8c88-cbce-400f-ad62-fda0181c7e38@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11r8169: don't apply UDP padding quirk on RTL8126AHeiner Kallweit1-4/+10
Vendor drivers r8125/r8126 indicate that this quirk isn't needed any longer for RTL8126A. Mimic this in r8169. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/d1317187-aa81-4a69-b831-678436e4de62@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-11Merge branch 'selftests/bpf: add coverage for xdp_features in test_progs'Martin KaFai Lau2-9/+42
Alexis Lothoré says: ==================== this small series aims to increase coverage of xdp features in test_progs. The initial versions proposed to rework test_xdp_features.sh to make it fit in test_progs, but some discussions in v1 and v2 showed that the script is still needed as a standalone tool. So this new revision lets test_xdp_features.sh as-is, and rather adds missing coverage in existing test (cpu map). The new revision is now also a follow-up to the update performed by Florian Kauer in [1] for devmap programs testing. [1] https://lore.kernel.org/bpf/20240911-devel-koalo-fix-ingress-ifindex-v4-2-5c643ae10258@linutronix.de/ --- Changes in v3: - Drop xdp_features rework commit - update xdp_cpumap_attach to extend its coverage - Link to v2: https://lore.kernel.org/r/20240910-convert_xdp_tests-v2-1-a46367c9d038@bootlin.com Changes in v2: - fix endianness management in userspace packet parsing (call htonl on constant rather than packet part) The new test has been run in a local x86 environment and in CI: #560/1 xdp_cpumap_attach/CPUMAP with programs in entries:OK #560/2 xdp_cpumap_attach/CPUMAP with frags programs in entries:OK #560 xdp_cpumap_attach:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-11selftests/bpf: check program redirect in xdp_cpumap_attachAlexis Lothoré (eBPF Foundation)2-1/+9
xdp_cpumap_attach, in its current form, only checks that an xdp cpumap program can be executed, but not that it performs correctly the cpu redirect as configured by userspace (bpf_prog_test_run_opts will return success even if the redirect program returns an error) Add a check to ensure that the program performs the configured redirect as well. The check is based on a global variable incremented by a chained program executed only if the redirect program properly executes. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20241009-convert_xdp_tests-v3-3-51cea913710c@bootlin.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-11selftests/bpf: make xdp_cpumap_attach keep redirect prog attachedAlexis Lothoré (eBPF Foundation)1-8/+33
Current test only checks attach/detach on cpu map type program, and so does not check that it can be properly executed, neither that it redirects correctly. Update the existing test to extend its coverage: - keep the redirected program loaded - try to execute it through bpf_prog_test_run_opts with some dummy context While at it, bring the following minor improvements: - isolate test interface in its own namespace Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20241009-convert_xdp_tests-v3-2-51cea913710c@bootlin.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-11selftests/bpf: fix bpf_map_redirect call for cpu map testAlexis Lothoré (eBPF Foundation)1-1/+1
xdp_redir_prog currently redirects packets based on the entry at index 1 in cpu_map, but the corresponding test only manipulates the entry at index 0. This does not really affect the test in its current form since the program is detached before having the opportunity to execute, but it needs to be fixed before being able improve the corresponding test (ie, not only test attach/detach but also the redirect feature) Fix this XDP program by making it redirect packets based on entry 0 in cpu_map instead of entry 1. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20241009-convert_xdp_tests-v3-1-51cea913710c@bootlin.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski457-6171/+7688
Cross-merge networking fixes after downstream PR (net-6.12-rc3). No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10Merge tag 'net-6.12-rc3' of ↵Linus Torvalds115-488/+1339
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth and netfilter. Current release - regressions: - dsa: sja1105: fix reception from VLAN-unaware bridges - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled" - eth: fec: don't save PTP state if PTP is unsupported Current release - new code bugs: - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop - phy: aquantia: AQR115c fix up PMA capabilities Previous releases - regressions: - tcp: 3 fixes for retrans_stamp and undo logic Previous releases - always broken: - net: do not delay dst_entries_add() in dst_release() - netfilter: restrict xtables extensions to families that are safe, syzbot found a way to combine ebtables with extensions that are never used by userspace tools - sctp: ensure sk_state is set to CLOSED if hashing fails in sctp_listen_start - mptcp: handle consistently DSS corruption, and prevent corruption due to large pmtu xmit" * tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) MAINTAINERS: Add headers and mailing list to UDP section MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL] slip: make slhc_remember() more robust against malicious packets net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC ppp: fix ppp_async_encode() illegal access docs: netdev: document guidance on cleanup patches phonet: Handle error of rtnl_register_module(). mpls: Handle error of rtnl_register_module(). mctp: Handle error of rtnl_register_module(). bridge: Handle error of rtnl_register_module(). vxlan: Handle error of rtnl_register_module(). rtnetlink: Add bulk registration helpers for rtnetlink message handlers. net: do not delay dst_entries_add() in dst_release() mptcp: pm: do not remove closing subflows mptcp: fallback when MPTCP opts are dropped after 1st data tcp: fix mptcp DSS corruption due to large pmtu xmit mptcp: handle consistently DSS corruption net: netconsole: fix wrong warning net: dsa: refuse cross-chip mirroring operations net: fec: don't save PTP state if PTP is unsupported ...
2024-10-10Merge tag 'trace-ringbuffer-v6.12-rc2' of ↵Linus Torvalds1-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: "Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug callbacks When a ring buffer is mapped to memory assigned at boot, it also splits it up evenly between the possible CPUs. But the allocation code still attached a CPU notifier callback to this ring buffer. When a CPU is added, the callback will happen and another per-cpu buffer is created for the ring buffer. But for boot mapped buffers, there is no room to add another one (as they were all created already). The result of calling the CPU hotplug notifier on a boot mapped ring buffer is unpredictable and could lead to a system crash. If the ring buffer is boot mapped simply do not attach the CPU notifier to it" * tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Do not have boot mapped buffers hook to CPU hotplug
2024-10-10Merge tag 'for-6.12-rc2-tag' of ↵Linus Torvalds7-24/+76
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - update fstrim loop and add more cancellation points, fix reported delayed or blocked suspend if there's a huge chunk queued - fix error handling in recent qgroup xarray conversion - in zoned mode, fix warning printing device path without RCU protection - again fix invalid extent xarray state (6252690f7e1b), lost due to refactoring * tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix clear_dirty and writeback ordering in submit_one_sector() btrfs: zoned: fix missing RCU locking in error message when loading zone info btrfs: fix missing error handling when adding delayed ref with qgroups enabled btrfs: add cancellation points to trim loops btrfs: split remaining space to discard in chunks
2024-10-10Merge tag 'nfsd-6.12-1' of ↵Linus Torvalds3-4/+7
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix NFSD bring-up / shutdown - Fix a UAF when releasing a stateid * tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix possible badness in FREE_STATEID nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed NFSD: Mark filecache "down" if init fails
2024-10-10Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds15-261/+207
Pull xfs fixes from Carlos Maiolino: - A few small typo fixes - fstests xfs/538 DEBUG-only fix - Performance fix on blockgc on COW'ed files, by skipping trims on cowblock inodes currently opened for write - Prevent cowblocks to be freed under dirty pagecache during unshare - Update MAINTAINERS file to quote the new maintainer * tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix a typo xfs: don't free cowblocks from under dirty pagecache on unshare xfs: skip background cowblock trims on inodes open for write xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc xfs: don't ifdef around the exact minlen allocations xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split xfs: return bool from xfs_attr3_leaf_add xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate() xfs: scrub: convert comma to semicolon xfs: Remove empty declartion in header file MAINTAINERS: add Carlos Maiolino as XFS release manager
2024-10-10Merge branch 'maintainers-networking-file-coverage-updates'Jakub Kicinski1-0/+15
Simon Horman says: ==================== MAINTAINERS: Networking file coverage updates The aim of this proposal is to make the handling of some files, related to Networking and Wireless, more consistently. It does so by: 1. Adding some more headers to the UDP section, making it consistent with the TCP section. 2. Excluding some files relating to Wireless from NETWORKING [GENERAL], making their handling consistent with other files related to Wireless. The aim of this is to make things more consistent. And for MAINTAINERS to better reflect the situation on the ground. I am more than happy to be told that the current state of affairs is fine. Or for other ideas to be discussed. v1: https://lore.kernel.org/20241004-maint-net-hdrs-v1-0-41fd555aacc5@kernel.org ==================== Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-0-f2c86e7309c8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10MAINTAINERS: Add headers and mailing list to UDP sectionSimon Horman1-0/+4
Add netdev mailing list and some more udp.h headers to the UDP section. This is now more consistent with the TCP section. Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-2-f2c86e7309c8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]Simon Horman1-0/+11
We already exclude wireless drivers from the netdev@ traffic, to delegate it to linux-wireless@, and avoid overwhelming netdev@. Many of the following wireless-related sections MAINTAINERS are already not included in the NETWORKING [GENERAL] section. For consistency, exclude those that are. * 802.11 (including CFG80211/NL80211) * MAC80211 * RFKILL Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-1-f2c86e7309c8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10slip: make slhc_remember() more robust against malicious packetsEric Dumazet1-23/+34
syzbot found that slhc_remember() was missing checks against malicious packets [1]. slhc_remember() only checked the size of the packet was at least 20, which is not good enough. We need to make sure the packet includes the IPv4 and TCP header that are supposed to be carried. Add iph and th pointers to make the code more readable. [1] BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666 slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666 ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455 ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline] ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212 ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327 pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113 __release_sock+0x1da/0x330 net/core/sock.c:3072 release_sock+0x6b/0x250 net/core/sock.c:3626 pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4091 [inline] slab_alloc_node mm/slub.c:4134 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1322 [inline] sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732 pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Fixes: b5451d783ade ("slip: Move the SLIP drivers") Reported-by: syzbot+2ada1bc857496353be5a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/670646db.050a0220.3f80e.0027.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241009091132.2136321-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net/smc: Address spelling errorsSimon Horman4-5/+5
Address spelling errors flagged by codespell. This patch is intended to cover all files under drivers/smc Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://patch.msgid.link/20241009-smc-starspell-v1-1-b8b395bbaf82@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMCD. Wythe1-0/+11
Eric report a panic on IPPROTO_SMC, and give the facts that when INET_PROTOSW_ICSK was set, icsk->icsk_sync_mss must be set too. Bug: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000086000005 EC = 0x21: IABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault user pgtable: 4k pages, 48-bit VAs, pgdp=00000001195d1000 [0000000000000000] pgd=0800000109c46003, p4d=0800000109c46003, pud=0000000000000000 Internal error: Oops: 0000000086000005 [#1] PREEMPT SMP Modules linked in: CPU: 1 UID: 0 PID: 8037 Comm: syz.3.265 Not tainted 6.11.0-rc7-syzkaller-g5f5673607153 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : cipso_v4_sock_setattr+0x2a8/0x3c0 net/ipv4/cipso_ipv4.c:1910 sp : ffff80009b887a90 x29: ffff80009b887aa0 x28: ffff80008db94050 x27: 0000000000000000 x26: 1fffe0001aa6f5b3 x25: dfff800000000000 x24: ffff0000db75da00 x23: 0000000000000000 x22: ffff0000d8b78518 x21: 0000000000000000 x20: ffff0000d537ad80 x19: ffff0000d8b78000 x18: 1fffe000366d79ee x17: ffff8000800614a8 x16: ffff800080569b84 x15: 0000000000000001 x14: 000000008b336894 x13: 00000000cd96feaa x12: 0000000000000003 x11: 0000000000040000 x10: 00000000000020a3 x9 : 1fffe0001b16f0f1 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f x5 : 0000000000000040 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000000002 x1 : 0000000000000000 x0 : ffff0000d8b78000 Call trace: 0x0 netlbl_sock_setattr+0x2e4/0x338 net/netlabel/netlabel_kapi.c:1000 smack_netlbl_add+0xa4/0x154 security/smack/smack_lsm.c:2593 smack_socket_post_create+0xa8/0x14c security/smack/smack_lsm.c:2973 security_socket_post_create+0x94/0xd4 security/security.c:4425 __sock_create+0x4c8/0x884 net/socket.c:1587 sock_create net/socket.c:1622 [inline] __sys_socket_create net/socket.c:1659 [inline] __sys_socket+0x134/0x340 net/socket.c:1706 __do_sys_socket net/socket.c:1720 [inline] __se_sys_socket net/socket.c:1718 [inline] __arm64_sys_socket+0x7c/0x94 net/socket.c:1718 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Code: ???????? ???????? ???????? ???????? (????????) ---[ end trace 0000000000000000 ]--- This patch add a toy implementation that performs a simple return to prevent such panic. This is because MSS can be set in sock_create_kern or smc_setsockopt, similar to how it's done in AF_SMC. However, for AF_SMC, there is currently no way to synchronize MSS within __sys_connect_file. This toy implementation lays the groundwork for us to support such feature for IPPROTO_SMC in the future. Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://patch.msgid.link/1728456916-67035-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10ppp: fix ppp_async_encode() illegal accessEric Dumazet1-1/+1
syzbot reported an issue in ppp_async_encode() [1] In this case, pppoe_sendmsg() is called with a zero size. Then ppp_async_encode() is called with an empty skb. BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline] BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675 ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline] ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675 ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634 ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline] ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304 pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113 __release_sock+0x1da/0x330 net/core/sock.c:3072 release_sock+0x6b/0x250 net/core/sock.c:3626 pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4092 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1322 [inline] sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732 pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:744 ____sys_sendmsg+0x903/0xb60 net/socket.c:2602 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656 __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742 __do_sys_sendmmsg net/socket.c:2771 [inline] __se_sys_sendmmsg net/socket.c:2768 [inline] __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768 x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+1d121645899e7692f92a@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009185802.3763282-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10docs: netdev: document guidance on cleanup patchesSimon Horman1-0/+17
The purpose of this section is to document what is the current practice regarding clean-up patches which address checkpatch warnings and similar problems. I feel there is a value in having this documented so others can easily refer to it. Clearly this topic is subjective. And to some extent the current practice discourages a wider range of patches than is described here. But I feel it is best to start somewhere, with the most well established part of the current practice. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009-doc-mc-clean-v2-1-e637b665fa81@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10Merge branch 'net-introduce-tx-h-w-shaping-api'Jakub Kicinski46-15/+3640
Paolo Abeni says: ==================== net: introduce TX H/W shaping API We have a plurality of shaping-related drivers API, but none flexible enough to meet existing demand from vendors[1]. This series introduces new device APIs to configure in a flexible way TX H/W shaping. The new functionalities are exposed via a newly defined generic netlink interface and include introspection capabilities. Some self-tests are included, on top of a dummy netdevsim implementation. Finally a basic implementation for the iavf driver is provided. Some usage examples: * Configure shaping on a given queue: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do set --json '{"ifindex": '$IFINDEX', "shaper": {"handle": {"scope": "queue", "id":'$QUEUEID'}, "bw-max": 2000000}}' * Container B/W sharing The orchestration infrastructure wants to group the container-related queues under a RR scheduling and limit the aggregate bandwidth: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope":"node"}, "bw-max": 10000000}' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}} Q1 \ \ Q2 -- node 0 ------- netdev / (bw-max: 10M) Q3 / * Delegation A containers wants to limit the aggregate B/W bandwidth of 2 of the 3 queues it owns - the starting configuration is the one from the previous point: SPEC=Documentation/netlink/specs/net_shaper.yaml ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], "handle": {"scope": "node"}, "bw-max": 5000000 }' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}} Q1 -- node 1 --------\ / (bw-max: 5M) \ Q2 / node 0 ------- netdev /(bw-max: 10M) Q3 ------------------/ In a group operation, when prior to the op itself, the leaves have different parents, the user must specify the parent handle for the group. I.e., starting from the previous config: ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "bw-max": 3000000 }' Netlink error: Invalid argument nl_len = 96 (80) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'All the leaves shapers must have the same old parent'} ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "parent": {"scope": "node", "id": 1}, "bw-max": 3000000 } {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}} Q1 -- node 2 --- /(bw-max:3M)\ Q3 / \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) * Cleanup: Still starting from config 1To delete a single queue shaper ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID3'}}' Q1 -- node 2 --- (bw-max:3M)\ \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) Deleting a node shaper relinks all its leaves to the node's parent: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id":2}}' Q1 ---\ \ node 1----- \ / (bw-max: 5M)\ Q2----/ node 0 ------- netdev (bw-max: 10M) Deleting the last shaper under a node shaper deletes the node, too: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID1'}}' ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID2'}}' ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 1}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} Such delete recurses on parents that are left over with no leaves: ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 0}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} v8: https://lore.kernel.org/cover.1727704215.git.pabeni@redhat.com v7: https://lore.kernel.org/cover.1725919039.git.pabeni@redhat.com v6: https://lore.kernel.org/cover.1725457317.git.pabeni@redhat.com v5: https://lore.kernel.org/cover.1724944116.git.pabeni@redhat.com v4: https://lore.kernel.org/cover.1724165948.git.pabeni@redhat.com v3: https://lore.kernel.org/cover.1722357745.git.pabeni@redhat.com RFC v2: https://lore.kernel.org/cover.1721851988.git.pabeni@redhat.com RFC v1: https://lore.kernel.org/cover.1719518113.git.pabeni@redhat.com ==================== Link: https://patch.msgid.link/cover.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10iavf: add support to exchange qos capabilitiesSudheer Mogilappagari3-3/+150
During driver initialization VF determines QOS capability is allowed by PF and receives QOS parameters. After which quanta size for queues is configured which is not configurable and is set to 1KB currently. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/72cbeb9c88d40e557053c57d7531c96bed490576.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10iavf: Add net_shaper_ops supportSudheer Mogilappagari5-1/+182
Implement net_shaper_ops support for IAVF. This enables configuration of rate limiting on per queue basis. Customer intends to enforce bandwidth limit on Tx traffic steered to the queue by configuring rate limits on the queue. To set rate limiting for a queue, update shaper object of given queues in driver and send VIRTCHNL_OP_CONFIG_QUEUE_BW to PF to update HW configuration. Deleting shaper configured for queue is nothing but configuring shaper with bw_max 0. The PF restores the default rate limiting config when bw_max is zero. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/5a882cb51998c4c2c3d21fed521498eba1c8f079.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10ice: Support VF queue rate limit and quanta size configurationWenjun Wu10-0/+395
Add support to configure VF queue rate limit and quanta size. For quanta size configuration, the quanta profiles are divided evenly by PF numbers. For each port, the first quanta profile is reserved for default. When VF is asked to set queue quanta size, PF will search for an available profile, change the fields and assigned this profile to the queue. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/fddefc2c1ec3ab32b241ce444af401da19e834dd.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10virtchnl: support queue rate limit and quanta size configurationWenjun Wu1-0/+119
This patch adds new virtchnl opcodes and structures for rate limit and quanta size configuration, which include: 1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each VF per queue. 2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue. 3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The configuration is previously set by DCB and PF, and now is the potential QoS capability of VF. VF can take it as reference to configure queue TC mapping. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/839002f7bd6f63b985a060a51b079f6e6dbbe237.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10testing: net-drv: add basic shaper testPaolo Abeni7-0/+510
Leverage a basic/dummy netdevsim implementation to do functional coverage for NL interface. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/43092afbf38365c796088bf8fc155e523ab434ae.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement cap validation in the corePaolo Abeni1-0/+101
Use the device capabilities to reject invalid attribute values before pushing them to the H/W. Note that validating the metric explicitly avoids NL_SET_BAD_ATTR() usage, to provide unambiguous error messages to the user. Validating the nesting requires the knowledge of the new parent for the given shaper; as such is a chicken-egg problem: to validate the leaf nesting we need to know the node scope, to validate the node nesting we need to know the leafs parent scope. To break the circular dependency, place the leafs nesting validation after the parsing. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/54667601813e4c0348f39bf8ad2446ffc9fcd383.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net: shaper: implement introspection supportPaolo Abeni1-3/+95
The netlink op is a simple wrapper around the device callback. Extend the existing fetch_dev() helper adding an attribute argument for the requested device. Reuse such helper in the newly implemented operation. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/66eb62f22b3a5ba06ca91d01ae77515e5f447e15.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10netlink: spec: add shaper introspection supportPaolo Abeni5-0/+176
Allow the user-space to fine-grain query the shaping features supported by the NIC on each domain. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/3ddd10e450e3fe7d4b944c0d0b886d4483529ee6.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement shaper cleanup on queue deletionPaolo Abeni3-0/+54
hook into netif_set_real_num_tx_queues() to cleanup any shaper configured on top of the to-be-destroyed TX queues. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/6da4ee03cae2b2a757d7b59e88baf09cc94c5ef1.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement delete support for NODE scope shaperPaolo Abeni1-12/+74
Leverage the previously introduced group operation to implement the removal of NODE scope shaper, re-linking its leaves under the the parent node before actually deleting the specified NODE scope shaper. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/763d484b5b69e365acccfd8031b183c647a367a4.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement NL group operationPaolo Abeni1-0/+350
Allow grouping multiple leaves shaper under the given root. The node and the leaves shapers are created, if needed, otherwise the existing shapers are re-linked as requested. Try hard to pre-allocated the needed resources, to avoid non trivial H/W configuration rollbacks in case of any failure. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/8a721274fde18b872d1e3a61aaa916bb7b7996d3.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement NL set and delete operationsPaolo Abeni1-3/+380
Both NL operations directly map on the homonymous device shaper callbacks, update accordingly the shapers cache and are serialized via a per device lock. Implement the cache modification helpers to additionally deal with NODE scope shaper. That will be needed by the group() operation implemented in the next patch. The delete implementation is partial: does not handle NODE scope shaper yet. Such support will require infrastructure from the next patch and will be implemented later in the series. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/1e6a34a4095b35d773d2b9c476164671bbcf8397.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement NL get operationPaolo Abeni6-7/+484
Introduce the basic infrastructure to implement the net-shaper core functionality. Each network devices carries a net-shaper cache, the NL get() operation fetches the data from such cache. The cache is initially empty, will be fill by the set()/group() operation implemented later and is destroyed at device cleanup time. The net_shaper_fill_handle(), net_shaper_ctx_init(), and net_shaper_generic_pre() implementations handle generic index type attributes, despite the current caller always pass a constant value to avoid more noise in later patches using them with different attributes. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/ddd10fd645a9367803ad02fca4a5664ea5ace170.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10netlink: spec: add shaper YAML specPaolo Abeni9-0/+579
Define the user-space visible interface to query, configure and delete network shapers via yaml definition. Add dummy implementations for the relevant NL callbacks. set() and delete() operations touch a single shaper creating/updating or deleting it. The group() operation creates a shaper's group, nesting multiple input shapers under the specified output shaper. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/7a33a1ff370bdbcd0cd3f909575c912cd56f41da.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10genetlink: extend info user-storage to match NL cb ctxPaolo Abeni9-12/+17
This allows a more uniform implementation of non-dump and dump operations, and will be used later in the series to avoid some per-operation allocation. Additionally rename the NL_ASSERT_DUMP_CTX_FITS macro, to fit a more extended usage. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/1130cc2896626b84587a2a5f96a5c6829638f4da.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10Merge branch 'rtnetlink-handle-error-of-rtnl_register_module'Paolo Abeni15-89/+176
Kuniyuki Iwashima says: ==================== rtnetlink: Handle error of rtnl_register_module(). While converting phonet to per-netns RTNL, I found a weird comment /* Further rtnl_register_module() cannot fail */ that was true but no longer true after commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers"). Many callers of rtnl_register_module() just ignore the returned value but should handle them properly. This series introduces two helpers, rtnl_register_many() and rtnl_unregister_many(), to do that easily and fix such callers. All rtnl_register() and rtnl_register_module() will be converted to _many() variant and some rtnl_lock() will be saved in _many() later in net-next. Changes: v4: * Add more context in changelog of each patch v3: https://lore.kernel.org/all/20241007124459.5727-1-kuniyu@amazon.com/ * Move module *owner to struct rtnl_msg_handler * Make struct rtnl_msg_handler args/vars const * Update mctp goto labels v2: https://lore.kernel.org/netdev/20241004222358.79129-1-kuniyu@amazon.com/ * Remove __exit from mctp_neigh_exit(). v1: https://lore.kernel.org/netdev/20241003205725.5612-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20241008184737.9619-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10phonet: Handle error of rtnl_register_module().Kuniyuki Iwashima1-17/+11
Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers"), once the first rtnl_register_module() allocated rtnl_msg_handlers[PF_PHONET], the following calls never failed. However, after the commit, rtnl_register_module() could fail silently to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error handling for each call. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's use rtnl_register_many() to handle the errors easily. Fixes: addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Rémi Denis-Courmont <courmisch@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10mpls: Handle error of rtnl_register_module().Kuniyuki Iwashima1-11/+21
Since introduced, mpls_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 03c0566542f4 ("mpls: Netlink commands to add, remove, and dump routes") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10mctp: Handle error of rtnl_register_module().Kuniyuki Iwashima5-36/+66
Since introduced, mctp has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 583be982d934 ("mctp: Add device handling and netlink interface") Fixes: 831119f88781 ("mctp: Add neighbour netlink interface") Fixes: 06d2f4c583a7 ("mctp: Add netlink route management") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10bridge: Handle error of rtnl_register_module().Kuniyuki Iwashima3-13/+17
Since introduced, br_vlan_rtnl_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support") Fixes: f26b296585dc ("net: bridge: vlan: add new rtm message support") Fixes: adb3ce9bcb0f ("net: bridge: vlan: add del rtm message support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>