summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-09-12net: dpaa: Pad packets to ETH_ZLENSean Anderson1-1/+8
When sending packets under 60 bytes, up to three bytes of the buffer following the data may be leaked. Avoid this by extending all packets to ETH_ZLEN, ensuring nothing is leaked in the padding. This bug can be reproduced by running $ ping -s 11 destination Fixes: 9ad1a3749333 ("dpaa_eth: add support for DPAA Ethernet") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240910143144.1439910-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12qlcnic: make read-only const array key staticColin Ian King1-5/+7
Don't populate the const read-only array key on the stack at run time, instead make it static. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240910120635.115266-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12mptcp: pm: Fix uaf in __timer_delete_syncEdward Adam Davis1-4/+9
There are two paths to access mptcp_pm_del_add_timer, result in a race condition: CPU1 CPU2 ==== ==== net_rx_action napi_poll netlink_sendmsg __napi_poll netlink_unicast process_backlog netlink_unicast_kernel __netif_receive_skb genl_rcv __netif_receive_skb_one_core netlink_rcv_skb NF_HOOK genl_rcv_msg ip_local_deliver_finish genl_family_rcv_msg ip_protocol_deliver_rcu genl_family_rcv_msg_doit tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit tcp_v4_do_rcv mptcp_nl_remove_addrs_list tcp_rcv_established mptcp_pm_remove_addrs_and_subflows tcp_data_queue remove_anno_list_by_saddr mptcp_incoming_options mptcp_pm_del_add_timer mptcp_pm_del_add_timer kfree(entry) In remove_anno_list_by_saddr(running on CPU2), after leaving the critical zone protected by "pm.lock", the entry will be released, which leads to the occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). Keeping a reference to add_timer inside the lock, and calling sk_stop_timer_sync() with this reference, instead of "entry->add_timer". Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock, do not directly access any members of the entry outside the pm lock, which can avoid similar "entry->x" uaf. Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/tencent_7142963A37944B4A74EF76CD66EA3C253609@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12net: libwx: fix number of Rx and Tx descriptorsJiawen Wu1-3/+3
The number of transmit and receive descriptors must be a multiple of 128 due to the hardware limitation. If it is set to a multiple of 8 instead of a multiple 128, the queues will easily be hung. Cc: stable@vger.kernel.org Fixes: 883b5984a5d2 ("net: wangxun: add ethtool_ops for ring parameters") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240910095629.570674-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12Merge branch 'mptcp-fallback-to-tcp-after-3-mpc-drop-cache'Jakub Kicinski9-11/+182
Matthieu Baerts says: ==================== mptcp: fallback to TCP after 3 MPC drop + cache The SYN + MPTCP_CAPABLE packets could be explicitly dropped by firewalls somewhere in the network, e.g. if they decide to drop packets based on the TCP options, instead of stripping them off. The idea of this series is to fallback to TCP after 3 SYN+MPC drop (patch 2). If the connection succeeds after the fallback, it very likely means a blackhole has been detected. In this case (patch 3), MPTCP can be disabled for a certain period of time, 1h by default. If after this period, MPTCP is still blocked, the period is doubled. This technique is inspired by the one used by TCP FastOpen. This should help applications which want to use MPTCP by default on the client side if available. ==================== Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-0-da7ebb4cd2a3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12mptcp: disable active MPTCP in case of blackholeMatthieu Baerts (NGI0)7-8/+151
An MPTCP firewall blackhole can be detected if the following SYN retransmission after a fallback to "plain" TCP is accepted. In case of blackhole, a similar technique to the one in place with TFO is now used: MPTCP can be disabled for a certain period of time, 1h by default. This time period will grow exponentially when more blackhole issues get detected right after MPTCP is re-enabled and will reset to the initial value when the blackhole issue goes away. The blackhole period can be modified thanks to a new sysctl knob: blackhole_timeout. Two new MIB counters help understanding what's happening: - 'Blackhole', incremented when a blackhole is detected. - 'MPCapableSYNTXDisabled', incremented when an MPTCP connection directly falls back to TCP during the blackhole period. Because the technique is inspired by the one used by TFO, an important part of the new code is similar to what can find in tcp_fastopen.c, with some adaptations to the MPTCP case. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/57 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-3-da7ebb4cd2a3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12mptcp: fallback to TCP after SYN+MPC dropsMatthieu Baerts (NGI0)5-0/+27
Some middleboxes might be nasty with MPTCP, and decide to drop packets with MPTCP options, instead of just dropping the MPTCP options (or letting them pass...). In this case, it sounds better to fallback to "plain" TCP after 2 retransmissions, and try again. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/477 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-2-da7ebb4cd2a3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12mptcp: export mptcp_subflow_early_fallback()Matthieu Baerts (NGI0)2-7/+8
This helper will be used outside protocol.h in the following commit. While at it, also add a 'pr_fallback()' debug print, to help identifying fallbacks. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-1-da7ebb4cd2a3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12net: dsa: felix: ignore pending status of TAS module when it's disabledXiaoliang Yang1-4/+7
The TAS module could not be configured when it's running in pending status. We need disable the module and configure it again. However, the pending status is not cleared after the module disabled. TC taprio set will always return busy even it's disabled. For example, a user uses tc-taprio to configure Qbv and a future basetime. The TAS module will run in a pending status. There is no way to reconfigure Qbv, it always returns busy. Actually the TAS module can be reconfigured when it's disabled. So it doesn't need to check the pending status if the TAS module is disabled. After the patch, user can delete the tc taprio configuration to disable Qbv and reconfigure it again. Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Link: https://patch.msgid.link/20240906093550.29985-1-xiaoliang.yang_1@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12net: hsr: prevent NULL pointer dereference in hsr_proxy_announce()Jeongjun Park1-0/+4
In the function hsr_proxy_annouance() added in the previous commit 5f703ce5c981 ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data"), the return value of the hsr_port_get_hsr() function is not checked to be a NULL pointer, which causes a NULL pointer dereference. To solve this, we need to add code to check whether the return value of hsr_port_get_hsr() is NULL. Reported-by: syzbot+02a42d9b1bd395cbcab4@syzkaller.appspotmail.com Fixes: 5f703ce5c981 ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20240907190341.162289-1-aha310510@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12Merge branch 'net-hsr-use-the-seqnr-lock-for-frames-received-via-interlink-port'Jakub Kicinski3-3/+10
Sebastian Andrzej Siewior says: ==================== net: hsr: Use the seqnr lock for frames received via interlink port. This is follow-up to the thread at https://lore.kernel.org/all/20240904133725.1073963-1-edumazet@google.com/ ==================== Link: https://patch.msgid.link/20240906132816.657485-1-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12net: hsr: Remove interlink_sequence_nr.Eric Dumazet2-2/+0
Remove interlink_sequence_nr which is unused. [ bigeasy: split out from Eric's patch ]. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240906132816.657485-3-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12net: hsr: Use the seqnr lock for frames received via interlink port.Sebastian Andrzej Siewior1-1/+10
syzbot reported that the seqnr_lock is not acquire for frames received over the interlink port. In the interlink case a new seqnr is generated and assigned to the frame. Frames, which are received over the slave port have already a sequence number assigned so the lock is not required. Acquire the hsr_priv::seqnr_lock during in the invocation of hsr_forward_skb() if a packet has been received from the interlink port. Reported-by: syzbot+3d602af7549af539274e@syzkaller.appspotmail.com Closes: https://groups.google.com/g/syzkaller-bugs/c/KppVvGviGg4/m/EItSdCZdBAAJ Fixes: 5055cccfc2d1c ("net: hsr: Provide RedBox support (HSR-SAN)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lukasz Majewski <lukma@denx.de> Tested-by: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20240906132816.657485-2-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12Merge branch 'selftests-mptcp-misc-small-fixes'Jakub Kicinski2-1/+5
Matthieu Baerts says: ==================== selftests: mptcp: misc. small fixes Here are some various fixes for the MPTCP selftests. Patch 1 fixes a recently modified test to continue to work as expected on older kernels. This is a fix for a recent fix that can be backported up to v5.15. Patch 2 and 3 include dependences when exporting or installing the tests. Two fixes for v6.11-rc1. ==================== Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-0-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12selftests: mptcp: include net_helper.sh fileMatthieu Baerts (NGI0)1-1/+1
Similar to the previous commit, the net_helper.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: 1af3bc912eac ("selftests: mptcp: lib: use wait_local_port_listen helper") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-3-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12selftests: mptcp: include lib.sh fileMatthieu Baerts (NGI0)1-0/+2
The lib.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: f265d3119a29 ("selftests: mptcp: lib: use setup/cleanup_ns helpers") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-2-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12selftests: mptcp: join: restrict fullmesh endp on 1st sfMatthieu Baerts (NGI0)1-1/+3
A new endpoint using the IP of the initial subflow has been recently added to increase the code coverage. But it breaks the test when using old kernels not having commit 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk"), e.g. on v5.15. Similar to commit d4c81bbb8600 ("selftests: mptcp: join: support local endpoint being tracked or not"), it is possible to add the new endpoint conditionally, by checking if "mptcp_pm_subflow_check_next" is present in kallsyms: this is not directly linked to the commit introducing this symbol but for the parent one which is linked anyway. So we can know in advance what will be the expected behaviour, and add the new endpoint only when it makes sense to do so. Fixes: 4878f9f8421f ("selftests: mptcp: join: validate fullmesh endp on 1st sf") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12netfilter: nft_socket: make cgroupsv2 matching work with namespacesFlorian Westphal1-3/+38
When running in container environmment, /sys/fs/cgroup/ might not be the real root node of the sk-attached cgroup. Example: In container: % stat /sys//fs/cgroup/ Device: 0,21 Inode: 2214 .. % stat /sys/fs/cgroup/foo Device: 0,21 Inode: 2264 .. The expectation would be for: nft add rule .. socket cgroupv2 level 1 "foo" counter to match traffic from a process that got added to "foo" via "echo $pid > /sys/fs/cgroup/foo/cgroup.procs". However, 'level 3' is needed to make this work. Seen from initial namespace, the complete hierarchy is: % stat /sys/fs/cgroup/system.slice/docker-.../foo Device: 0,21 Inode: 2264 .. i.e. hierarchy is 0 1 2 3 / -> system.slice -> docker-1... -> foo ... but the container doesn't know that its "/" is the "docker-1.." cgroup. Current code will retrieve the 'system.slice' cgroup node and store its kn->id in the destination register, so compare with 2264 ("foo" cgroup id) will not match. Fetch "/" cgroup from ->init() and add its level to the level we try to extract. cgroup root-level is 0 for the init-namespace or the level of the ancestor that is exposed as the cgroup root inside the container. In the above case, cgrp->level of "/" resolved in the container is 2 (docker-1...scope/) and request for 'level 1' will get adjusted to fetch the actual level (3). v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it. (kernel test robot) Cc: cgroups@vger.kernel.org Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2") Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-12netfilter: nft_socket: fix sk refcount leaksFlorian Westphal1-3/+4
We must put 'sk' reference before returning. Fixes: 039b1f4f24ec ("netfilter: nft_socket: fix erroneous socket assignment") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-11Merge tag 'wireless-next-2024-09-11' of ↵Jakub Kicinski159-1565/+3126
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.12 The last -next "new features" pull request for v6.12. The stack now supports DFS on MLO but otherwise nothing really standing out. Major changes: cfg80211/mac80211 * EHT rate support in AQL airtime * DFS support for MLO rtw89 * complete BT-coexistence code for RTL8852BT * RTL8922A WoWLAN net-detect support * tag 'wireless-next-2024-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits) wifi: brcmfmac: cfg80211: Convert comma to semicolon wifi: rsi: Remove an unused field in struct rsi_debugfs wifi: libertas: Cleanup unused declarations wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_bus_probe() wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_sdio_probe() wifi: wilc1000: fix potential RCU dereference issue in wilc_parse_join_bss_param wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext() wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop() wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors wifi: cfg80211: fix kernel-doc for per-link data wifi: mt76: mt7925: replace chan config with extend txpower config for clc wifi: mt76: mt7925: fix a potential array-index-out-of-bounds issue for clc wifi: mt76: mt7615: check devm_kasprintf() returned value wifi: mt76: mt7925: convert comma to semicolon wifi: mt76: mt7925: fix a potential association failure upon resuming wifi: mt76: Avoid multiple -Wflex-array-member-not-at-end warnings wifi: mt76: mt7921: Check devm_kasprintf() returned value wifi: mt76: mt7915: check devm_kasprintf() returned value wifi: mt76: mt7915: avoid long MCU command timeouts during SER wifi: mt76: mt7996: fix uninitialized TLV data ... ==================== Link: https://patch.msgid.link/20240911084147.A205DC4AF0F@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11sock_map: Add a cond_resched() in sock_hash_free()Eric Dumazet1-0/+1
Several syzbot soft lockup reports all have in common sock_hash_free() If a map with a large number of buckets is destroyed, we need to yield the cpu when needed. Fixes: 75e68e5bf2c7 ("bpf, sockhash: Synchronize delete from bucket list on map free") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240906154449.3742932-1-edumazet@google.com
2024-09-11Merge tag 'arm-fixes-6.11-3' of ↵Linus Torvalds8-13/+60
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "The bulk of the changes this time are for device tree files in the rockchips platform, addressing correctness issues on individual boards, plus one change in the rk356x SoC file to make it match the binding. The only other changes that came in are - a CPU frequencey scaling fix for JH7110 (RISC-V) - a build fix for the cznic hwrandom driver - a fix for a deadlock in qualcomm uefi secure application firmware driver" * tag 'arm-fixes-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: platform: cznic: turris-omnia-mcu: fix HW_RANDOM dependency riscv: dts: starfive: jh7110-common: Fix lower rate of CPUfreq by setting PLL0 rate to 1.5GHz firmware: qcom: uefisecapp: Fix deadlock in qcuefi_acquire() arm64: dts: rockchip: Fix compatibles for RK3588 VO{0,1}_GRF dt-bindings: soc: rockchip: Fix compatibles for RK3588 VO{0,1}_GRF arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog on RK3399 Puma arm64: dts: rockchip: fix eMMC/SPI corruption when audio has been used on RK3399 Puma arm64: dts: rockchip: fix PMIC interrupt pin in pinctrl for ROCK Pi E arm64: dts: rockchip: Remove broken tsadc pinctrl binding for rk356x
2024-09-11Merge tag 'for-6.11/dm-fixes-2' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fix from Mikulas Patocka: - fix a race condition in dm-integrity * tag 'for-6.11/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-integrity: fix a race condition when accessing recalc_sector
2024-09-11Merge tag 'printk-for-6.11-fixup' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Fix build of serial_core as a module * tag 'printk-for-6.11-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Export match_devname_and_update_preferred_console()
2024-09-11minmax: reduce min/max macro expansion in atomisp driverLorenzo Stoakes1-7/+19
Avoid unnecessary nested min()/max() which results in egregious macro expansion. Use clamp_t() as this introduces the least possible expansion, and turn the {s,u}DIGIT_FITTING() macros into inline functions to avoid the nested expansion. This resolves an issue with slackware 15.0 32-bit compilation as reported by Richard Narron. Presumably the min/max fixups would be difficult to backport, this patch should be easier and fix's Richard's problem in 5.15. Reported-by: Richard Narron <richard@aaazen.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Closes: https://lore.kernel.org/all/4a5321bd-b1f-1832-f0c-cea8694dc5aa@aaazen.com/ Fixes: 867046cc7027 ("minmax: relax check to allow comparison between unsigned arguments and signed constants") Cc: stable@vger.kernel.org Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-11Merge branch 'bpf: Allow skb dynptr for tp_btf'Martin KaFai Lau11-13/+172
Philo Lu says: ==================== This makes bpf_dynptr_from_skb usable for tp_btf, so that we can easily parse skb in tracepoints. This has been discussed in [0], and Martin suggested to use dynptr (instead of helpers like bpf_skb_load_bytes). For safety, skb dynptr shouldn't be used in fentry/fexit. This is achieved by add KF_TRUSTED_ARGS flag in bpf_dynptr_from_skb defination, because pointers passed by tracepoint are trusted (PTR_TRUSTED) while those of fentry/fexit are not. Another problem raises that NULL pointers could be passed to tracepoint, such as trace_tcp_send_reset, and we need to recognize them. This is done by add a "__nullable" suffix in the func_proto of the tracepoint, discussed in [1]. 2 Test cases are added, one for "__nullable" suffix, and the other for using skb dynptr in tp_btf. changelog v2 -> v3 (Andrii Nakryiko): Patch 1: - Remove prog type check in prog_arg_maybe_null() - Add bpf_put_raw_tracepoint() after get() - Use kallsyms_lookup() instead of sprintf("%ps") Patch 2: Add separate test "tp_btf_nullable", and use full failure msg v1 -> v2: - Add "__nullable" suffix support (Alexei Starovoitov) - Replace "struct __sk_buff*" with "void*" in test (Martin KaFai Lau) [0] https://lore.kernel.org/all/20240205121038.41344-1-lulie@linux.alibaba.com/T/ [1] https://lore.kernel.org/all/20240430121805.104618-1-lulie@linux.alibaba.com/T/ ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11selftests/bpf: Expand skb dynptr selftests for tp_btfPhilo Lu3-2/+83
Add 3 test cases for skb dynptr used in tp_btf: - test_dynptr_skb_tp_btf: use skb dynptr in tp_btf and make sure it is read-only. - skb_invalid_ctx_fentry/skb_invalid_ctx_fexit: bpf_dynptr_from_skb should fail in fentry/fexit. In test_dynptr_skb_tp_btf, to trigger the tracepoint in kfree_skb, test_pkt_access is used for its test_run, as in kfree_skb.c. Because the test process is different from others, a new setup type is defined, i.e., SETUP_SKB_PROG_TP. The result is like: $ ./test_progs -t 'dynptr/test_dynptr_skb_tp_btf' #84/14 dynptr/test_dynptr_skb_tp_btf:OK #84 dynptr:OK #127 kfunc_dynptr_param:OK Summary: 2/1 PASSED, 0 SKIPPED, 0 FAILED $ ./test_progs -t 'dynptr/skb_invalid_ctx_f' #84/85 dynptr/skb_invalid_ctx_fentry:OK #84/86 dynptr/skb_invalid_ctx_fexit:OK #84 dynptr:OK #127 kfunc_dynptr_param:OK Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED Also fix two coding style nits (change spaces to tabs). Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Link: https://lore.kernel.org/r/20240911033719.91468-6-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11bpf: Allow bpf_dynptr_from_skb() for tp_btfPhilo Lu1-1/+2
Making tp_btf able to use bpf_dynptr_from_skb(), which is useful for skb parsing, especially for non-linear paged skb data. This is achieved by adding KF_TRUSTED_ARGS flag to bpf_dynptr_from_skb and registering it for TRACING progs. With KF_TRUSTED_ARGS, args from fentry/fexit are excluded, so that unsafe progs like fexit/__kfree_skb are not allowed. We also need the skb dynptr to be read-only in tp_btf. Because may_access_direct_pkt_data() returns false by default when checking bpf_dynptr_from_skb, there is no need to add BPF_PROG_TYPE_TRACING to it explicitly. Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240911033719.91468-5-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11tcp: Use skb__nullable in trace_tcp_send_resetPhilo Lu1-6/+6
Replace skb with skb__nullable as the argument name. The suffix tells bpf verifier through btf that the arg could be NULL and should be checked in tp_btf prog. For now, this is the only nullable argument in tcp tracepoints. Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240911033719.91468-4-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11selftests/bpf: Add test for __nullable suffix in tp_btfPhilo Lu4-0/+46
Add a tracepoint with __nullable suffix in bpf_testmod, and add cases for it: $ ./test_progs -t "tp_btf_nullable" #406/1 tp_btf_nullable/handle_tp_btf_nullable_bare1:OK #406/2 tp_btf_nullable/handle_tp_btf_nullable_bare2:OK #406 tp_btf_nullable:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Link: https://lore.kernel.org/r/20240911033719.91468-3-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11bpf: Support __nullable argument suffix for tp_btfPhilo Lu2-4/+35
Pointers passed to tp_btf were trusted to be valid, but some tracepoints do take NULL pointer as input, such as trace_tcp_send_reset(). Then the invalid memory access cannot be detected by verifier. This patch fix it by add a suffix "__nullable" to the unreliable argument. The suffix is shown in btf, and PTR_MAYBE_NULL will be added to nullable arguments. Then users must check the pointer before use it. A problem here is that we use "btf_trace_##call" to search func_proto. As it is a typedef, argument names as well as the suffix are not recorded. To solve this, I use bpf_raw_event_map to find "__bpf_trace##template" from "btf_trace_##call", and then we can see the suffix. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Link: https://lore.kernel.org/r/20240911033719.91468-2-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11bpf, cpumap: Move xdp:xdp_cpumap_kthread tracepoint before rcvDaniel Xu1-2/+4
cpumap takes RX processing out of softirq and onto a separate kthread. Since the kthread needs to be scheduled in order to run (versus softirq which does not), we can theoretically experience extra latency if the system is under load and the scheduler is being unfair to us. Moving the tracepoint to before passing the skb list up the stack allows users to more accurately measure enqueue/dequeue latency introduced by cpumap via xdp:xdp_cpumap_enqueue and xdp:xdp_cpumap_kthread tracepoints. f9419f7bd7a5 ("bpf: cpumap add tracepoints") which added the tracepoints states that the intent behind them was for general observability and for a feedback loop to see if the queues are being overwhelmed. This change does not mess with either of those use cases but rather adds a third one. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/bpf/47615d5b5e302e4bd30220473779e98b492d47cd.1725585718.git.dxu@dxuuu.xyz
2024-09-11selftests/xsk: Read current MAX_SKB_FRAGS from sysctl knobMaciej Fijalkowski2-6/+38
Currently, xskxceiver assumes that MAX_SKB_FRAGS value is always 17 which is not true - since the introduction of BIG TCP this can now take any value between 17 to 45 via CONFIG_MAX_SKB_FRAGS. Adjust the TOO_MANY_FRAGS test case to read the currently configured MAX_SKB_FRAGS value by reading it from /proc/sys/net/core/max_skb_frags. If running system does not provide that sysctl file then let us try running the test with a default value. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240910124129.289874-1-maciej.fijalkowski@intel.com
2024-09-11Merge branch 'lan743x-phylink'David S. Miller6-324/+498
Raju Lakkaraju says: ==================== Add support to PHYLINK for LAN743x/PCI11x1x chips This is the follow-up patch series of https://lkml.iu.edu/hypermail/linux/kernel/2310.2/02078.html Divide the PHYLINK adaptation and SFP modifications into two separate patch series. The current patch series focuses on transitioning the LAN743x driver's PHY support from phylib to phylink. Tested on PCI11010 Rev-1 Evaluation board Change List: ============ V5 -> V6: - Remove the lan743x_find_max_speed( ) function. Not require - Add EEE enable check before calling lan743x_mac_eee_enable( ) function V4 -> V5: - Remove the fixed_phy_unregister( ) function. Not require - Remove the "phydev->eee_enabled" check to update the MAC EEE enable/disable - Call lan743x_mac_eee_enable() with true after update tx_lpi_timer. - Add phy_support_eee() to initialize the EEE flags V3 -> V4: - Add fixed-link patch along with this series. Note: Note: This code was developed by Mr.Russell King Ref: https://lore.kernel.org/netdev/LV8PR11MB8700C786F5F1C274C73036CC9F8E2@LV8PR11MB8700.namprd11.prod.outlook.com/T/#me943adf54f1ea082edf294aba448fa003a116815 - Change phylink fixed-link function header's string from "Returns" to "Returns:" - Remove the EEE private variable from LAN743x adapter strcture and fix the EEE's set/get functions - set the individual caps (i.e. _RGMII, _RGMII_ID, _RGMII_RXID and __RGMII_TXID) replace with phy_interface_set_rgmii( ) function - Change lan743x_set_eee( ) to lan743x_mac_eee_enable( ) V2 -> V3: - Remove the unwanted parens in each of these if() sub-blocks - Replace "to_net_dev(config->dev)" with "netdev". - Add GMII_ID/RGMII_TXID/RGMII_RXID in supported_interfaces - Fix the lan743x_phy_handle_exists( ) return type V1 -> V2: - Fix the Russell King's comments i.e. remove the speed, duplex update in lan743x_phylink_mac_config( ) - pre-March 2020 legacy support has been removed V0 -> V1: - Integrate with Synopsys DesignWare XPCS drivers - Based on external review comments, - Changes made to SGMII interface support only 1G/100M/10M bps speed - Changes made to 2500Base-X interface support only 2.5Gbps speed - Add check for not is_sgmii_en with is_sfp_support_en support - Change the "pci11x1x_strap_get_status" function return type from void to int - Add ethtool phylink wol, eee, pause get/set functions ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-11net: lan743x: Add support to ethtool phylink get and set settingsRaju Lakkaraju3-79/+67
Add support to ethtool phylink functions: - get/set settings like speed, duplex etc - get/set the wake-on-lan (WOL) - get/set the energy-efficient ethernet (EEE) - get/set the pause Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-11net: lan743x: Migrate phylib to phylinkRaju Lakkaraju3-225/+349
Migrate phy support from phylib to phylink. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-11net: lan743x: Create separate Link Speed Duplex state functionRaju Lakkaraju1-30/+45
Create separate Link Speed Duplex (LSD) update state function from lan743x_sgmii_config () to use as subroutine. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-11net: lan743x: Create separate PCS power reset functionRaju Lakkaraju1-26/+29
Create separate PCS power reset function from lan743x_sgmii_config () to use as subroutine. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-11net: phylink: Add phylink_set_fixed_link() to configure fixed link state in ↵Russell King2-0/+44
phylink The function allows for the configuration of a fixed link state for a given phylink instance. This addition is particularly useful for network devices that operate with a fixed link configuration, where the link parameters do not change dynamically. By using `phylink_set_fixed_link()`, drivers can easily set up the fixed link state during initialization or configuration changes. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-11Merge tag 'riscv-soc-fixes-for-v6.11-final' of ↵Arnd Bergmann1-0/+6
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes RISC-V soc fixes for v6.11-final StarFive: A fix to return one of the clocks on the JH7110 from 1 GHz to 1.5 GHz Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-soc-fixes-for-v6.11-final' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: riscv: dts: starfive: jh7110-common: Fix lower rate of CPUfreq by setting PLL0 rate to 1.5GHz Link: https://lore.kernel.org/r/20240909-hybrid-groovy-601a33b5b309@spud Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-11platform: cznic: turris-omnia-mcu: fix HW_RANDOM dependencyArnd Bergmann1-1/+1
There is still a build failure when the rwrng support is in a loadable module but the mcu driver is built-in: arm-linux-gnueabi-ld: drivers/platform/cznic/turris-omnia-mcu-trng.o: in function `omnia_mcu_register_trng': turris-omnia-mcu-trng.c:(.text.omnia_mcu_register_trng+0x11c): undefined reference to `devm_hwrng_register' Change the dependency to explicitly disallow the broken configuration. Fixes: 41bb142a4028 ("platform: cznic: turris-omnia-mcu: Add support for MCU provided TRNG") Reviewed-by: Marek Behún <kabel@kernel.org> Link: https://lore.kernel.org/r/20240909110417.247453-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-09-11Merge branch 'for-6.11-fixup' into for-linusPetr Mladek1-0/+1
2024-09-11Merge branch '100GbE' of ↵Jakub Kicinski4-15/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-09-09 (ice, igb) This series contains updates to ice and igb drivers. Martyna moves LLDP rule removal to the proper uninitialization function for ice. Jake corrects accounting logic for FWD_TO_VSI_LIST switch filters on ice. Przemek removes incorrect, explicit calls to pci_disable_device() for ice. Michal Schmidt stops incorrect use of VSI list for VLAN use on ice. Sriram Yagnaraman adjusts igb_xdp_ring_update_tail() to be called under Tx lock on igb. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igb: Always call igb_xdp_ring_update_tail() under Tx lock ice: fix VSI lists confusion when adding VLANs ice: stop calling pci_disable_device() as we use pcim ice: fix accounting for filters shared by multiple VSIs ice: Fix lldp packets dropping after changing the number of channels ==================== Link: https://patch.msgid.link/20240909203842.3109822-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11Merge tag 'mlx5-fixes-2024-09-09' of ↵Jakub Kicinski6-23/+60
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2024-09-09 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2024-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Fix bridge mode operations when there are no VFs net/mlx5: Verify support for scheduling element and TSAR type net/mlx5: Add missing masks and QoS bit masks for scheduling elements net/mlx5: Explicitly set scheduling element and TSAR type net/mlx5e: Add missing link mode to ptys2ext_ethtool_map net/mlx5e: Add missing link modes to ptys2ethtool_map net/mlx5: Update the list of the PCI supported devices ==================== Link: https://patch.msgid.link/20240909194505.69715-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11Merge branch '100GbE' of ↵Jakub Kicinski26-135/+1394
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: support devlink subfunction Michal Swiatkowski says: Currently ice driver does not allow creating more than one networking device per physical function. The only way to have more hardware backed netdev is to use SR-IOV. Following patchset adds support for devlink port API. For each new pcisf type port, driver allocates new VSI, configures all resources needed, including dynamically MSIX vectors, program rules and registers new netdev. This series supports only one Tx/Rx queue pair per subfunction. Example commands: devlink port add pci/0000:31:00.1 flavour pcisf pfnum 1 sfnum 1000 devlink port function set pci/0000:31:00.1/1 hw_addr 00:00:00:00:03:14 devlink port function set pci/0000:31:00.1/1 state active devlink port function del pci/0000:31:00.1/1 Make the port representor and eswitch code generic to support subfunction representor type. VSI configuration is slightly different between VF and SF. It needs to be reflected in the code. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: subfunction activation and base devlink ops ice: basic support for VLAN in subfunctions ice: support subfunction devlink Tx topology ice: implement netdevice ops for SF representor ice: check if SF is ready in ethtool ops ice: don't set target VSI for subfunction ice: create port representor for SF ice: make representor code generic ice: implement netdev for subfunction ice: base subfunction aux driver ice: allocate devlink for subfunction ice: treat subfunction VSI the same as PF VSI ice: add basic devlink subfunctions support ice: export ice ndo_ops functions ice: add new VSI type for subfunctions ==================== Link: https://patch.msgid.link/20240906223010.2194591-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11Merge tag 'mlx5-updates-2024-09-02' of ↵Jakub Kicinski41-36/+17285
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2024-08-29 HW-Managed Flow Steering in mlx5 driver Yevgeny Kliteynik says: ======================= 1. Overview ----------- ConnectX devices support packet matching, modification, and redirection. This functionality is referred as Flow Steering. To configure a steering rule, the rule is written to the device-owned memory. This memory is accessed and cached by the device when processing a packet. The first implementation of Flow Steering was done in FW, and it is referred in the mlx5 driver as Device-Managed Flow Steering (DMFS). Later we introduced SW-managed Flow Steering (SWS or SMFS), where the driver is writing directly to the device's configuration memory (ICM) through RC QP using RDMA operations (RDMA-read and RDAM-write), thus achieving higher rates of rule insertion/deletion. Now we introduce a new flow steering implementation: HW-Managed Flow Steering (HWS or HMFS). In this new approach, the driver is configuring steering rules directly to the HW using the WQs with a special new type of WQE. This way we can reach higher rule insertion/deletion rate with much lower CPU utilization compared to SWS. The key benefits of HWS as opposed to SWS: + HW manages the steering decision tree - HW calculates CRC for each entry - HW handles tree hash collisions - HW & FW manage objects refcount + HW keeps cache coherency: - HW provides tree access locking and synchronization - HW provides notification on completion + Insertion rate isn’t affected by background traffic - Dedicated HW components that handle insertion 2. Performance -------------- Measuring Connection Tracking with simple IPv4 flows w/o NAT, we are able to get ~5 times more flows offloaded per second using HWS. 3. Configuration ---------------- The enablement of HWS mode in eswitch manager is done using the same devlink param that is already used for switching between FW-managed steering and SW-managed steering modes: # devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs 4. Upstream Submission ---------------------- HWS support consists of 3 main components: + Steering: - The lower layer that exposes HWS API to upper layers and implements all the management of flow steering building blocks + FS-Core - Implementation of fs_hws layer to enable fs_core to use HWS instead of FW or SW steering - Create HW steering action pools to utilize the ability of HWS to share steering actions among different rules - Add support for configuring HWS mode through devlink command, similar to configuring SWS mode + Connection Tracking - Implementation of CT support for HW steering - Hooks up the CT ops for the new steering mode and uses the HWS API to implement connection tracking. Because of the large number of patches, we need to perform the submission in several separate patch series. This series is the first submission that lays the ground work for the next submissions, where an actual user of HWS will be added. 5. Patches in this series ------------------------- This patch series contains implementation of the first bullet from above. ======================= * tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: HWS, added API and enabled HWS support net/mlx5: HWS, added send engine and context handling net/mlx5: HWS, added debug dump and internal headers net/mlx5: HWS, added backward-compatible API handling net/mlx5: HWS, added memory management handling net/mlx5: HWS, added vport handling net/mlx5: HWS, added modify header pattern and args handling net/mlx5: HWS, added FW commands handling net/mlx5: HWS, added matchers functionality net/mlx5: HWS, added definers handling net/mlx5: HWS, added rules handling net/mlx5: HWS, added tables handling net/mlx5: HWS, added actions handling net/mlx5: Added missing definitions in preparation for HW Steering net/mlx5: Added missing mlx5_ifc definition for HW Steering ==================== Link: https://patch.msgid.link/20240909181250.41596-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11Merge tag 'ipsec-next-2024-09-10' of ↵Jakub Kicinski7-136/+258
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-09-10 1) Remove an unneeded WARN_ON on packet offload. From Patrisious Haddad. 2) Add a copy from skb_seq_state to buffer function. This is needed for the upcomming IPTFS patchset. From Christian Hopps. 3) Spelling fix in xfrm.h. From Simon Horman. 4) Speed up xfrm policy insertions. From Florian Westphal. 5) Add and revert a patch to support xfrm interfaces for packet offload. This patch was just half cooked. 6) Extend usage of the new xfrm_policy_is_dead_or_sk helper. From Florian Westphal. 7) Update comments on sdb and xfrm_policy. From Florian Westphal. 8) Fix a null pointer dereference in the new policy insertion code From Florian Westphal. 9) Fix an uninitialized variable in the new policy insertion code. From Nathan Chancellor. * tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: policy: Restore dir assignments in xfrm_hash_rebuild() xfrm: policy: fix null dereference Revert "xfrm: add SA information to the offloaded packet" xfrm: minor update to sdb and xfrm_policy comments xfrm: policy: use recently added helper in more places xfrm: add SA information to the offloaded packet xfrm: policy: remove remaining use of inexact list xfrm: switch migrate to xfrm_policy_lookup_bytype xfrm: policy: don't iterate inexact policies twice at insert time selftests: add xfrm policy insertion speed test script xfrm: Correct spelling in xfrm.h net: add copy from skb_seq_state to buffer function xfrm: Remove documentation WARN_ON to limit return values for offloaded SA ==================== Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11Merge branch 'bnxt_en-msix-improvements'Jakub Kicinski5-13/+42
Michael Chan says: ==================== bnxt_en: MSIX improvements This patchset makes some improvements related to MSIX. The first patch adjusts the default MSIX vectors assigned for RoCE. On the PF, the number of MSIX is increased to 64 from the current 9. The second patch allocates additional MSIX vectors ahead of time when changing ethtool channels if dynamic MSIX is supported. The 3rd patch makes sure that the IRQ name is not truncated. ==================== Link: https://patch.msgid.link/20240909202737.93852-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11bnxt_en: resize bnxt_irq name field to fit format stringEdwin Peer1-1/+4
The name field of struct bnxt_irq is written using snprintf in bnxt_setup_msix(). Make the field large enough to fit the maximal formatted string to prevent truncation. Truncated IRQ names are less meaningful to the user. For example, "enp4s0f0np0-TxRx-0" gets truncated to "enp4s0f0np0-TxRx-" with the existing code. Make sure we have space for the extra characters added to the IRQ names: - the characters introduced by the static format string: hyphens - the maximal static substituted ring type string: "TxRx" - the maximum length of an integer formatted as a string, even though reasonable ring numbers would never be as long as this. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909202737.93852-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11bnxt_en: Add MSIX check in bnxt_check_rings()Michael Chan2-6/+24
bnxt_check_rings() is called to ensure that we have the hardware ring resources before committing to reinitialize with the new number of rings. MSIX vectors are never checked at this point, because up until recently we must first disable MSIX before we can allocate the new set of MSIX vectors. Now that we support dynamic MSIX allocation, check to make sure we can dynamically allocate the new MSIX vectors as the last step in bnxt_check_rings() if dynamic MSIX is supported. For example, the IOMMU group may limit the number of MSIX vectors for the device. With this patch, the ring change will fail more gracefully when there is not enough MSIX vectors. It is also better to move bnxt_check_rings() to be called as the last step when changing ethtool rings. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909202737.93852-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>