summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-02-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski395-1515/+3232
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/udp.c f796feabb9f5 ("udp: add local "peek offset enabled" flag") 56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)") Adjacent changes: net/unix/garbage.c aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.") 11498715f266 ("af_unix: Remove io_uring code for GC.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23Merge tag 'wireless-next-2024-02-22' of ↵Jakub Kicinski83-961/+2947
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.9 The third "new features" pull request for v6.9. This is a quick followup to send commit 04edb5dc68f4 ("wifi: ath12k: Fix uninitialized use of ret in ath12k_mac_allocate()") to fix the ath12k clang warning introduced in the previous pull request. We also have support for QCA2066 in ath11k, several new features in ath12k and few other changes in drivers. In stack it's mostly cleanup and refactoring. Major changes: ath12k * firmware-2.bin support * support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) * QCN9274: support split-PHY devices * WCN7850: enable Power Save Mode in station mode * WCN7850: P2P support ath11k: * QCA6390 & WCN6855: support 2 concurrent station interfaces * QCA2066 support iwlwifi * mvm: support wider-bandwidth OFDMA * bump firmware API to 90 for BZ/SC devices brcmfmac * DMI nvram filename quirk for ACEPC W5 Pro * tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (75 commits) wifi: wilc1000: revert reset line logic flip wifi: brcmfmac: Add DMI nvram filename quirk for ACEPC W5 Pro wifi: rtlwifi: set initial values for unexpected cases of USB endpoint priority wifi: rtl8xxxu: check vif before using in rtl8xxxu_tx() wifi: rtlwifi: rtl8192cu: Fix TX aggregation wifi: wilc1000: remove AKM suite be32 conversion for external auth request wifi: nl80211: refactor parsing CSA offsets wifi: nl80211: force WLAN_AKM_SUITE_SAE in big endian in NL80211_CMD_EXTERNAL_AUTH wifi: iwlwifi: load b0 version of ucode for HR1/HR2 wifi: iwlwifi: handle per-phy statistics from fw wifi: iwlwifi: iwl-fh.h: fix kernel-doc issues wifi: iwlwifi: api: fix kernel-doc reference wifi: iwlwifi: mvm: unlock mvm if there is no primary link wifi: iwlwifi: bump FW API to 90 for BZ/SC devices wifi: iwlwifi: mvm: support PHY context version 6 wifi: iwlwifi: mvm: partially support PHY context version 6 wifi: iwlwifi: mvm: support wider-bandwidth OFDMA wifi: cfg80211: use ML element parsing helpers wifi: mac80211: align ieee80211_mle_get_bss_param_ch_cnt() wifi: cfg80211: refactor RNR parsing ... ==================== Link: https://lore.kernel.org/r/20240222105205.CEC54C433F1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22Merge tag 'net-6.8.0-rc6' of ↵Linus Torvalds75-378/+870
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - af_unix: fix another unix GC hangup Previous releases - regressions: - core: fix a possible AF_UNIX deadlock - bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready() - netfilter: nft_flow_offload: release dst in case direct xmit path is used - bridge: switchdev: ensure MDB events are delivered exactly once - l2tp: pass correct message length to ip6_append_data - dccp/tcp: unhash sk from ehash for tb2 alloc failure after check_estalblished() - tls: fixes for record type handling with PEEK - devlink: fix possible use-after-free and memory leaks in devlink_init() Previous releases - always broken: - bpf: fix an oops when attempting to read the vsyscall page through bpf_probe_read_kernel - sched: act_mirred: use the backlog for mirred ingress - netfilter: nft_flow_offload: fix dst refcount underflow - ipv6: sr: fix possible use-after-free and null-ptr-deref - mptcp: fix several data races - phonet: take correct lock to peek at the RX queue Misc: - handful of fixes and reliability improvements for selftests" * tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) l2tp: pass correct message length to ip6_append_data net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY selftests: ioam: refactoring to align with the fix Fix write to cloned skb in ipv6_hop_ioam() phonet/pep: fix racy skb_queue_empty() use phonet: take correct lock to peek at the RX queue net: sparx5: Add spinlock for frame transmission from CPU net/sched: flower: Add lock protection when remove filter handle devlink: fix port dump cmd type net: stmmac: Fix EST offset for dwmac 5.10 tools: ynl: don't leak mcast_groups on init error tools: ynl: make sure we always pass yarg to mnl_cb_run net: mctp: put sock on tag allocation failure netfilter: nf_tables: use kzalloc for hook allocation netfilter: nf_tables: register hooks last when adding new chain/flowtable netfilter: nft_flow_offload: release dst in case direct xmit path is used netfilter: nft_flow_offload: reset dst in route object after setting up flow netfilter: nf_tables: set dormant flag on hook register failure selftests: tls: add test for peeking past a record of a different type selftests: tls: add test for merging of same-type control messages ...
2024-02-22Merge tag 'trace-v6.8-rc5' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: - While working on the ring buffer I noticed that the counter used for knowing where the end of the data is on a sub-buffer was not a full "int" but just 20 bits. It was masked out to 0xfffff. With the new code that allows the user to change the size of the sub-buffer, it is theoretically possible to ask for a size bigger than 2^20. If that happens, unexpected results may occur as there's no code checking if the counter overflowed the 20 bits of the write mask. There are other checks to make sure events fit in the sub-buffer, but if the sub-buffer itself is too big, that is not checked. Add a check in the resize of the sub-buffer to make sure that it never goes beyond the size of the counter that holds how much data is on it. * tag 'trace-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Do not let subbuf be bigger than write mask
2024-02-22Merge branch 'bnxt_en-ntuple-filter-improvements'Paolo Abeni3-200/+311
Michael Chan says: ==================== bnxt_en: Ntuple filter improvements The current Ntuple filter implementation has a limitation on 5750X (P5) and newer chips. The destination ring of the ntuple filter must be a valid ring in the RSS indirection table. Ntuple filters may not work if the RSS indirection table is modified by the user to only contain a subset of the rings. If an ntuple filter is set to a ring destination that is not in the RSS indirection table, the packet matching that filter will be placed in a random ring instead of the specified destination ring. This series of patches will fix the problem by using a separate VNIC for ntuple filters. The default VNIC will be dedicated for RSS and so the indirection table can be setup in any way and will not affect ntuple filters using the separate VNIC. Quite a bit of refactoring is needed to do the the VNIC and RSS context accounting in the first few patches. This is technically a bug fix, but I think the changes are too big for -net. ==================== Link: https://lore.kernel.org/r/20240220230317.96341-1-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Use the new VNIC to create ntuple filtersPavan Chebbi1-6/+27
The newly created vnic (BNXT_VNIC_NTUPLE) is ready to be used to create ntuple filters when supported by firmware. All RX rings can be used regardless of the RSS indirection setting on the default VNIC. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Create and setup the additional VNIC for adding ntuple filtersPavan Chebbi2-10/+36
Allocate and setup the additional VNIC for ntuple filters if this new method is supported by the firmware. Even though this VNIC is only used for ntuple filters with direct ring destinations, we still setup the RSS hash to be identical to the default VNIC so that each RX packet will have the correct hash in the RX completion. This VNIC is always at VNIC index BNXT_VNIC_NTUPLE. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Provision for an additional VNIC for ntuple filtersPavan Chebbi2-5/+32
On newer chips that support the ring table index method for ntuple filters, the current scheme of using the same VNIC for both RSS and ntuple filters will not work in all cases. An ntuple filter can only be directed to a destination ring if that destination ring is also in the RSS indirection table. To support ntuple filters with any arbitratry RSS indirection table that may only include a subset of the rings, we need to use a separate VNIC for ntuple filters. This patch provisions the additional VNIC. The next patch will allocate additional VNIC from firmware and set it up. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic indexPavan Chebbi3-21/+24
Replace hard coded 0 index with more meaningful BNXT_VNIC_DEFAULT. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Refactor bnxt_set_features()Pavan Chebbi1-7/+12
Refactor bnxt_set_features() function to have a common function to re-init. We'll need this to reinitialize when ntuple configuration changes. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Add bnxt_get_total_vnics() to calculate number of VNICsVenkat Duvvuru1-11/+17
Refactor the code by adding a new function to calculate the number of required VNICs. This is used in multiple places when reserving or checking resources. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Check additional resources in bnxt_check_rings()Michael Chan1-0/+2
bnxt_check_rings() is called to check if we have enough resource assets to satisfy the new number of ethtool channels. If the asset test fails, the ethtool operation will fail gracefully. Otherwise we will proceed and commit to use the new number of channels. If it fails to allocate any resources, the chip will fail to come up. For completeness, check all possible resources before committing to the new settings. Add the missing ring group and RSS context asset tests in bnxt_check_rings(). Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Improve RSS context reservation infrastructurePavan Chebbi2-23/+36
Add RSS context fields to struct bnxt_hw_rings and struct bnxt_hw_resc. With these, we can now specific the exact number of RSS contexts to reserve and store the reserved value. The original code relies on other resources to infer the number of RSS contexts to reserve and the reserved value is not stored. This improved infrastructure will make the RSS context accounting more complete and is needed by later patches. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Explicitly specify P5 completion rings to reserveMichael Chan2-7/+13
The current code assumes that every RX ring group and every TX ring requires a completion ring on P5_PLUS chips. Now that we have the bnxt_hw_rings structure, add the cp_p5 field so that it can be explicitly specified. This makes the logic more clear. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22bnxt_en: Refactor ring reservation functionsMichael Chan2-130/+132
The current functions to reserve hardware rings pass in 6 different ring or resource types as parameters. Add a structure bnxt_hw_rings to consolidate all these parameters and pass the structure pointer instead to these functions. Add 2 related helper functions also. This makes the code cleaner and makes it easier to add new resources to be reserved. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22Merge branch 'mctp-core-protocol-updates-minor-fixes-tests'Paolo Abeni9-55/+630
Jeremy Kerr says: ==================== MCTP core protocol updates, minor fixes & tests This series implements some procotol improvements for AF_MCTP, particularly for systems with multiple MCTP networks defined. For those, we need to add the network ID to the tag lookups, which then suggests an updated version of the tag allocate / drop ioctl to allow the net ID to be specified there too. The ioctl change affects uabi, so might warrant some extra attention. There are also a couple of new kunit tests for multiple-net configurations. We have a fix for populating the flow data when fragmenting, and a testcase for that too. Of course, any queries/comments/etc., please let me know! ==================== Link: https://lore.kernel.org/r/cover.1708335994.git.jk@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: tests: Add a test for proper tag creation on local outputJeremy Kerr1-0/+75
Ensure we have the correct key parameters on sending a message. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: tests: Test that outgoing skbs have flow data populatedJeremy Kerr3-0/+138
When CONFIG_MCTP_FLOWS is enabled, outgoing skbs should have their SKB_EXT_MCTP extension set for drivers to consume. Add two tests for local-to-output routing that check for the flow extensions: one for the simple single-packet case, and one for fragmentation. We now make MCTP_TEST select MCTP_FLOWS, so we always get coverage of these flow tests. The tests are skippable if MCTP_FLOWS is (otherwise) disabled, but that would need manual config tweaking. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: copy skb ext data when fragmentingJeremy Kerr2-0/+11
If we're fragmenting on local output, the original packet may contain ext data for the MCTP flows. We'll want this in the resulting fragment skbs too. So, do a skb_ext_copy() in the fragmentation path, and implement the MCTP-specific parts of an ext copy operation. Fixes: 67737c457281 ("mctp: Pass flow data & flow release events to drivers") Reported-by: Jian Zhang <zhangjian.3032@bytedance.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: tests: Add MCTP net isolation testsJeremy Kerr1-0/+161
Add a couple of tests that excersise the new net-specific sk_key and bind lookups Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: tests: Add netid argument to __mctp_route_test_initJeremy Kerr1-4/+7
We'll want to create net-specific test setups in an upcoming change, so allow the caller to provide a non-default netid. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: provide a more specific tag allocation ioctlJeremy Kerr2-20/+129
Now that we have net-specific tags, extend the tag allocation ioctls (SIOCMCTPALLOCTAG / SIOCMCTPDROPTAG) to allow a network parameter to be passed to the tag allocation. We also add a local_addr member to the ioc struct, to allow for a future finer-grained tag allocation using local EIDs too. We don't add any specific support for that now though, so require MCTP_ADDR_ANY or MCTP_ADDR_NULL for those at present. The old ioctls will still work, but allocate for the default MCTP net. These are now marked as deprecated in the header. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: separate key correlation across netsJeremy Kerr4-16/+40
Currently, we lookup sk_keys from the entire struct net_namespace, which may contain multiple MCTP net IDs. In those cases we want to distinguish between endpoints with the same EID but different net ID. Add the net ID data to the struct mctp_sk_key, populate on add and filter on this during route lookup. For the ioctl interface, we use a default net of MCTP_INITIAL_DEFAULT_NET (ie., what will be in use for single-net configurations), but we'll extend the ioctl interface to provide net-specific tag allocation in an upcoming change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: tests: create test skbs with the correct net and deviceJeremy Kerr2-8/+17
In our test skb creation functions, we're not setting up the net and device data. This doesn't matter at the moment, but we will want to add support for distinct net IDs in future. Set the ->net identifier on the test MCTP device, and ensure that test skbs are set up with the correct device-related data on creation. Create a helper for setting skb->dev and mctp_skb_cb->net. We have a few cases where we're calling __mctp_cb() to initialise the cb (which we need for the above) separately, so integrate this into the skb creation helpers. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: make key lookups match the ANY address on either local or peerJeremy Kerr1-3/+11
We may have an ANY address in either the local or peer address of a sk_key, and may want to match on an incoming daddr or saddr being ANY. Do this by altering the conflicting-tag lookup to also accept ANY as the local/peer address. We don't want mctp_address_matches to match on the requested EID being ANY, as that is a specific lookup case on packet input. Reported-by: Eric Chuang <echuang@google.com> Reported-by: Anthony <anthonyhkf@google.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: Add some detail on the key allocation implementationJeremy Kerr1-0/+37
We could do with a little more comment on where MCTP_ADDR_ANY will match in the key allocations. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: mctp: avoid confusion over local/peer dest/source addressesJeremy Kerr3-11/+11
We have a double-swap of local and peer addresses in mctp_alloc_local_tag; the arguments in both call sites are swapped, but there is also a swap in the implementation of alloc_local_tag. This is opaque because we're using source/dest address references, which don't match the local/peer semantics. Avoid this confusion by naming the arguments as 'local' and 'peer', and remove the double swap. The calling order now matches mctp_key_alloc. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22Merge tag 'ath-next-20240222' of ↵Kalle Valo41-412/+2335
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath ath.git patches for v6.9 We have support for QCA2066 now and also several new features in ath12k. Major changes: ath12k * firmware-2.bin support * support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) * QCN9274: support split-PHY devices * WCN7850: enable Power Save Mode in station mode * WCN7850: P2P support ath11k: * QCA6390 & WCN6855: support 2 concurrent station interfaces * QCA2066 support
2024-02-22l2tp: pass correct message length to ip6_append_dataTom Parkin1-1/+1
l2tp_ip6_sendmsg needs to avoid accounting for the transport header twice when splicing more data into an already partially-occupied skbuff. To manage this, we check whether the skbuff contains data using skb_queue_empty when deciding how much data to append using ip6_append_data. However, the code which performed the calculation was incorrect: ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0; ...due to C operator precedence, this ends up setting ulen to transhdrlen for messages with a non-zero length, which results in corrupted packets on the wire. Add parentheses to correct the calculation in line with the original intent. Fixes: 9d4c75800f61 ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()") Cc: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Tom Parkin <tparkin@katalix.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240220122156.43131-1-tparkin@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22Merge tag 'nf-24-02-22' of ↵Paolo Abeni3-43/+57
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) If user requests to wake up a table and hook fails, restore the dormant flag from the error path, from Florian Westphal. 2) Reset dst after transferring it to the flow object, otherwise dst gets released twice from the error path. 3) Release dst in case the flowtable selects a direct xmit path, eg. transmission to bridge port. Otherwise, dst is memleaked. 4) Register basechain and flowtable hooks at the end of the command. Error path releases these datastructure without waiting for the rcu grace period. 5) Use kzalloc() to initialize struct nft_hook to fix a KMSAN report on access to hook type, also from Florian Westphal. netfilter pull request 24-02-22 * tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: use kzalloc for hook allocation netfilter: nf_tables: register hooks last when adding new chain/flowtable netfilter: nft_flow_offload: release dst in case direct xmit path is used netfilter: nft_flow_offload: reset dst in route object after setting up flow netfilter: nf_tables: set dormant flag on hook register failure ==================== Link: https://lore.kernel.org/r/20240222000843.146665-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22Merge tag 'for-netdev' of ↵Paolo Abeni15-17/+217
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-02-22 The following pull-request contains BPF updates for your *net* tree. We've added 11 non-merge commits during the last 24 day(s) which contain a total of 15 files changed, 217 insertions(+), 17 deletions(-). The main changes are: 1) Fix a syzkaller-triggered oops when attempting to read the vsyscall page through bpf_probe_read_kernel and friends, from Hou Tao. 2) Fix a kernel panic due to uninitialized iter position pointer in bpf_iter_task, from Yafang Shao. 3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel, from Martin KaFai Lau. 4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET) due to incorrect truesize accounting, from Sebastian Andrzej Siewior. 5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready, from Shigeru Yoshida. 6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be resolved, from Hari Bathini. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready() selftests/bpf: Add negtive test cases for task iter bpf: Fix an issue due to uninitialized bpf_iter_task selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel selftest/bpf: Test the read of vsyscall page under x86-64 x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault() x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h bpf, scripts: Correct GPL license name xsk: Add truesize to skb_add_rx_frag(). bpf: Fix warning for bpf_cpumask in verifier ==================== Link: https://lore.kernel.org/r/20240221231826.1404-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHYSiddharth Vadapalli1-1/+3
Commit bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG") extended support of the driver from the existing support for RTL8211F(D)(I)-CG PHY to the newer RTL8211F(D)(I)-VD-CG PHY. While that commit indicated that the RTL8211F_PHYCR2 register is not supported by the "VD-CG" PHY model and therefore updated the corresponding section in rtl8211f_config_init() to be invoked conditionally, the call to "genphy_soft_reset()" was left as-is, when it should have also been invoked conditionally. This is because the call to "genphy_soft_reset()" was first introduced by the commit 0a4355c2b7f8 ("net: phy: realtek: add dt property to disable CLKOUT clock") since the RTL8211F guide indicates that a PHY reset should be issued after setting bits in the PHYCR2 register. As the PHYCR2 register is not applicable to the "VD-CG" PHY model, fix the rtl8211f_config_init() function by invoking "genphy_soft_reset()" conditionally based on the presence of the "PHYCR2" register. Fixes: bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG") Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240220070007.968762-1-s-vadapalli@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22Merge branch 'ioam6-fix-write-to-cloned-skb-s'Paolo Abeni3-67/+76
Justin Iurman says: ==================== ioam6: fix write to cloned skb's Make sure the IOAM data insertion is not applied on cloned skb's. As a consequence, ioam selftests needed a refactoring. ==================== Link: https://lore.kernel.org/r/20240219135255.15429-1-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22selftests: ioam: refactoring to align with the fixJustin Iurman2-67/+66
ioam6_parser uses a packet socket. After the fix to prevent writing to cloned skb's, the receiver does not see its IOAM data anymore, which makes input/forward ioam-selftests to fail. As a workaround, ioam6_parser now uses an IPv6 raw socket and leverages ancillary data to get hop-by-hop options. As a consequence, the hook is "after" the IOAM data insertion by the receiver and all tests are working again. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22Fix write to cloned skb in ipv6_hop_ioam()Justin Iurman1-0/+10
ioam6_fill_trace_data() writes inside the skb payload without ensuring it's writeable (e.g., not cloned). This function is called both from the input and output path. The output path (ioam6_iptunnel) already does the check. This commit provides a fix for the input path, inside ipv6_hop_ioam(). It also updates ip6_parse_tlv() to refresh the network header pointer ("nh") when returning from ipv6_hop_ioam(). Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22phonet/pep: fix racy skb_queue_empty() useRémi Denis-Courmont1-9/+32
The receive queues are protected by their respective spin-lock, not the socket lock. This could lead to skb_peek() unexpectedly returning NULL or a pointer to an already dequeued socket buffer. Fixes: 9641458d3ec4 ("Phonet: Pipe End Point for Phonet Pipes protocol") Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com> Link: https://lore.kernel.org/r/20240218081214.4806-2-remi@remlab.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22phonet: take correct lock to peek at the RX queueRémi Denis-Courmont1-2/+2
The receive queue is protected by its embedded spin-lock, not the socket lock, so we need the former lock here (and only that one). Fixes: 107d0d9b8d9a ("Phonet: Phonet datagram transport protocol") Reported-by: Luosili <rootlab@huawei.com> Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240218081214.4806-1-remi@remlab.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22PPPoL2TP: Add more code snippetsSamuel Thibault1-6/+129
The existing documentation was not telling that one has to create a PPP channel and a PPP interface to get PPPoL2TP data offloading working. Also, tunnel switching was not mentioned, so that people were thinking it was not supported, while it actually is. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Acked-by: Tom Parkin <tparkin@katalix.com> Link: https://lore.kernel.org/r/20240217211425.qj576u3jmaa6yidf@begin Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22net: sparx5: Add spinlock for frame transmission from CPUHoratiu Vultur3-0/+4
Both registers used when doing manual injection or fdma injection are shared between all the net devices of the switch. It was noticed that when having two process which each of them trying to inject frames on different ethernet ports, that the HW started to behave strange, by sending out more frames then expected. When doing fdma injection it is required to set the frame in the DCB and then make sure that the next pointer of the last DCB is invalid. But because there is no locks for this, then easily this pointer between the DCB can be broken and then it would create a loop of DCBs. And that means that the HW will continuously transmit these frames in a loop. Until the SW will break this loop. Therefore to fix this issue, add a spin lock for when accessing the registers for manual or fdma injection. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Fixes: f3cad2611a77 ("net: sparx5: add hostmode with phylink support") Link: https://lore.kernel.org/r/20240219080043.1561014-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22net/sched: flower: Add lock protection when remove filter handleJianbo Liu1-1/+4
As IDR can't protect itself from the concurrent modification, place idr_remove() under the protection of tp->lock. Fixes: 08a0063df3ae ("net/sched: flower: Move filter handle initialization earlier") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20240220085928.9161-1-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22devlink: fix port dump cmd typeJiri Pirko1-1/+1
Unlike other commands, due to a c&p error, port dump fills-up cmd with wrong value, different from port-get request cmd, port-get doit reply and port notification. Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW. Skimmed through devlink userspace implementations, none of them cares about this cmd value. Only ynl, for which, this is actually a fix, as it expects doit and dumpit ops rsp_value to be the same. Omit the fixes tag, even thought this is fix, better to target this for next release. Fixes: bfcd3a466172 ("Introduce devlink infrastructure") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240220075245.75416-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22net: stmmac: Fix EST offset for dwmac 5.10Kurt Kanzenbach1-1/+1
Fix EST offset for dwmac 5.10. Currently configuring Qbv doesn't work as expected. The schedule is configured, but never confirmed: |[ 128.250219] imx-dwmac 428a0000.ethernet eth1: configured EST The reason seems to be the refactoring of the EST code which set the wrong EST offset for the dwmac 5.10. After fixing this it works as before: |[ 106.359577] imx-dwmac 428a0000.ethernet eth1: configured EST |[ 128.430715] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched Tested on imx93. Fixes: c3f3b97238f6 ("net: stmmac: Refactor EST implementation") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20240220-stmmac_est-v1-1-c41f9ae2e7b7@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22udp: add local "peek offset enabled" flagPaolo Abeni4-3/+13
We want to re-organize the struct sock layout. The sk_peek_off field location is problematic, as most protocols want it in the RX read area, while UDP wants it on a cacheline different from sk_receive_queue. Create a local (inside udp_sock) copy of the 'peek offset is enabled' flag and place it inside the same cacheline of reader_queue. Check such flag before reading sk_peek_off. This will save potential false sharing and cache misses in the fast-path. Tested under UDP flood with small packets. The struct sock layout update causes a 4% performance drop, and this patch restores completely the original tput. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/67ab679c15fbf49fa05b3ffe05d91c47ab84f147.1708426665.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22Merge branch 'tools-ynl-fix-impossible-errors'Jakub Kicinski1-4/+15
Jakub Kicinski says: ==================== tools: ynl: fix impossible errors Fix bugs discovered while I was hacking in low level stuff in YNL and kept breaking the socket, exercising the "impossible" error paths. v1: https://lore.kernel.org/all/20240217001742.2466993-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20240220161112.2735195-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22tools: ynl: don't leak mcast_groups on init errorJakub Kicinski1-1/+7
Make sure to free the already-parsed mcast_groups if we don't get an ack from the kernel when reading family info. This is part of the ynl_sock_create() error path, so we won't get a call to ynl_sock_destroy() to free them later. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240220161112.2735195-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22tools: ynl: make sure we always pass yarg to mnl_cb_runJakub Kicinski1-3/+8
There is one common error handler in ynl - ynl_cb_error(). It expects priv to be a pointer to struct ynl_parse_arg AKA yarg. To avoid potential crashes if we encounter a stray NLMSG_ERROR always pass yarg as priv (or a struct which has it as the first member). ynl_cb_null() has a similar problem directly - it expects yarg but priv passed by the caller is ys. Found by code inspection. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240220161112.2735195-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22net: mctp: put sock on tag allocation failureJeremy Kerr1-1/+1
We may hold an extra reference on a socket if a tag allocation fails: we optimistically allocate the sk_key, and take a ref there, but do not drop if we end up not using the allocated key. Ensure we're dropping the sock on this failure by doing a proper unref rather than directly kfree()ing. Fixes: de8a6b15d965 ("net: mctp: add an explicit reference from a mctp_sk_key to sock") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/ce9b61e44d1cdae7797be0c5e3141baf582d23a0.1707983487.git.jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22netfilter: nf_tables: use kzalloc for hook allocationFlorian Westphal1-1/+1
KMSAN reports unitialized variable when registering the hook, reg->hook_ops_type == NF_HOOK_OP_BPF) ~~~~~~~~~~~ undefined This is a small structure, just use kzalloc to make sure this won't happen again when new fields get added to nf_hook_ops. Fixes: 7b4b2fa37587 ("netfilter: annotate nf_tables base hook ops") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22netfilter: nf_tables: register hooks last when adding new chain/flowtablePablo Neira Ayuso1-38/+40
Register hooks last when adding chain/flowtable to ensure that packets do not walk over datastructure that is being released in the error path without waiting for the rcu grace period. Fixes: 91c7b38dc9f0 ("netfilter: nf_tables: use new transaction infrastructure to handle chain") Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22netfilter: nft_flow_offload: release dst in case direct xmit path is usedPablo Neira Ayuso1-0/+1
Direct xmit does not use it since it calls dev_queue_xmit() to send packets, hence it calls dst_release(). kmemleak reports: unreferenced object 0xffff88814f440900 (size 184): comm "softirq", pid 0, jiffies 4294951896 hex dump (first 32 bytes): 00 60 5b 04 81 88 ff ff 00 e6 e8 82 ff ff ff ff .`[............. 21 0b 50 82 ff ff ff ff 00 00 00 00 00 00 00 00 !.P............. backtrace (crc cb2bf5d6): [<000000003ee17107>] kmem_cache_alloc+0x286/0x340 [<0000000021a5de2c>] dst_alloc+0x43/0xb0 [<00000000f0671159>] rt_dst_alloc+0x2e/0x190 [<00000000fe5092c9>] __mkroute_output+0x244/0x980 [<000000005fb96fb0>] ip_route_output_flow+0xc0/0x160 [<0000000045367433>] nf_ip_route+0xf/0x30 [<0000000085da1d8e>] nf_route+0x2d/0x60 [<00000000d1ecd1cb>] nft_flow_route+0x171/0x6a0 [nft_flow_offload] [<00000000d9b2fb60>] nft_flow_offload_eval+0x4e8/0x700 [nft_flow_offload] [<000000009f447dbb>] expr_call_ops_eval+0x53/0x330 [nf_tables] [<00000000072e1be6>] nft_do_chain+0x17c/0x840 [nf_tables] [<00000000d0551029>] nft_do_chain_inet+0xa1/0x210 [nf_tables] [<0000000097c9d5c6>] nf_hook_slow+0x5b/0x160 [<0000000005eccab1>] ip_forward+0x8b6/0x9b0 [<00000000553a269b>] ip_rcv+0x221/0x230 [<00000000412872e5>] __netif_receive_skb_one_core+0xfe/0x110 Fixes: fa502c865666 ("netfilter: flowtable: simplify route logic") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>