Age | Commit message (Collapse) | Author | Files | Lines |
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/ipv4/udp.c
f796feabb9f5 ("udp: add local "peek offset enabled" flag")
56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)")
Adjacent changes:
net/unix/garbage.c
aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.")
11498715f266 ("af_unix: Remove io_uring code for GC.")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.9
The third "new features" pull request for v6.9. This is a quick
followup to send commit 04edb5dc68f4 ("wifi: ath12k: Fix uninitialized
use of ret in ath12k_mac_allocate()") to fix the ath12k clang warning
introduced in the previous pull request.
We also have support for QCA2066 in ath11k, several new features in
ath12k and few other changes in drivers. In stack it's mostly cleanup
and refactoring.
Major changes:
ath12k
* firmware-2.bin support
* support having multiple identical PCI devices (firmware needs to
have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
* QCN9274: support split-PHY devices
* WCN7850: enable Power Save Mode in station mode
* WCN7850: P2P support
ath11k:
* QCA6390 & WCN6855: support 2 concurrent station interfaces
* QCA2066 support
iwlwifi
* mvm: support wider-bandwidth OFDMA
* bump firmware API to 90 for BZ/SC devices
brcmfmac
* DMI nvram filename quirk for ACEPC W5 Pro
* tag 'wireless-next-2024-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (75 commits)
wifi: wilc1000: revert reset line logic flip
wifi: brcmfmac: Add DMI nvram filename quirk for ACEPC W5 Pro
wifi: rtlwifi: set initial values for unexpected cases of USB endpoint priority
wifi: rtl8xxxu: check vif before using in rtl8xxxu_tx()
wifi: rtlwifi: rtl8192cu: Fix TX aggregation
wifi: wilc1000: remove AKM suite be32 conversion for external auth request
wifi: nl80211: refactor parsing CSA offsets
wifi: nl80211: force WLAN_AKM_SUITE_SAE in big endian in NL80211_CMD_EXTERNAL_AUTH
wifi: iwlwifi: load b0 version of ucode for HR1/HR2
wifi: iwlwifi: handle per-phy statistics from fw
wifi: iwlwifi: iwl-fh.h: fix kernel-doc issues
wifi: iwlwifi: api: fix kernel-doc reference
wifi: iwlwifi: mvm: unlock mvm if there is no primary link
wifi: iwlwifi: bump FW API to 90 for BZ/SC devices
wifi: iwlwifi: mvm: support PHY context version 6
wifi: iwlwifi: mvm: partially support PHY context version 6
wifi: iwlwifi: mvm: support wider-bandwidth OFDMA
wifi: cfg80211: use ML element parsing helpers
wifi: mac80211: align ieee80211_mle_get_bss_param_ch_cnt()
wifi: cfg80211: refactor RNR parsing
...
====================
Link: https://lore.kernel.org/r/20240222105205.CEC54C433F1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- af_unix: fix another unix GC hangup
Previous releases - regressions:
- core: fix a possible AF_UNIX deadlock
- bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready()
- netfilter: nft_flow_offload: release dst in case direct xmit path
is used
- bridge: switchdev: ensure MDB events are delivered exactly once
- l2tp: pass correct message length to ip6_append_data
- dccp/tcp: unhash sk from ehash for tb2 alloc failure after
check_estalblished()
- tls: fixes for record type handling with PEEK
- devlink: fix possible use-after-free and memory leaks in
devlink_init()
Previous releases - always broken:
- bpf: fix an oops when attempting to read the vsyscall page through
bpf_probe_read_kernel
- sched: act_mirred: use the backlog for mirred ingress
- netfilter: nft_flow_offload: fix dst refcount underflow
- ipv6: sr: fix possible use-after-free and null-ptr-deref
- mptcp: fix several data races
- phonet: take correct lock to peek at the RX queue
Misc:
- handful of fixes and reliability improvements for selftests"
* tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
l2tp: pass correct message length to ip6_append_data
net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
selftests: ioam: refactoring to align with the fix
Fix write to cloned skb in ipv6_hop_ioam()
phonet/pep: fix racy skb_queue_empty() use
phonet: take correct lock to peek at the RX queue
net: sparx5: Add spinlock for frame transmission from CPU
net/sched: flower: Add lock protection when remove filter handle
devlink: fix port dump cmd type
net: stmmac: Fix EST offset for dwmac 5.10
tools: ynl: don't leak mcast_groups on init error
tools: ynl: make sure we always pass yarg to mnl_cb_run
net: mctp: put sock on tag allocation failure
netfilter: nf_tables: use kzalloc for hook allocation
netfilter: nf_tables: register hooks last when adding new chain/flowtable
netfilter: nft_flow_offload: release dst in case direct xmit path is used
netfilter: nft_flow_offload: reset dst in route object after setting up flow
netfilter: nf_tables: set dormant flag on hook register failure
selftests: tls: add test for peeking past a record of a different type
selftests: tls: add test for merging of same-type control messages
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
- While working on the ring buffer I noticed that the counter used for
knowing where the end of the data is on a sub-buffer was not a full
"int" but just 20 bits. It was masked out to 0xfffff.
With the new code that allows the user to change the size of the
sub-buffer, it is theoretically possible to ask for a size bigger
than 2^20. If that happens, unexpected results may occur as there's
no code checking if the counter overflowed the 20 bits of the write
mask. There are other checks to make sure events fit in the
sub-buffer, but if the sub-buffer itself is too big, that is not
checked.
Add a check in the resize of the sub-buffer to make sure that it
never goes beyond the size of the counter that holds how much data is
on it.
* tag 'trace-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Do not let subbuf be bigger than write mask
|
|
Michael Chan says:
====================
bnxt_en: Ntuple filter improvements
The current Ntuple filter implementation has a limitation on 5750X (P5)
and newer chips. The destination ring of the ntuple filter must be
a valid ring in the RSS indirection table. Ntuple filters may not work
if the RSS indirection table is modified by the user to only contain a
subset of the rings. If an ntuple filter is set to a ring destination
that is not in the RSS indirection table, the packet matching that
filter will be placed in a random ring instead of the specified
destination ring.
This series of patches will fix the problem by using a separate VNIC
for ntuple filters. The default VNIC will be dedicated for RSS and
so the indirection table can be setup in any way and will not affect
ntuple filters using the separate VNIC.
Quite a bit of refactoring is needed to do the the VNIC and RSS
context accounting in the first few patches. This is technically a
bug fix, but I think the changes are too big for -net.
====================
Link: https://lore.kernel.org/r/20240220230317.96341-1-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The newly created vnic (BNXT_VNIC_NTUPLE) is ready to be used to create
ntuple filters when supported by firmware. All RX rings can be used
regardless of the RSS indirection setting on the default VNIC.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Allocate and setup the additional VNIC for ntuple filters if this
new method is supported by the firmware. Even though this VNIC is
only used for ntuple filters with direct ring destinations, we still
setup the RSS hash to be identical to the default VNIC so that each
RX packet will have the correct hash in the RX completion. This
VNIC is always at VNIC index BNXT_VNIC_NTUPLE.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
On newer chips that support the ring table index method for
ntuple filters, the current scheme of using the same VNIC for
both RSS and ntuple filters will not work in all cases. An
ntuple filter can only be directed to a destination ring if
that destination ring is also in the RSS indirection table.
To support ntuple filters with any arbitratry RSS indirection
table that may only include a subset of the rings, we need to
use a separate VNIC for ntuple filters.
This patch provisions the additional VNIC. The next patch will
allocate additional VNIC from firmware and set it up.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Replace hard coded 0 index with more meaningful BNXT_VNIC_DEFAULT.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Refactor bnxt_set_features() function to have a common
function to re-init. We'll need this to reinitialize when
ntuple configuration changes.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Refactor the code by adding a new function to calculate the number of
required VNICs. This is used in multiple places when reserving or
checking resources.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
bnxt_check_rings() is called to check if we have enough resource
assets to satisfy the new number of ethtool channels. If the asset
test fails, the ethtool operation will fail gracefully. Otherwise
we will proceed and commit to use the new number of channels. If it
fails to allocate any resources, the chip will fail to come up.
For completeness, check all possible resources before committing to
the new settings. Add the missing ring group and RSS context asset
tests in bnxt_check_rings().
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add RSS context fields to struct bnxt_hw_rings and struct bnxt_hw_resc.
With these, we can now specific the exact number of RSS contexts to
reserve and store the reserved value. The original code relies on
other resources to infer the number of RSS contexts to reserve and the
reserved value is not stored. This improved infrastructure will make
the RSS context accounting more complete and is needed by later
patches.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The current code assumes that every RX ring group and every TX ring
requires a completion ring on P5_PLUS chips. Now that we have the
bnxt_hw_rings structure, add the cp_p5 field so that it can
be explicitly specified. This makes the logic more clear.
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The current functions to reserve hardware rings pass in 6 different ring
or resource types as parameters. Add a structure bnxt_hw_rings to
consolidate all these parameters and pass the structure pointer instead
to these functions. Add 2 related helper functions also. This makes
the code cleaner and makes it easier to add new resources to be
reserved.
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Jeremy Kerr says:
====================
MCTP core protocol updates, minor fixes & tests
This series implements some procotol improvements for AF_MCTP,
particularly for systems with multiple MCTP networks defined. For those,
we need to add the network ID to the tag lookups, which then suggests an
updated version of the tag allocate / drop ioctl to allow the net ID to
be specified there too.
The ioctl change affects uabi, so might warrant some extra attention.
There are also a couple of new kunit tests for multiple-net
configurations.
We have a fix for populating the flow data when fragmenting, and a
testcase for that too.
Of course, any queries/comments/etc., please let me know!
====================
Link: https://lore.kernel.org/r/cover.1708335994.git.jk@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Ensure we have the correct key parameters on sending a message.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When CONFIG_MCTP_FLOWS is enabled, outgoing skbs should have their
SKB_EXT_MCTP extension set for drivers to consume.
Add two tests for local-to-output routing that check for the flow
extensions: one for the simple single-packet case, and one for
fragmentation.
We now make MCTP_TEST select MCTP_FLOWS, so we always get coverage of
these flow tests. The tests are skippable if MCTP_FLOWS is (otherwise)
disabled, but that would need manual config tweaking.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If we're fragmenting on local output, the original packet may contain
ext data for the MCTP flows. We'll want this in the resulting fragment
skbs too.
So, do a skb_ext_copy() in the fragmentation path, and implement the
MCTP-specific parts of an ext copy operation.
Fixes: 67737c457281 ("mctp: Pass flow data & flow release events to drivers")
Reported-by: Jian Zhang <zhangjian.3032@bytedance.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add a couple of tests that excersise the new net-specific sk_key and
bind lookups
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We'll want to create net-specific test setups in an upcoming change, so
allow the caller to provide a non-default netid.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Now that we have net-specific tags, extend the tag allocation ioctls
(SIOCMCTPALLOCTAG / SIOCMCTPDROPTAG) to allow a network parameter to be
passed to the tag allocation.
We also add a local_addr member to the ioc struct, to allow for a future
finer-grained tag allocation using local EIDs too. We don't add any
specific support for that now though, so require MCTP_ADDR_ANY or
MCTP_ADDR_NULL for those at present.
The old ioctls will still work, but allocate for the default MCTP net.
These are now marked as deprecated in the header.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, we lookup sk_keys from the entire struct net_namespace, which
may contain multiple MCTP net IDs. In those cases we want to distinguish
between endpoints with the same EID but different net ID.
Add the net ID data to the struct mctp_sk_key, populate on add and
filter on this during route lookup.
For the ioctl interface, we use a default net of
MCTP_INITIAL_DEFAULT_NET (ie., what will be in use for single-net
configurations), but we'll extend the ioctl interface to provide
net-specific tag allocation in an upcoming change.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In our test skb creation functions, we're not setting up the net and
device data. This doesn't matter at the moment, but we will want to add
support for distinct net IDs in future.
Set the ->net identifier on the test MCTP device, and ensure that test
skbs are set up with the correct device-related data on creation. Create
a helper for setting skb->dev and mctp_skb_cb->net.
We have a few cases where we're calling __mctp_cb() to initialise the cb
(which we need for the above) separately, so integrate this into the skb
creation helpers.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We may have an ANY address in either the local or peer address of a
sk_key, and may want to match on an incoming daddr or saddr being ANY.
Do this by altering the conflicting-tag lookup to also accept ANY as
the local/peer address.
We don't want mctp_address_matches to match on the requested EID being
ANY, as that is a specific lookup case on packet input.
Reported-by: Eric Chuang <echuang@google.com>
Reported-by: Anthony <anthonyhkf@google.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We could do with a little more comment on where MCTP_ADDR_ANY will match
in the key allocations.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We have a double-swap of local and peer addresses in
mctp_alloc_local_tag; the arguments in both call sites are swapped, but
there is also a swap in the implementation of alloc_local_tag. This is
opaque because we're using source/dest address references, which don't
match the local/peer semantics.
Avoid this confusion by naming the arguments as 'local' and 'peer', and
remove the double swap. The calling order now matches mctp_key_alloc.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath
ath.git patches for v6.9
We have support for QCA2066 now and also several new features in ath12k.
Major changes:
ath12k
* firmware-2.bin support
* support having multiple identical PCI devices (firmware needs to
have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
* QCN9274: support split-PHY devices
* WCN7850: enable Power Save Mode in station mode
* WCN7850: P2P support
ath11k:
* QCA6390 & WCN6855: support 2 concurrent station interfaces
* QCA2066 support
|
|
l2tp_ip6_sendmsg needs to avoid accounting for the transport header
twice when splicing more data into an already partially-occupied skbuff.
To manage this, we check whether the skbuff contains data using
skb_queue_empty when deciding how much data to append using
ip6_append_data.
However, the code which performed the calculation was incorrect:
ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;
...due to C operator precedence, this ends up setting ulen to
transhdrlen for messages with a non-zero length, which results in
corrupted packets on the wire.
Add parentheses to correct the calculation in line with the original
intent.
Fixes: 9d4c75800f61 ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()")
Cc: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220122156.43131-1-tparkin@katalix.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) If user requests to wake up a table and hook fails, restore the
dormant flag from the error path, from Florian Westphal.
2) Reset dst after transferring it to the flow object, otherwise dst
gets released twice from the error path.
3) Release dst in case the flowtable selects a direct xmit path, eg.
transmission to bridge port. Otherwise, dst is memleaked.
4) Register basechain and flowtable hooks at the end of the command.
Error path releases these datastructure without waiting for the
rcu grace period.
5) Use kzalloc() to initialize struct nft_hook to fix a KMSAN report
on access to hook type, also from Florian Westphal.
netfilter pull request 24-02-22
* tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: use kzalloc for hook allocation
netfilter: nf_tables: register hooks last when adding new chain/flowtable
netfilter: nft_flow_offload: release dst in case direct xmit path is used
netfilter: nft_flow_offload: reset dst in route object after setting up flow
netfilter: nf_tables: set dormant flag on hook register failure
====================
Link: https://lore.kernel.org/r/20240222000843.146665-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-02-22
The following pull-request contains BPF updates for your *net* tree.
We've added 11 non-merge commits during the last 24 day(s) which contain
a total of 15 files changed, 217 insertions(+), 17 deletions(-).
The main changes are:
1) Fix a syzkaller-triggered oops when attempting to read the vsyscall
page through bpf_probe_read_kernel and friends, from Hou Tao.
2) Fix a kernel panic due to uninitialized iter position pointer in
bpf_iter_task, from Yafang Shao.
3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel,
from Martin KaFai Lau.
4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET)
due to incorrect truesize accounting, from Sebastian Andrzej Siewior.
5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready,
from Shigeru Yoshida.
6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be
resolved, from Hari Bathini.
bpf-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()
selftests/bpf: Add negtive test cases for task iter
bpf: Fix an issue due to uninitialized bpf_iter_task
selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel
bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel
selftest/bpf: Test the read of vsyscall page under x86-64
x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
bpf, scripts: Correct GPL license name
xsk: Add truesize to skb_add_rx_frag().
bpf: Fix warning for bpf_cpumask in verifier
====================
Link: https://lore.kernel.org/r/20240221231826.1404-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Commit bb726b753f75 ("net: phy: realtek: add support for
RTL8211F(D)(I)-VD-CG") extended support of the driver from the existing
support for RTL8211F(D)(I)-CG PHY to the newer RTL8211F(D)(I)-VD-CG PHY.
While that commit indicated that the RTL8211F_PHYCR2 register is not
supported by the "VD-CG" PHY model and therefore updated the corresponding
section in rtl8211f_config_init() to be invoked conditionally, the call to
"genphy_soft_reset()" was left as-is, when it should have also been invoked
conditionally. This is because the call to "genphy_soft_reset()" was first
introduced by the commit 0a4355c2b7f8 ("net: phy: realtek: add dt property
to disable CLKOUT clock") since the RTL8211F guide indicates that a PHY
reset should be issued after setting bits in the PHYCR2 register.
As the PHYCR2 register is not applicable to the "VD-CG" PHY model, fix the
rtl8211f_config_init() function by invoking "genphy_soft_reset()"
conditionally based on the presence of the "PHYCR2" register.
Fixes: bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220070007.968762-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Justin Iurman says:
====================
ioam6: fix write to cloned skb's
Make sure the IOAM data insertion is not applied on cloned skb's. As a
consequence, ioam selftests needed a refactoring.
====================
Link: https://lore.kernel.org/r/20240219135255.15429-1-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ioam6_parser uses a packet socket. After the fix to prevent writing to
cloned skb's, the receiver does not see its IOAM data anymore, which
makes input/forward ioam-selftests to fail. As a workaround,
ioam6_parser now uses an IPv6 raw socket and leverages ancillary data to
get hop-by-hop options. As a consequence, the hook is "after" the IOAM
data insertion by the receiver and all tests are working again.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ioam6_fill_trace_data() writes inside the skb payload without ensuring
it's writeable (e.g., not cloned). This function is called both from the
input and output path. The output path (ioam6_iptunnel) already does the
check. This commit provides a fix for the input path, inside
ipv6_hop_ioam(). It also updates ip6_parse_tlv() to refresh the network
header pointer ("nh") when returning from ipv6_hop_ioam().
Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The receive queues are protected by their respective spin-lock, not
the socket lock. This could lead to skb_peek() unexpectedly
returning NULL or a pointer to an already dequeued socket buffer.
Fixes: 9641458d3ec4 ("Phonet: Pipe End Point for Phonet Pipes protocol")
Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com>
Link: https://lore.kernel.org/r/20240218081214.4806-2-remi@remlab.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The receive queue is protected by its embedded spin-lock, not the
socket lock, so we need the former lock here (and only that one).
Fixes: 107d0d9b8d9a ("Phonet: Phonet datagram transport protocol")
Reported-by: Luosili <rootlab@huawei.com>
Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240218081214.4806-1-remi@remlab.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The existing documentation was not telling that one has to create a PPP
channel and a PPP interface to get PPPoL2TP data offloading working.
Also, tunnel switching was not mentioned, so that people were thinking
it was not supported, while it actually is.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Tom Parkin <tparkin@katalix.com>
Link: https://lore.kernel.org/r/20240217211425.qj576u3jmaa6yidf@begin
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both registers used when doing manual injection or fdma injection are
shared between all the net devices of the switch. It was noticed that
when having two process which each of them trying to inject frames on
different ethernet ports, that the HW started to behave strange, by
sending out more frames then expected. When doing fdma injection it is
required to set the frame in the DCB and then make sure that the next
pointer of the last DCB is invalid. But because there is no locks for
this, then easily this pointer between the DCB can be broken and then it
would create a loop of DCBs. And that means that the HW will
continuously transmit these frames in a loop. Until the SW will break
this loop.
Therefore to fix this issue, add a spin lock for when accessing the
registers for manual or fdma injection.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Fixes: f3cad2611a77 ("net: sparx5: add hostmode with phylink support")
Link: https://lore.kernel.org/r/20240219080043.1561014-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As IDR can't protect itself from the concurrent modification, place
idr_remove() under the protection of tp->lock.
Fixes: 08a0063df3ae ("net/sched: flower: Move filter handle initialization earlier")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240220085928.9161-1-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Unlike other commands, due to a c&p error, port dump fills-up cmd with
wrong value, different from port-get request cmd, port-get doit reply
and port notification.
Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW.
Skimmed through devlink userspace implementations, none of them cares
about this cmd value. Only ynl, for which, this is actually a fix, as it
expects doit and dumpit ops rsp_value to be the same.
Omit the fixes tag, even thought this is fix, better to target this for
next release.
Fixes: bfcd3a466172 ("Introduce devlink infrastructure")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240220075245.75416-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix EST offset for dwmac 5.10.
Currently configuring Qbv doesn't work as expected. The schedule is
configured, but never confirmed:
|[ 128.250219] imx-dwmac 428a0000.ethernet eth1: configured EST
The reason seems to be the refactoring of the EST code which set the wrong
EST offset for the dwmac 5.10. After fixing this it works as before:
|[ 106.359577] imx-dwmac 428a0000.ethernet eth1: configured EST
|[ 128.430715] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched
Tested on imx93.
Fixes: c3f3b97238f6 ("net: stmmac: Refactor EST implementation")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20240220-stmmac_est-v1-1-c41f9ae2e7b7@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We want to re-organize the struct sock layout. The sk_peek_off
field location is problematic, as most protocols want it in the
RX read area, while UDP wants it on a cacheline different from
sk_receive_queue.
Create a local (inside udp_sock) copy of the 'peek offset is enabled'
flag and place it inside the same cacheline of reader_queue.
Check such flag before reading sk_peek_off. This will save potential
false sharing and cache misses in the fast-path.
Tested under UDP flood with small packets. The struct sock layout
update causes a 4% performance drop, and this patch restores completely
the original tput.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/67ab679c15fbf49fa05b3ffe05d91c47ab84f147.1708426665.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
tools: ynl: fix impossible errors
Fix bugs discovered while I was hacking in low level stuff in YNL
and kept breaking the socket, exercising the "impossible" error paths.
v1: https://lore.kernel.org/all/20240217001742.2466993-1-kuba@kernel.org/
====================
Link: https://lore.kernel.org/r/20240220161112.2735195-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make sure to free the already-parsed mcast_groups if
we don't get an ack from the kernel when reading family info.
This is part of the ynl_sock_create() error path, so we won't
get a call to ynl_sock_destroy() to free them later.
Fixes: 86878f14d71a ("tools: ynl: user space helpers")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240220161112.2735195-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is one common error handler in ynl - ynl_cb_error().
It expects priv to be a pointer to struct ynl_parse_arg AKA yarg.
To avoid potential crashes if we encounter a stray NLMSG_ERROR
always pass yarg as priv (or a struct which has it as the first
member).
ynl_cb_null() has a similar problem directly - it expects yarg
but priv passed by the caller is ys.
Found by code inspection.
Fixes: 86878f14d71a ("tools: ynl: user space helpers")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240220161112.2735195-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We may hold an extra reference on a socket if a tag allocation fails: we
optimistically allocate the sk_key, and take a ref there, but do not
drop if we end up not using the allocated key.
Ensure we're dropping the sock on this failure by doing a proper unref
rather than directly kfree()ing.
Fixes: de8a6b15d965 ("net: mctp: add an explicit reference from a mctp_sk_key to sock")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/ce9b61e44d1cdae7797be0c5e3141baf582d23a0.1707983487.git.jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
KMSAN reports unitialized variable when registering the hook,
reg->hook_ops_type == NF_HOOK_OP_BPF)
~~~~~~~~~~~ undefined
This is a small structure, just use kzalloc to make sure this
won't happen again when new fields get added to nf_hook_ops.
Fixes: 7b4b2fa37587 ("netfilter: annotate nf_tables base hook ops")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Register hooks last when adding chain/flowtable to ensure that packets do
not walk over datastructure that is being released in the error path
without waiting for the rcu grace period.
Fixes: 91c7b38dc9f0 ("netfilter: nf_tables: use new transaction infrastructure to handle chain")
Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Direct xmit does not use it since it calls dev_queue_xmit() to send
packets, hence it calls dst_release().
kmemleak reports:
unreferenced object 0xffff88814f440900 (size 184):
comm "softirq", pid 0, jiffies 4294951896
hex dump (first 32 bytes):
00 60 5b 04 81 88 ff ff 00 e6 e8 82 ff ff ff ff .`[.............
21 0b 50 82 ff ff ff ff 00 00 00 00 00 00 00 00 !.P.............
backtrace (crc cb2bf5d6):
[<000000003ee17107>] kmem_cache_alloc+0x286/0x340
[<0000000021a5de2c>] dst_alloc+0x43/0xb0
[<00000000f0671159>] rt_dst_alloc+0x2e/0x190
[<00000000fe5092c9>] __mkroute_output+0x244/0x980
[<000000005fb96fb0>] ip_route_output_flow+0xc0/0x160
[<0000000045367433>] nf_ip_route+0xf/0x30
[<0000000085da1d8e>] nf_route+0x2d/0x60
[<00000000d1ecd1cb>] nft_flow_route+0x171/0x6a0 [nft_flow_offload]
[<00000000d9b2fb60>] nft_flow_offload_eval+0x4e8/0x700 [nft_flow_offload]
[<000000009f447dbb>] expr_call_ops_eval+0x53/0x330 [nf_tables]
[<00000000072e1be6>] nft_do_chain+0x17c/0x840 [nf_tables]
[<00000000d0551029>] nft_do_chain_inet+0xa1/0x210 [nf_tables]
[<0000000097c9d5c6>] nf_hook_slow+0x5b/0x160
[<0000000005eccab1>] ip_forward+0x8b6/0x9b0
[<00000000553a269b>] ip_rcv+0x221/0x230
[<00000000412872e5>] __netif_receive_skb_one_core+0xfe/0x110
Fixes: fa502c865666 ("netfilter: flowtable: simplify route logic")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|