summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-09-09idpf: enable WB_ON_ITRJoshua Hay5-3/+41
Tell hardware to write back completed descriptors even when interrupts are disabled. Otherwise, descriptors might not be written back until the hardware can flush a full cacheline of descriptors. This can cause unnecessary delays when traffic is light (or even trigger Tx queue timeout). The example scenario to reproduce the Tx timeout if the fix is not applied: - configure at least 2 Tx queues to be assigned to the same q_vector, - generate a huge Tx traffic on the first Tx queue - try to send a few packets using the second Tx queue. In such a case Tx timeout will appear on the second Tx queue because no completion descriptors are written back for that queue while interrupts are disabled due to NAPI polling. Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: fix netdev Tx queue stop/wakeMichal Kubiak3-27/+21
netif_txq_maybe_stop() returns -1, 0, or 1, while idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result, there sometimes are Tx queue timeout warnings despite that the queue is empty or there is at least enough space to restart it. Make idpf_tx_maybe_stop_common() inline and returning true or false, handling the return of netif_txq_maybe_stop() properly. Use a correct goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or incrementing the stops counter twice. Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Cc: stable@vger.kernel.org # 6.7+ Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: refactor Tx completion routinesJoshua Hay3-76/+122
Add a mechanism to guard against stashing partial packets into the hash table to make the driver more robust, with more efficient decision making when cleaning. Don't stash partial packets. This can happen when an RE (Report Event) completion is received in flow scheduling mode, or when an out of order RS (Report Status) completion is received. The first buffer with the skb is stashed, but some or all of its frags are not because the stack is out of reserve buffers. This leaves the ring in a weird state since the frags are still on the ring. Use the field libeth_sqe::nr_frags to track the number of fragments/tx_bufs representing the packet. The clean routines check to make sure there are enough reserve buffers on the stack before stashing any part of the packet. If there are not, next_to_clean is left pointing to the first buffer of the packet that failed to be stashed. This leaves the whole packet on the ring, and the next time around, cleaning will start from this packet. An RS completion is still expected for this packet in either case. So instead of being cleaned from the hash table, it will be cleaned from the ring directly. This should all still be fine since the DESC_UNUSED and BUFS_UNUSED will reflect the state of the ring. If we ever fall below the thresholds, the TxQ will still be stopped, giving the completion queue time to catch up. This may lead to stopping the queue more frequently, but it guarantees the Tx ring will always be in a good state. Also, always use the idpf_tx_splitq_clean function to clean descriptors, i.e. use it from clean_buf_ring as well. This way we avoid duplicating the logic and make sure we're using the same reserve buffers guard rail. This does require a switch from the s16 next_to_clean overflow descriptor ring wrap calculation to u16 and the normal ring size check. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09netdevice: add netdev_tx_reset_subqueue() shorthandAlexander Lobakin1-1/+12
Add a shorthand similar to other net*_subqueue() helpers for resetting the queue by its index w/o obtaining &netdev_tx_queue beforehand manually. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: convert to libeth Tx buffer completionAlexander Lobakin3-232/+105
&idpf_tx_buffer is almost identical to the previous generations, as well as the way it's handled. Moreover, relying on dma_unmap_addr() and !!buf->skb instead of explicit defining of buffer's type was never good. Use the newly added libeth helpers to do it properly and reduce the copy-paste around the Tx code. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09libeth: add Tx buffer completion helpersAlexander Lobakin2-0/+154
Software-side Tx buffers for storing DMA, frame size, skb pointers etc. are pretty much generic and every driver defines them the same way. The same can be said for software Tx completions -- same napi_consume_skb()s and all that... Add a couple simple wrappers for doing that to stop repeating the old tale at least within the Intel code. Drivers are free to use 'priv' member at the end of the structure. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09Merge branch 'unmask-dscp-part-four'David S. Miller10-14/+23
Ido Schimmel says: ==================== Unmask upper DSCP bits - part 4 (last) tl;dr - This patchset finishes to unmask the upper DSCP bits in the IPv4 flow key in preparation for allowing IPv4 FIB rules to match on DSCP. No functional changes are expected. The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB lookup to match against the TOS selector in FIB rules and routes. It is currently impossible for user space to configure FIB rules that match on the DSCP value as the upper DSCP bits are either masked in the various call sites that initialize the IPv4 flow key or along the path to the FIB core. In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we need to make sure the entire DSCP value is present in the IPv4 flow key. This patchset finishes to unmask the upper DSCP bits by adjusting all the callers of ip_route_output_key() to properly initialize the full DSCP value in the IPv4 flow key. No functional changes are expected as commit 1fa3314c14c6 ("ipv4: Centralize TOS matching") moved the masking of the upper DSCP bits to the core where 'flowi4_tos' is matched against the TOS selector. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09sctp: Unmask upper DSCP bits in sctp_v4_get_dst()Ido Schimmel1-1/+2
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Note that the 'tos' variable holds the full DS field. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: udp_tunnel: Unmask upper DSCP bits in udp_tunnel_dst_lookup()Ido Schimmel1-1/+2
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Note that callers of udp_tunnel_dst_lookup() pass the entire DS field in the 'tos' argument. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09netfilter: nf_dup4: Unmask upper DSCP bits in nf_dup_ipv4_route()Ido Schimmel1-1/+2
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09netfilter: nft_flow_offload: Unmask upper DSCP bits in nft_flow_route()Ido Schimmel1-1/+2
Unmask the upper DSCP bits when calling nf_route() which eventually calls ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: netfilter: Unmask upper DSCP bits in ip_route_me_harder()Ido Schimmel1-1/+2
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_xmit()Ido Schimmel1-1/+1
Unmask the upper DSCP bits when initializing an IPv4 flow key via ip_tunnel_init_flow() before passing it to ip_route_output_key() so that in the future we could perform the FIB lookup according to the full DSCP value. Note that the 'tos' variable includes the full DS field. Either the one specified as part of the tunnel parameters or the one inherited from the inner packet. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: ip_tunnel: Unmask upper DSCP bits in ip_md_tunnel_xmit()Ido Schimmel1-3/+4
Unmask the upper DSCP bits when initializing an IPv4 flow key via ip_tunnel_init_flow() before passing it to ip_route_output_key() so that in the future we could perform the FIB lookup according to the full DSCP value. Note that the 'tos' variable includes the full DS field. Either the one specified via the tunnel key or the one inherited from the inner packet. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_bind_dev()Ido Schimmel1-1/+1
Unmask the upper DSCP bits when initializing an IPv4 flow key via ip_tunnel_init_flow() before passing it to ip_route_output_key() so that in the future we could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: icmp: Unmask upper DSCP bits in icmp_reply()Ido Schimmel1-1/+1
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09bpf: lwtunnel: Unmask upper DSCP bits in bpf_lwt_xmit_reroute()Ido Schimmel1-1/+2
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: ip_gre: Unmask upper DSCP bits in ipgre_open()Ido Schimmel1-1/+2
Unmask the upper DSCP bits when calling ip_route_output_gre() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09netfilter: br_netfilter: Unmask upper DSCP bits in br_nf_pre_routing_finish()Ido Schimmel1-1/+2
Unmask upper DSCP bits when calling ip_route_output() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09net: sysfs: Fix weird usage of class's namespace relevant fieldsZijun Hu1-2/+2
Device class has two namespace relevant fields which are associated by the following usage: struct class { ... const struct kobj_ns_type_operations *ns_type; const void *(*namespace)(const struct device *dev); ... } if (dev->class && dev->class->ns_type) dev->class->namespace(dev); The usage looks weird since it checks @ns_type but calls namespace() it is found for all existing class definitions that the other filed is also assigned once one is assigned in current kernel tree, so fix this weird usage by checking @namespace to call namespace(). Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09Merge branch 'fs_enet-cleanup'David S. Miller8-307/+219
Maxime Chevallier says: ==================== net: ethernet: fs_enet: Cleanup and phylink conversion This is V3 of a series that cleans-up fs_enet, with the ultimate goal of converting it to phylink (patch 8). The main changes compared to V2 are : - Reviewed-by tags from Andrew were gathered - Patch 5 now includes the removal of now unused includes, thanks Andrew for spotting this - Patch 4 is new, it reworks the adjust_link to move the spinlock acquisition to a more suitable location. Although this dissapears in the actual phylink port, it makes the phylink conversion clearer on that point - Patch 8 includes fixes in the tx_timeout cancellation, to prevent taking rtnl twice when canceling a pending tx_timeout. Thanks Jakub for spotting this. Link to V2: https://lore.kernel.org/netdev/20240829161531.610874-1-maxime.chevallier@bootlin.com/ Link to V1: https://lore.kernel.org/netdev/20240828095103.132625-1-maxime.chevallier@bootlin.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09net: ethernet: fs_enet: phylink conversionMaxime Chevallier6-122/+132
fs_enet is a quite old but still used Ethernet driver found on some NXP devices. It has support for 10/100 Mbps ethernet, with half and full duplex. Some variants of it can use RMII, while other integrations are MII-only. Add phylink support, thus removing custom fixed-link hanldling. This also allows removing some internal flags such as the use_rmii flag. Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09net: ethernet: fs_enet: simplify clock handling with devm accessorsMaxime Chevallier2-14/+4
devm_clock_get_enabled() can be used to simplify clock handling for the PER register clock. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09net: ethernet: fs_enet: use macros for speed and duplex valuesMaxime Chevallier3-4/+4
The PHY speed and duplex should be manipulated using the SPEED_XXX and DUPLEX_XXX macros available. Use it in the fcc, fec and scc MAC for fs_enet. Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09net: ethernet: fs_enet: drop unused phy_info and mii_if_infoMaxime Chevallier5-16/+0
There's no user of the struct phy_info, the 'phy' field and the mii_if_info in the fs_enet driver, probably dating back when phylib wasn't as widely used. Drop these from the driver code. As the definition for struct mii_if_info is no longer required, drop the include for linux/mii.h altogether in the driver. Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09net: ethernet: fs_enet: only protect the .restart() call in .adjust_linkMaxime Chevallier1-12/+6
When .adjust_link() gets called, it runs in thread context, with the phydev->lock held. We only need to protect the fep->fecp/fccp/sccp register that are accessed within the .restart() function from concurrent access from the interrupts. These registers are being protected by the fep->lock spinlock, so we can move the spinlock protection around the .restart() call instead of the entire adjust_link() call. By doing so, we can simplify further the .adjust_link() callback and avoid the intermediate helper. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09net: ethernet: fs_enet: drop the .adjust_link custom fs_opsMaxime Chevallier2-7/+1
There's no in-tree user for the fs_ops .adjust_link() function, so we can always use the generic one in fe_enet-main. Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09net: ethernet: fs_enet: cosmetic cleanupsMaxime Chevallier1-131/+89
Due to the age of the driver and the slow recent activity on it, the code has taken some layers of dust. Clean the main driver file up so that it passes checkpatch and also conforms with the net coding style. Changes include : - Re-ordering of the variable declarations for RCT - Fixing the comment styles to either one-line comments, or net-style comments - Adding braces around single-statement 'else' clauses - Aligning function/macro parameters on the opening parenthesis - Simplifying checks for NULL pointers - Splitting cascaded assignments into individual assignments - Fixing some typos - Fixing whitespace issues This is a cosmetic change and doesn't introduce any change in behaviour. Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09net: ethernet: fs_enet: convert to SPDXMaxime Chevallier6-24/+6
The ENET driver has SPDX tags in the header files, but they were missing in the C files. Change the licence information to SPDX format. Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-08ptp/ioctl: support MONOTONIC{,_RAW} timestamps for PTP_SYS_OFFSET_EXTENDEDMahesh Bandewar3-12/+56
The ability to read the PHC (Physical Hardware Clock) alongside multiple system clocks is currently dependent on the specific hardware architecture. This limitation restricts the use of PTP_SYS_OFFSET_PRECISE to certain hardware configurations. The generic soultion which would work across all architectures is to read the PHC along with the latency to perform PHC-read as offered by PTP_SYS_OFFSET_EXTENDED which provides pre and post timestamps. However, these timestamps are currently limited to the CLOCK_REALTIME timebase. Since CLOCK_REALTIME is affected by NTP (or similar time synchronization services), it can experience significant jumps forward or backward. This hinders the precise latency measurements that PTP_SYS_OFFSET_EXTENDED is designed to provide. This problem could be addressed by supporting MONOTONIC_RAW timestamps within PTP_SYS_OFFSET_EXTENDED. Unlike CLOCK_REALTIME or CLOCK_MONOTONIC, the MONOTONIC_RAW timebase is unaffected by NTP adjustments. This enhancement can be implemented by utilizing one of the three reserved words within the PTP_SYS_OFFSET_EXTENDED struct to pass the clock-id for timestamps. The current behavior aligns with clock-id for CLOCK_REALTIME timebase (value of 0), ensuring backward compatibility of the UAPI. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-08net: sched: consistently use rcu_replace_pointer() in taprio_change()Dmitry Antipov1-1/+3
According to Vinicius (and carefully looking through the whole https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa once again), txtime branch of 'taprio_change()' is not going to race against 'advance_sched()'. But using 'rcu_replace_pointer()' in the former may be a good idea as well. Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-07Merge tag 'nf-next-24-09-06' of ↵Jakub Kicinski38-178/+206
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: Patch #1 adds ctnetlink support for kernel side filtering for deletions, from Changliang Wu. Patch #2 updates nft_counter support to Use u64_stats_t, from Sebastian Andrzej Siewior. Patch #3 uses kmemdup_array() in all xtables frontends, from Yan Zhen. Patch #4 is a oneliner to use ERR_CAST() in nf_conntrack instead opencoded casting, from Shen Lichuan. Patch #5 removes unused argument in nftables .validate interface, from Florian Westphal. Patch #6 is a oneliner to correct a typo in nftables kdoc, from Simon Horman. Patch #7 fixes missing kdoc in nftables, also from Simon. Patch #8 updates nftables to handle timeout less than CONFIG_HZ. Patch #9 rejects element expiration if timeout is zero, otherwise it is silently ignored. Patch #10 disallows element expiration larger than timeout. Patch #11 removes unnecessary READ_ONCE annotation while mutex is held. Patch #12 adds missing READ_ONCE/WRITE_ONCE annotation in dynset. Patch #13 annotates data-races around element expiration. Patch #14 allocates timeout and expiration in one single set element extension, they are tighly couple, no reason to keep them separated anymore. Patch #15 updates nftables to interpret zero timeout element as never times out. Note that it is already possible to declare sets with elements that never time out but this generalizes to all kind of set with timeouts. Patch #16 supports for element timeout and expiration updates. * tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: set element timeout update support netfilter: nf_tables: zero timeout means element never times out netfilter: nf_tables: consolidate timeout extension for elements netfilter: nf_tables: annotate data-races around element expiration netfilter: nft_dynset: annotate data-races around set timeout netfilter: nf_tables: remove annotation to access set timeout while holding lock netfilter: nf_tables: reject expiration higher than timeout netfilter: nf_tables: reject element expiration with no timeout netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire netfilter: nf_tables: Add missing Kernel doc netfilter: nf_tables: Correct spelling in nf_tables.h netfilter: nf_tables: drop unused 3rd argument from validate callback ops netfilter: conntrack: Convert to use ERR_CAST() netfilter: Use kmemdup_array instead of kmemdup for multiple allocation netfilter: nft_counter: Use u64_stats_t for statistic. netfilter: ctnetlink: support CTA_FILTER for flush ==================== Link: https://patch.msgid.link/20240905232920.5481-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07ptp: ocp: Improve PCIe delay estimationVadim Fedorenko1-9/+11
The PCIe bus can be pretty busy during boot and probe function can see excessive delays. Let's find the minimal value out of several tests and use it as estimated value. Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20240905140028.560454-1-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07netpoll: remove netpoll_srcuEric Dumazet1-11/+4
netpoll_srcu is currently used from netpoll_poll_disable() and __netpoll_cleanup() Both functions run under RTNL, using netpoll_srcu adds confusion and no additional protection. Moreover the synchronize_srcu() call in __netpoll_cleanup() is performed before clearing np->dev->npinfo, which violates RCU rules. After this patch, netpoll_poll_disable() and netpoll_poll_enable() simply use rtnl_dereference(). This saves a big chunk of memory (more than 192KB on platforms with 512 cpus) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20240905084909.2082486-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07Merge branch 'octeontx2-address-some-warnings'Jakub Kicinski2-3/+3
Simon Horman says: ==================== octeontx2: Address some warnings This patchset addresses some warnings flagged by Sparse, gcc-14, and clang-18 in files touched by recent patch submissions. Although these changes do not alter the functionality of the code, by addressing them real problems introduced in future which are flagged by Sparse will stand out more readily. v1: https://lore.kernel.org/20240903-octeontx2-sparse-v1-0-f190309ecb0a@kernel.org ==================== Link: https://patch.msgid.link/20240904-octeontx2-sparse-v2-0-14f2305fe4b2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07octeontx2-pf: Make iplen __be16 in otx2_sqe_add_ext()Simon Horman1-1/+1
In otx2_sqe_add_ext() iplen is used to hold a 16-bit big-endian value, but it's type is u16, indicating a host byte order integer. Address this mismatch by changing the type of iplen to __be16. Flagged by Sparse as: .../otx2_txrx.c:699:31: warning: incorrect type in assignment (different base types) .../otx2_txrx.c:699:31: expected unsigned short [usertype] iplen .../otx2_txrx.c:699:31: got restricted __be16 [usertype] .../otx2_txrx.c:701:54: warning: incorrect type in assignment (different base types) .../otx2_txrx.c:701:54: expected restricted __be16 [usertype] tot_len .../otx2_txrx.c:701:54: got unsigned short [usertype] iplen .../otx2_txrx.c:704:60: warning: incorrect type in assignment (different base types) .../otx2_txrx.c:704:60: expected restricted __be16 [usertype] payload_len .../otx2_txrx.c:704:60: got unsigned short [usertype] iplen Introduced in commit dc1a9bf2c816 ("octeontx2-pf: Add UDP segmentation offload support") No functional change intended. Compile tested only by author. Tested-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240904-octeontx2-sparse-v2-2-14f2305fe4b2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07octeontx2-af: Pass string literal as format argument of alloc_workqueue()Simon Horman1-2/+2
Recently I noticed that both gcc-14 and clang-18 report that passing a non-string literal as the format argument of alloc_workqueue() is potentially insecure. E.g. clang-18 says: .../rvu.c:2493:32: warning: format string is not a string literal (potentially insecure) [-Wformat-security] 2493 | mw->mbox_wq = alloc_workqueue(name, | ^~~~ .../rvu.c:2493:32: note: treat the string as an argument to avoid this 2493 | mw->mbox_wq = alloc_workqueue(name, | ^ | "%s", It is always the case where the contents of name is safe to pass as the format argument. That is, in my understanding, it never contains any format escape sequences. But, it seems better to be safe than sorry. And, as a bonus, compiler output becomes less verbose by addressing this issue as suggested by clang-18. Compile tested only by author. Tested-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240904-octeontx2-sparse-v2-1-14f2305fe4b2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net: phy: qca83xx: use PHY_ID_MATCH_EXACTRosen Penev1-7/+3
No need for the mask when there's already a macro for this. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240904205659.7470-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07sfc: siena: rip out rss-context dead codeEdward Cree6-211/+8
Siena hardware does not support custom RSS contexts, but when the driver was forked from sfc.ko, some of the plumbing for them was copied across from the common code. Actually trying to use them would lead to EOPNOTSUPP as the relevant efx_nic_type methods were not populated. Remove this dead code from the Siena driver. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240904181156.1993666-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07Merge branch 'use-functionality-of-irq_get_trigger_type'Jakub Kicinski3-3/+3
Vasileios Amoiridis says: ==================== Use functionality of irq_get_trigger_type() v1: https://lore.kernel.org/20240902225534.130383-1-vassilisamir@gmail.com ==================== Link: https://patch.msgid.link/20240904151018.71967-1-vassilisamir@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net: smc91x: Make use of irq_get_trigger_type()Vasileios Amoiridis1-1/+1
Convert irqd_get_trigger_type(irq_get_irq_data(irq)) cases to the more simple irq_get_trigger_type(irq). Signed-off-by: Vasileios Amoiridis <vassilisamir@gmail.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Link: https://patch.msgid.link/20240904151018.71967-4-vassilisamir@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net: dsa: realtek: rtl8366rb: Make use of irq_get_trigger_type()Vasileios Amoiridis1-1/+1
Convert irqd_get_trigger_type(irq_get_irq_data(irq)) cases to the more simple irq_get_trigger_type(irq). Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Vasileios Amoiridis <vassilisamir@gmail.com> Link: https://patch.msgid.link/20240904151018.71967-3-vassilisamir@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net: dsa: realtek: rtl8365mb: Make use of irq_get_trigger_type()Vasileios Amoiridis1-1/+1
Convert irqd_get_trigger_type(irq_get_irq_data(irq)) cases to the more simple irq_get_trigger_type(irq). Signed-off-by: Vasileios Amoiridis <vassilisamir@gmail.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Link: https://patch.msgid.link/20240904151018.71967-2-vassilisamir@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net: tls: wait for async completion on last messageSascha Hauer1-1/+1
When asynchronous encryption is used KTLS sends out the final data at proto->close time. This becomes problematic when the task calling close() receives a signal. In this case it can happen that tcp_sendmsg_locked() called at close time returns -ERESTARTSYS and the final data is not sent. The described situation happens when KTLS is used in conjunction with io_uring, as io_uring uses task_work_add() to add work to the current userspace task. A discussion of the problem along with a reproducer can be found in [1] and [2] Fix this by waiting for the asynchronous encryption to be completed on the final message. With this there is no data left to be sent at close time. [1] https://lore.kernel.org/all/20231010141932.GD3114228@pengutronix.de/ [2] https://lore.kernel.org/all/20240315100159.3898944-1-s.hauer@pengutronix.de/ Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Link: https://patch.msgid.link/20240904-ktls-wait-async-v1-1-a62892833110@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07Merge branch 'make-use-of-the-helper-macro-list_head'Jakub Kicinski5-19/+9
Hongbo Li says: ==================== make use of the helper macro LIST_HEAD() The macro LIST_HEAD() declares a list variable and initializes it, which can be used to simplify the steps of list initialization, thereby simplifying the code. These serials just do some equivalatent substitutions, and with no functional modifications. ==================== Link: https://patch.msgid.link/20240904093243.3345012-1-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net/core: make use of the helper macro LIST_HEAD()Hongbo Li1-4/+2
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Here we can simplify the code. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://patch.msgid.link/20240904093243.3345012-6-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net/ipv6: make use of the helper macro LIST_HEAD()Hongbo Li1-4/+2
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Here we can simplify the code. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://patch.msgid.link/20240904093243.3345012-5-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net/netfilter: make use of the helper macro LIST_HEAD()Hongbo Li1-3/+1
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Here we can simplify the code. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://patch.msgid.link/20240904093243.3345012-4-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net/tipc: make use of the helper macro LIST_HEAD()Hongbo Li1-4/+2
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Here we can simplify the code. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://patch.msgid.link/20240904093243.3345012-3-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-07net/ipv4: make use of the helper macro LIST_HEAD()Hongbo Li1-4/+2
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Here we can simplify the code. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://patch.msgid.link/20240904093243.3345012-2-lihongbo22@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>