summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-09-23net: macsec: remove the prepare flag from the MACsec offloading contextAntoine Tenart1-2/+0
Now that the MACsec offloading preparation phase was removed from the MACsec core implementation as well as from drivers implementing it, we can safely remove the flag representing it. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-23net: phylink: Adjust advertisement based on rate matchingSean Anderson1-1/+2
This adds support for adjusting the advertisement for pause-based rate matching. This may result in a lossy link, since the final link settings are not adjusted. Asymmetric pause support is necessary. It would be possible for a MAC supporting only symmetric pause to use pause-based rate adaptation, but only if pause reception was enabled as well. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23net: phylink: Adjust link settings based on rate matchingSean Anderson1-0/+5
If the phy is configured to use pause-based rate matching, ensure that the link is full duplex with pause frame reception enabled. As suggested, if pause-based rate matching is enabled by the phy, then pause reception is unconditionally enabled. The interface duplex is determined based on the rate matching type. When rate matching is enabled, so is the speed. We assume the maximum interface speed is used. This is only relevant for MLO_AN_PHY. For MLO_AN_INBAND, the MAC/PCS's view of the interface speed will be used. Although there are no RATE_ADAPT_CRS phys in-tree, it has been added for comparison (and the implementation is quite simple). Co-developed-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23net: phy: Add support for rate matchingSean Anderson3-3/+38
This adds support for rate matching (also known as rate adaptation) to the phy subsystem. The general idea is that the phy interface runs at one speed, and the MAC throttles the rate at which it sends packets to the link speed. There's a good overview of several techniques for achieving this at [1]. This patch adds support for three: pause-frame based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and 2BASE-TL), and open-loop-based (such as in 10GBASE-W). This patch makes a few assumptions and a few non assumptions about the types of rate matching available. First, it assumes that different phys may use different forms of rate matching. Second, it assumes that phys can use rate matching for any of their supported link speeds (e.g. if a phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T). Third, it does not assume that all interface modes will use the same form of rate matching. Fourth, it does not assume that all phy devices will support rate matching (even if some do). Relaxing or strengthening these (non-)assumptions could result in a different API. For example, if all interface modes were assumed to use the same form of rate matching, then a bitmask of interface modes supportting rate matching would suffice. For some better visibility into the process, the current rate matching mode is exposed as part of the ethtool ksettings. For the moment, only read access is supported. I'm not sure what userspace might want to configure yet (disable it altogether, disable just one mode, specify the mode to use, etc.). For the moment, since only pause-based rate adaptation support is added in the next few commits, rate matching can be disabled altogether by adjusting the advertisement. 802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and "rate matching" in clause 61 (10PASS-TL and 2BASE-TS). Aquantia also calls this feature "rate adaptation". I chose "rate matching" because it is shorter, and because Russell doesn't think "adaptation" is correct in this context. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23net: phylink: Generate caps and convert to linkmodes separatelySean Anderson1-2/+2
If we call phylink_caps_to_linkmodes directly from phylink_get_linkmodes, it is difficult to re-use this functionality in MAC drivers. This is because MAC drivers must then work with an ethtool linkmode bitmap, instead of with mac capabilities. Instead, let the caller of phylink_get_linkmodes do the conversion. To reflect this change, rename the function to phylink_get_capabilities. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23net: phylink: Export phylink_caps_to_linkmodesSean Anderson1-0/+1
This function is convenient for MAC drivers. They can use it to add or remove particular link modes based on capabilities (such as if half duplex is not supported for a particular interface mode). Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23net: phylink: Document MAC_(A)SYM_PAUSESean Anderson1-0/+29
This documents the possible MLO_PAUSE_* settings which can result from different combinations of MAC_(A)SYM_PAUSE. Special note is paid to settings which can result from user configuration (MLO_PAUSE_AN). The autonegotiation results are more-or-less a direct consequence of IEEE 802.3 Table 28B-2. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23net/mlx5e: Support MACsec offload extended packet number (EPN)Emeel Hakim1-0/+8
MACsec EPN splits the packet number (PN) into two 32-bits fields, epn_lsb (32 least significant bits (LSBs) of PN) and epn_msb (32 most significant bits (MSBs) of PN). Epn_msb bits are managed by SW and for that HW is required to send an object change event of type EPN event notifying the SW to update the epn_msb in addition, once epn_msb is updated SW update HW with the new epn_msb value for HW to perform replay protection. To prevent HW from stopping while handling the event, SW manages another bit for HW called epn_overlap, HW uses the latter to get an indication regarding how to read the epn_msb value correctly while still receiving packets. Add epn event handling that updates the epn_overlap and epn_msb for every 2^31 packets according to the following logic: if epn_lsb crosses 2^31 (half sequence number wraparound) upon HW relevant event, SW updates the esn_overlap value to OLD (value = 1). When the epn_lsb crosses 2^32 (full sequence number wraparound) upon HW relevant event, SW updates the esn_overlap to NEW (value = 0) and increment the esn_msb. When using MACsec EPN a salt and short secure channel id (ssci) needs to be provided by the user, when offloading EPN need to pass this salt and ssci to the HW to be used in the initial vector (IV) calculations. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-23net/mlx5: Add ifc bits for MACsec extended packet number (EPN) and replay ↵Emeel Hakim1-0/+29
protection Add ifc bits related to advanced steering operations (ASO) and general object modify for macsec to use as part of offloading EPN and replay protection features. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-23net/mlx5: Fix fields name prefix in MACsecEmeel Hakim1-3/+3
Fix ifc fields name to be consistent with the device spec document. Fixes: 8385c51ff5bc ("net/mlx5: Introduce MACsec Connect-X offload hardware bits and structures") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-23net/sched: sch_api: add helper for tc qdisc walker stats dumpZhengchao Shao1-0/+13
The walk implementation of most qdisc class modules is basically the same. That is, the values of count and skip are checked first. If count is greater than or equal to skip, the registered fn function is executed. Otherwise, increase the value of count. So we can reconstruct them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-23net/tls: Describe ciphers sizes by const structsTariq Toukan1-0/+10
Introduce cipher sizes descriptor. It helps reducing the amount of code duplications and repeated switch/cases that assigns the proper sizes according to the cipher type. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski17-32/+78
drivers/net/ethernet/freescale/fec.h 7b15515fc1ca ("Revert "fec: Restart PPS after link state change"") 40c79ce13b03 ("net: fec: add stop mode support for imx8 platform") https://lore.kernel.org/all/20220921105337.62b41047@canb.auug.org.au/ drivers/pinctrl/pinctrl-ocelot.c c297561bc98a ("pinctrl: ocelot: Fix interrupt controller") 181f604b33cd ("pinctrl: ocelot: add ability to be used in a non-mmio configuration") https://lore.kernel.org/all/20220921110032.7cd28114@canb.auug.org.au/ tools/testing/selftests/drivers/net/bonding/Makefile bbb774d921e2 ("net: Add tests for bonding and team address list management") 152e8ec77640 ("selftests/bonding: add a test for bonding lladdr target") https://lore.kernel.org/all/20220921110437.5b7dbd82@canb.auug.org.au/ drivers/net/can/usb/gs_usb.c 5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition") 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support") https://lore.kernel.org/all/84f45a7d-92b6-4dc5-d7a1-072152fab6ff@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22Merge tag 'net-6.0-rc7' of ↵Linus Torvalds4-4/+40
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wifi, netfilter and can. A handful of awaited fixes here - revert of the FEC changes, bluetooth fix, fixes for iwlwifi spew. We added a warning in PHY/MDIO code which is triggering on a couple of platforms in a false-positive-ish way. If we can't iron that out over the week we'll drop it and re-add for 6.1. I've added a new "follow up fixes" section for fixes to fixes in 6.0-rcs but it may actually give the false impression that those are problematic or that more testing time would have caught them. So likely a one time thing. Follow up fixes: - nf_tables_addchain: fix nft_counters_enabled underflow - ebtables: fix memory leak when blob is malformed - nf_ct_ftp: fix deadlock when nat rewrite is needed Current release - regressions: - Revert "fec: Restart PPS after link state change" and the related "net: fec: Use a spinlock to guard `fep->ptp_clk_on`" - Bluetooth: fix HCIGETDEVINFO regression - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2 - mptcp: fix fwd memory accounting on coalesce - rwlock removal fall out: - ipmr: always call ip{,6}_mr_forward() from RCU read-side critical section - ipv6: fix crash when IPv6 is administratively disabled - tcp: read multiple skbs in tcp_read_skb() - mdio_bus_phy_resume state warning fallout: - eth: ravb: fix PHY state warning splat during system resume - eth: sh_eth: fix PHY state warning splat during system resume Current release - new code bugs: - wifi: iwlwifi: don't spam logs with NSS>2 messages - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC Previous releases - regressions: - bonding: fix NULL deref in bond_rr_gen_slave_id - wifi: iwlwifi: mark IWLMEI as broken Previous releases - always broken: - nf_conntrack helpers: - irc: tighten matching on DCC message - sip: fix ct_sip_walk_headers - osf: fix possible bogus match in nf_osf_find() - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header - core: fix flow symmetric hash - bonding, team: unsync device addresses on ndo_stop - phy: micrel: fix shared interrupt on LAN8814" * tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) selftests: forwarding: add shebang for sch_red.sh bnxt: prevent skb UAF after handing over to PTP worker net: marvell: Fix refcounting bugs in prestera_port_sfp_bind() net: sched: fix possible refcount leak in tc_new_tfilter() net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD udp: Use WARN_ON_ONCE() in udp_read_skb() selftests: bonding: cause oops in bond_rr_gen_slave_id bonding: fix NULL deref in bond_rr_gen_slave_id net: phy: micrel: fix shared interrupt on LAN8814 net/smc: Stop the CLC flow if no link to map buffers on ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient net: atlantic: fix potential memory leak in aq_ndev_close() can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported can: gs_usb: gs_can_open(): fix race dev->can.state condition can: flexcan: flexcan_mailbox_read() fix return value for drop = true net: sh_eth: Fix PHY state warning splat during system resume net: ravb: Fix PHY state warning splat during system resume netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed netfilter: ebtables: fix memory leak when blob is malformed netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain() ...
2022-09-22net: ethernet: mtk_eth_wed: add axi bus supportLorenzo Bianconi1-1/+10
Other than pcie bus, introduce support for axi bus to mtk wed driver. Axi bus is used to connect mt7986-wmac soc chip available on mt7986 device. Tested-by: Daniel Golle <daniel@makrotopia.org> Co-developed-by: Bo Jiao <Bo.Jiao@mediatek.com> Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com> Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22net: ethernet: mtk_eth_wed: add wed support for mt7986 chipsetLorenzo Bianconi1-0/+8
Introduce Wireless Etherne Dispatcher support on transmission side for mt7986 chipset Tested-by: Daniel Golle <daniel@makrotopia.org> Co-developed-by: Bo Jiao <Bo.Jiao@mediatek.com> Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com> Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22net/smc: Unbind r/w buffer size from clcsock and make them tunableTony Lu1-0/+2
Currently, SMC uses smc->sk.sk_{rcv|snd}buf to create buffers for send buffer and RMB. And the values of buffer size are from tcp_{w|r}mem in clcsock. The buffer size from TCP socket doesn't fit SMC well. Generally, buffers are usually larger than TCP for SMC-R/-D to get higher performance, for they are different underlay devices and paths. So this patch unbinds buffer size from TCP, and introduces two sysctl knobs to tune them independently. Also, these knobs are per net namespace and work for containers. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22net/smc: Introduce a specific sysctl for TEST_LINK timeWen Gu1-0/+1
SMC-R tests the viability of link by sending out TEST_LINK LLC messages over RoCE fabric when connections on link have been idle for a time longer than keepalive interval (testlink time). But using tcp_keepalive_time as testlink time maybe not quite suitable because it is default no less than two hours[1], which is too long for single link to find peer dead. The active host will still use peer-dead link (QP) sending messages, and can't find out until get IB_WC_RETRY_EXC_ERR error CQEs, which takes more time than TEST_LINK timeout (SMC_LLC_WAIT_TIME) normally. So this patch introduces a independent sysctl for SMC-R to set link keepalive time, in order to detect link down in time. The default value is 30 seconds. [1] https://www.rfc-editor.org/rfc/rfc1122#page-101 Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22Merge branch 'master' of ↵Jakub Kicinski4-31/+4
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter patches for net-next Remove GPL license copypastry in uapi files, those have SPDX tags. From Christophe Jaillet. Remove unused variable in rpfilter, from Guillaume Nault. Rework gc resched delay computation in conntrack, from Antoine Tenart. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: rpfilter: Remove unused variable 'ret'. headers: Remove some left-over license text in include/uapi/linux/netfilter/ netfilter: conntrack: revisit the gc initial rescheduling bias netfilter: conntrack: fix the gc rescheduling delay ==================== Link: https://lore.kernel.org/r/20220921095000.29569-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22net: sched: remove unused tcf_result extensionJamal Hadi Salim1-5/+0
Added by: commit e5cf1baf92cb ("act_mirred: use TC_ACT_REINSERT when possible") but no longer useful. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220919130627.3551233-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21headers: Remove some left-over license text in include/uapi/linux/netfilter/Christophe JAILLET4-31/+4
When the SPDX-License-Identifier tag has been added, the corresponding license text has not been removed. Remove it now. Also, in xt_connmark.h, move the copyright text at the top of the file which is a much more common pattern. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-21Revert "iommu/vt-d: Fix possible recursive locking in intel_iommu_init()"Lu Baolu1-3/+1
This reverts commit 9cd4f1434479f1ac25c440c421fbf52069079914. Some issues were reported on the original commit. Some thunderbolt devices don't work anymore due to the following DMA fault. DMAR: DRHD: handling fault status reg 2 DMAR: [INTR-REMAP] Request device [09:00.0] fault index 0x8080 [fault reason 0x25] Blocked a compatibility format interrupt request Bring it back for now to avoid functional regression. Fixes: 9cd4f1434479f ("iommu/vt-d: Fix possible recursive locking in intel_iommu_init()") Link: https://lore.kernel.org/linux-iommu/485A6EA5-6D58-42EA-B298-8571E97422DE@getmailspring.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216497 Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: <stable@vger.kernel.org> # 5.19.x Reported-and-tested-by: George Hilliard <thirtythreeforty@gmail.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20220920081701.3453504-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-21net/sched: cls_api: add helper for tc cls walker stats dumpZhengchao Shao1-0/+13
The walk implementation of most tc cls modules is basically the same. That is, the values of count and skip are checked first. If count is greater than or equal to skip, the registered fn function is executed. Otherwise, increase the value of count. So we can reconstruct them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Introduce optional per-netns ehash.Kuniyuki Iwashima2-0/+7
The more sockets we have in the hash table, the longer we spend looking up the socket. While running a number of small workloads on the same host, they penalise each other and cause performance degradation. The root cause might be a single workload that consumes much more resources than the others. It often happens on a cloud service where different workloads share the same computing resource. On EC2 c5.24xlarge instance (196 GiB memory and 524288 (1Mi / 2) ehash entries), after running iperf3 in different netns, creating 24Mi sockets without data transfer in the root netns causes about 10% performance regression for the iperf3's connection. thash_entries sockets length Gbps 524288 1 1 50.7 24Mi 48 45.1 It is basically related to the length of the list of each hash bucket. For testing purposes to see how performance drops along the length, I set 131072 (1Mi / 8) to thash_entries, and here's the result. thash_entries sockets length Gbps 131072 1 1 50.7 1Mi 8 49.9 2Mi 16 48.9 4Mi 32 47.3 8Mi 64 44.6 16Mi 128 40.6 24Mi 192 36.3 32Mi 256 32.5 40Mi 320 27.0 48Mi 384 25.0 To resolve the socket lookup degradation, we introduce an optional per-netns hash table for TCP, but it's just ehash, and we still share the global bhash, bhash2 and lhash2. With a smaller ehash, we can look up non-listener sockets faster and isolate such noisy neighbours. In addition, we can reduce lock contention. We can control the ehash size by a new sysctl knob. However, depending on workloads, it will require very sensitive tuning, so we disable the feature by default (net.ipv4.tcp_child_ehash_entries == 0). Moreover, we can fall back to using the global ehash in case we fail to allocate enough memory for a new ehash. The maximum size is 16Mi, which is large enough that even if we have 48Mi sockets, the average list length is 3, and regression would be less than 1%. We can check the current ehash size by another read-only sysctl knob, net.ipv4.tcp_ehash_entries. A negative value means the netns shares the global ehash (per-netns ehash is disabled or failed to allocate memory). # dmesg | cut -d ' ' -f 5- | grep "established hash" TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc hugepage) # sysctl net.ipv4.tcp_ehash_entries net.ipv4.tcp_ehash_entries = 524288 # can be changed by thash_entries # sysctl net.ipv4.tcp_child_ehash_entries net.ipv4.tcp_child_ehash_entries = 0 # disabled by default # ip netns add test1 # ip netns exec test1 sysctl net.ipv4.tcp_ehash_entries net.ipv4.tcp_ehash_entries = -524288 # share the global ehash # sysctl -w net.ipv4.tcp_child_ehash_entries=100 net.ipv4.tcp_child_ehash_entries = 100 # ip netns add test2 # ip netns exec test2 sysctl net.ipv4.tcp_ehash_entries net.ipv4.tcp_ehash_entries = 128 # own a per-netns ehash with 2^n buckets When more than two processes in the same netns create per-netns ehash concurrently with different sizes, we need to guarantee the size in one of the following ways: 1) Share the global ehash and create per-netns ehash First, unshare() with tcp_child_ehash_entries==0. It creates dedicated netns sysctl knobs where we can safely change tcp_child_ehash_entries and clone()/unshare() to create a per-netns ehash. 2) Control write on sysctl by BPF We can use BPF_PROG_TYPE_CGROUP_SYSCTL to allow/deny read/write on sysctl knobs. Note that the global ehash allocated at the boot time is spread over available NUMA nodes, but inet_pernet_hashinfo_alloc() will allocate pages for each per-netns ehash depending on the current process's NUMA policy. By default, the allocation is done in the local node only, so the per-netns hash table could fully reside on a random node. Thus, depending on the NUMA policy the netns is created with and the CPU the current thread is running on, we could see some performance differences for highly optimised networking applications. Note also that the default values of two sysctl knobs depend on the ehash size and should be tuned carefully: tcp_max_tw_buckets : tcp_child_ehash_entries / 2 tcp_max_syn_backlog : max(128, tcp_child_ehash_entries / 128) As a bonus, we can dismantle netns faster. Currently, while destroying netns, we call inet_twsk_purge(), which walks through the global ehash. It can be potentially big because it can have many sockets other than TIME_WAIT in all netns. Splitting ehash changes that situation, where it's only necessary for inet_twsk_purge() to clean up TIME_WAIT sockets in each netns. With regard to this, we do not free the per-netns ehash in inet_twsk_kill() to avoid UAF while iterating the per-netns ehash in inet_twsk_purge(). Instead, we do it in tcp_sk_exit_batch() after calling tcp_twsk_purge() to keep it protocol-family-independent. In the future, we could optimise ehash lookup/iteration further by removing netns comparison for the per-netns ehash. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Save unnecessary inet_twsk_purge() calls.Kuniyuki Iwashima1-0/+1
While destroying netns, we call inet_twsk_purge() in tcp_sk_exit_batch() and tcpv6_net_exit_batch() for AF_INET and AF_INET6. These commands trigger the kernel to walk through the potentially big ehash twice even though the netns has no TIME_WAIT sockets. # ip netns add test # ip netns del test or # unshare -n /bin/true >/dev/null When tw_refcount is 1, we need not call inet_twsk_purge() at least for the net. We can save such unneeded iterations if all netns in net_exit_list have no TIME_WAIT sockets. This change eliminates the tax by the additional unshare() described in the next patch to guarantee the per-netns ehash size. Tested: # mount -t debugfs none /sys/kernel/debug/ # echo cleanup_net > /sys/kernel/debug/tracing/set_ftrace_filter # echo inet_twsk_purge >> /sys/kernel/debug/tracing/set_ftrace_filter # echo function > /sys/kernel/debug/tracing/current_tracer # cat ./add_del_unshare.sh for i in `seq 1 40` do (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) & done wait; # ./add_del_unshare.sh Before the patch: # cat /sys/kernel/debug/tracing/trace_pipe kworker/u128:0-8 [031] ...1. 174.162765: cleanup_net <-process_one_work kworker/u128:0-8 [031] ...1. 174.240796: inet_twsk_purge <-cleanup_net kworker/u128:0-8 [032] ...1. 174.244759: inet_twsk_purge <-tcp_sk_exit_batch kworker/u128:0-8 [034] ...1. 174.290861: cleanup_net <-process_one_work kworker/u128:0-8 [039] ...1. 175.245027: inet_twsk_purge <-cleanup_net kworker/u128:0-8 [046] ...1. 175.290541: inet_twsk_purge <-tcp_sk_exit_batch kworker/u128:0-8 [037] ...1. 175.321046: cleanup_net <-process_one_work kworker/u128:0-8 [024] ...1. 175.941633: inet_twsk_purge <-cleanup_net kworker/u128:0-8 [025] ...1. 176.242539: inet_twsk_purge <-tcp_sk_exit_batch After: # cat /sys/kernel/debug/tracing/trace_pipe kworker/u128:0-8 [038] ...1. 428.116174: cleanup_net <-process_one_work kworker/u128:0-8 [038] ...1. 428.262532: cleanup_net <-process_one_work kworker/u128:0-8 [030] ...1. 429.292645: cleanup_net <-process_one_work Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Set NULL to sk->sk_prot->h.hashinfo.Kuniyuki Iwashima1-0/+10
We will soon introduce an optional per-netns ehash. This means we cannot use the global sk->sk_prot->h.hashinfo to fetch a TCP hashinfo. Instead, set NULL to sk->sk_prot->h.hashinfo for TCP and get a proper hashinfo from net->ipv4.tcp_death_row.hashinfo. Note that we need not use sk->sk_prot->h.hashinfo if DCCP is disabled. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Don't allocate tcp_death_row outside of struct netns_ipv4.Kuniyuki Iwashima1-1/+2
We will soon introduce an optional per-netns ehash and access hash tables via net->ipv4.tcp_death_row->hashinfo instead of &tcp_hashinfo in most places. It could harm the fast path because dereferences of two fields in net and tcp_death_row might incur two extra cache line misses. To save one dereference, let's place tcp_death_row back in netns_ipv4 and fetch hashinfo via net->ipv4.tcp_death_row"."hashinfo. Note tcp_death_row was initially placed in netns_ipv4, and commit fbb8295248e1 ("tcp: allocate tcp_death_row outside of struct netns_ipv4") changed it to a pointer so that we can fire TIME_WAIT timers after freeing net. However, we don't do so after commit 04c494e68a13 ("Revert "tcp/dccp: get rid of inet_twsk_purge()""), so we need not define tcp_death_row as a pointer. Also, we move refcount_dec_and_test(&tw_refcount) from tcp_sk_exit() to tcp_sk_exit_batch() as a debug check. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20headers: Remove some left-over license textChristophe JAILLET2-4/+0
Remove a left-over from commit 2874c5fd2842 ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152") There is no need for an empty "License:". Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0e5ff727626b748238f4b78932f81572143d8f0b.1662896317.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20firmware: xilinx: add support for sd/gem configRonak Jain1-0/+45
Add new APIs in firmware to configure SD/GEM registers. Internally it calls PM IOCTL for below SD/GEM register configuration: - SD/EMMC select - SD slot type - SD base clock - SD 8 bit support - SD fixed config - GEM SGMII Mode - GEM fixed config Signed-off-by: Ronak Jain <ronak.jain@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20seg6: add NEXT-C-SID support for SRv6 End behaviorAndrea Mayer1-0/+24
The NEXT-C-SID mechanism described in [1] offers the possibility of encoding several SRv6 segments within a single 128 bit SID address. Such a SID address is called a Compressed SID (C-SID) container. In this way, the length of the SID List can be drastically reduced. A SID instantiated with the NEXT-C-SID flavor considers an IPv6 address logically structured in three main blocks: i) Locator-Block; ii) Locator-Node Function; iii) Argument. C-SID container +------------------------------------------------------------------+ | Locator-Block |Loc-Node| Argument | | |Function| | +------------------------------------------------------------------+ <--------- B -----------> <- NF -> <------------- A ---------------> (i) The Locator-Block can be any IPv6 prefix available to the provider; (ii) The Locator-Node Function represents the node and the function to be triggered when a packet is received on the node; (iii) The Argument carries the remaining C-SIDs in the current C-SID container. The NEXT-C-SID mechanism relies on the "flavors" framework defined in [2]. The flavors represent additional operations that can modify or extend a subset of the existing behaviors. This patch introduces the support for flavors in SRv6 End behavior implementing the NEXT-C-SID one. An SRv6 End behavior with NEXT-C-SID flavor works as an End behavior but it is capable of processing the compressed SID List encoded in C-SID containers. An SRv6 End behavior with NEXT-C-SID flavor can be configured to support user-provided Locator-Block and Locator-Node Function lengths. In this implementation, such lengths must be evenly divisible by 8 (i.e. must be byte-aligned), otherwise the kernel informs the user about invalid values with a meaningful error code and message through netlink_ext_ack. If Locator-Block and/or Locator-Node Function lengths are not provided by the user during configuration of an SRv6 End behavior instance with NEXT-C-SID flavor, the kernel will choose their default values i.e., 32-bit Locator-Block and 16-bit Locator-Node Function. [1] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression [2] - https://datatracker.ietf.org/doc/html/rfc8986 Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: felix: add support for changing DSA masterVladimir Oltean1-0/+1
Changing the DSA master means different things depending on the tagging protocol in use. For NPI mode ("ocelot" and "seville"), there is a single port which can be configured as NPI, but DSA only permits changing the CPU port affinity of user ports one by one. So changing a user port to a different NPI port globally changes what the NPI port is, and breaks the user ports still using the old one. To address this while still permitting the change of the NPI port, require that the user ports which are still affine to the old NPI port are down, and cannot be brought up until they are all affine to the same NPI port. The tag_8021q mode ("ocelot-8021q") is more flexible, in that each user port can be freely assigned to one CPU port or to the other. This works by filtering host addresses towards both tag_8021q CPU ports, and then restricting the forwarding from a certain user port only to one of the two tag_8021q CPU ports. Additionally, the 2 tag_8021q CPU ports can be placed in a LAG. This works by enabling forwarding via PGID_SRC from a certain user port towards the logical port ID containing both tag_8021q CPU ports, but then restricting forwarding per packet, via the LAG hash codes in PGID_AGGR, to either one or the other. When we change the DSA master to a LAG device, DSA guarantees us that the LAG has at least one lower interface as a physical DSA master. But DSA masters can come and go as lowers of that LAG, and ds->ops->port_change_master() will not get called, because the DSA master is still the same (the LAG). So we need to hook into the ds->ops->port_lag_{join,leave} calls on the CPU ports and update the logical port ID of the LAG that user ports are assigned to. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: allow masters to join a LAGVladimir Oltean1-0/+12
There are 2 ways in which a DSA user port may become handled by 2 CPU ports in a LAG: (1) its current DSA master joins a LAG ip link del bond0 && ip link add bond0 type bond mode 802.3ad ip link set eno2 master bond0 When this happens, all user ports with "eno2" as DSA master get automatically migrated to "bond0" as DSA master. (2) it is explicitly configured as such by the user # Before, the DSA master was eno3 ip link set swp0 type dsa master bond0 The design of this configuration is that the LAG device dynamically becomes a DSA master through dsa_master_setup() when the first physical DSA master becomes a LAG slave, and stops being so through dsa_master_teardown() when the last physical DSA master leaves. A LAG interface is considered as a valid DSA master only if it contains existing DSA masters, and no other lower interfaces. Therefore, we mainly rely on method (1) to enter this configuration. Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when it becomes a standalone DSA master again. But the LAG master also has a dev->dsa_ptr, and this is actually duplicated from one of the physical LAG slaves, and therefore needs to be balanced when LAG slaves come and go. To the switch driver, putting DSA masters in a LAG is seen as putting their associated CPU ports in a LAG. We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG, by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: propagate extack to port_lag_joinVladimir Oltean2-3/+6
Drivers could refuse to offload a LAG configuration for a variety of reasons, mainly having to do with its TX type. Additionally, since DSA masters may now also be LAG interfaces, and this will translate into a call to port_lag_join on the CPU ports, there may be extra restrictions there. Propagate the netlink extack to this DSA method in order for drivers to give a meaningful error message back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: allow the DSA master to be seen and changed through rtnetlinkVladimir Oltean2-0/+18
Some DSA switches have multiple CPU ports, which can be used to improve CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(), sets up only the first one, leading to suboptimal use of hardware. The desire is to not change the default configuration but to permit the user to create a dynamic mapping between individual user ports and the CPU port that they are served by, configurable through rtnetlink. It is also intended to permit load balancing between CPU ports, and in that case, the foreseen model is for the DSA master to be a bonding interface whose lowers are the physical DSA masters. To that end, we create a struct rtnl_link_ops for DSA user ports with the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that contains the ifindex of the newly desired DSA master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: introduce dsa_port_get_master()Vladimir Oltean1-0/+5
There is a desire to support for DSA masters in a LAG. That configuration is intended to work by simply enslaving the master to a bonding/team device. But the physical DSA master (the LAG slave) still has a dev->dsa_ptr, and that cpu_dp still corresponds to the physical CPU port. However, we would like to be able to retrieve the LAG that's the upper of the physical DSA master. In preparation for that, introduce a helper called dsa_port_get_master() that replaces all occurrences of the dp->cpu_dp->master pattern. The distinction between LAG and non-LAG will be made later within the helper itself. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: introduce iterators over synced hw addressesVladimir Oltean1-0/+6
Some network drivers use __dev_mc_sync()/__dev_uc_sync() and therefore program the hardware only with addresses with a non-zero sync_cnt. Some of the above drivers also need to save/restore the address filtering lists when certain events happen, and they need to walk through the struct net_device :: uc and struct net_device :: mc lists. But these lists contain unsynced addresses too. To keep the appearance of an elementary form of data encapsulation, provide iterators through these lists that only look at entries with a non-zero sync_cnt, instead of filtering entries out from device drivers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20flow_offload: Introduce flow_match_l2tpv3Wojciech Drewek1-0/+6
Allow to offload L2TPv3 filters by adding flow_rule_match_l2tpv3. Drivers can extract L2TPv3 specific fields from now on. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net/sched: flower: Add L2TPv3 filterWojciech Drewek1-0/+2
Add support for matching on L2TPv3 session ID. Session ID can be specified only when ip proto was set to IPPROTO_L2TP. Example filter: # tc filter add dev $PF1 ingress prio 1 protocol ip \ flower \ ip_proto l2tp \ l2tpv3_sid 1234 \ skip_sw \ action mirred egress redirect dev $VF1_PR Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20flow_dissector: Add L2TPv3 dissectorsWojciech Drewek1-0/+9
Allow to dissect L2TPv3 specific field which is: - session ID (32 bits) L2TPv3 might be transported over IP or over UDP, this implementation is only about L2TPv3 over IP. IP protocol carries L2TPv3 when ip_proto is IPPROTO_L2TP (115). Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20uapi: move IPPROTO_L2TP to in.hWojciech Drewek2-2/+2
IPPROTO_L2TP is currently defined in l2tp.h, but most of ip protocols are defined in in.h file. Move it there in order to keep code clean. Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20Merge tag 'for-net-2022-09-09' of ↵Jakub Kicinski1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix HCIGETDEVINFO regression * tag 'for-net-2022-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: Fix HCIGETDEVINFO regression ==================== Link: https://lore.kernel.org/r/20220909201642.3810565-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20Merge tag 'ib-mfd-net-pinctrl-v6.0' of ↵Jakub Kicinski2-0/+67
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Lee Jones says: ==================== Immutable branch between MFD, Net and Pinctrl due for the v6.0 merge window * tag 'ib-mfd-net-pinctrl-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: ocelot: Add support for the vsc7512 chip via spi dt-bindings: mfd: ocelot: Add bindings for VSC7512 resource: add define macro for register address resources pinctrl: microchip-sgpio: add ability to be used in a non-mmio configuration pinctrl: microchip-sgpio: allow sgpio driver to be used as a module pinctrl: ocelot: add ability to be used in a non-mmio configuration net: mdio: mscc-miim: add ability to be used in a non-mmio configuration mfd: ocelot: Add helper to get regmap from a resource ==================== Link: https://lore.kernel.org/r/YxrjyHcceLOFlT/c@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-18Merge tag 'parisc-for-6.0-3' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fixes from Helge Deller: "Some small parisc architecture fixes for 6.0-rc6: One patch lightens up a previous commit and thus unbreaks building the debian kernel, which tries to configure a 64-bit kernel with the ARCH=parisc environment variable set. The other patches fixes asm/errno.h includes in the tools directory and cleans up memory allocation in the iosapic driver. Summary: - Allow configuring 64-bit kernel with ARCH=parisc - Fix asm/errno.h includes in tools directory for parisc and xtensa - Clean up iosapic memory allocation - Minor typo and spelling fixes" * tag 'parisc-for-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Allow CONFIG_64BIT with ARCH=parisc parisc: remove obsolete manual allocation aligning in iosapic tools/include/uapi: Fix <asm/errno.h> for parisc and xtensa Input: hp_sdc: fix spelling typo in comment parisc: ccio-dma: Add missing iounmap in error path in ccio_probe()
2022-09-16Merge tag 'linux-can-next-for-6.1-20220915' of ↵David S. Miller5-4/+115
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== Sept. 15, 2022, 8:19 a.m. UTC Hello Jakub, hello David, this is a pull request of 23 patches for net-next/master. the first 2 patches are by me and fix a typo in the rx-offload helper and the flexcan driver. Christophe JAILLET's patch cleans up the error handling in rcar_canfd driver's probe function. Kenneth Lee's patch converts the kvaser_usb driver from kcalloc() to kzalloc(). Biju Das contributes 2 patches to the sja1000 driver which update the DT bindings and support for the RZ/N1 SJA1000 CAN controller. Jinpeng Cui provides 2 patches that remove redundant variables from the sja1000 and kvaser_pciefd driver. 2 patches by John Whittington and me add hardware timestamp support to the gs_usb driver. Gustavo A. R. Silva's patch converts the etas_es58x driver to make use of DECLARE_FLEX_ARRAY(). Krzysztof Kozlowski's patch cleans up the sja1000 DT bindings. Dario Binacchi fixes his invalid email in the flexcan driver documentation. Ziyang Xuan contributes 2 patches that clean up the CAN RAW protocol. Yang Yingliang's patch switches the flexcan driver to dev_err_probe(). The last 7 patches are by Oliver Hartkopp and add support for the next generation of the CAN protocol: CAN with eXtended data Length (CAN XL). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net: bonding: Share lacpdu_mcast_addr definitionBenjamin Poirier2-2/+3
There are already a few definitions of arrays containing MULTICAST_LACPDU_ADDR and the next patch will add one more use. These all contain the same constant data so define one common instance for all bonding code. Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16iov_iter: use "maxpages" parameterDan Carpenter1-1/+1
This was intended to be "maxpages" instead of INT_MAX. There is only one caller and it passes INT_MAX so this does not affect runtime. Fixes: b93235e68921 ("tls: cap the output scatter list to something reasonable") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net/ieee802154: fix uninit value bug in dgram_sendmsgHaimin Zhang1-0/+37
There is uninit value bug in dgram_sendmsg function in net/ieee802154/socket.c when the length of valid data pointed by the msg->msg_name isn't verified. We introducing a helper function ieee802154_sockaddr_check_size to check namelen. First we check there is addr_type in ieee802154_addr_sa. Then, we check namelen according to addr_type. Also fixed in raw_bind, dgram_bind, dgram_connect. Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16rtnetlink: advertise allmulti counterNicolas Dichtel1-0/+1
Like what was done with IFLA_PROMISCUITY, add IFLA_ALLMULTI to advertise the allmulti counter. The flag IFF_ALLMULTI is advertised only if it was directly set by a userland app. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-15can: raw: add CAN XL supportOliver Hartkopp1-0/+1
Enable CAN_RAW sockets to read and write CAN XL frames analogue to the CAN FD extension (new CAN_RAW_XL_FRAMES sockopt). A CAN XL network interface is capable to handle Classical CAN, CAN FD and CAN XL frames. When CAN_RAW_XL_FRAMES is enabled, the CAN_RAW socket checks whether the addressed CAN network interface is capable to handle the provided CAN frame. In opposite to the fixed number of bytes for - CAN frames (CAN_MTU = sizeof(struct can_frame)) - CAN FD frames (CANFD_MTU = sizeof(struct can_frame)) the number of bytes when reading/writing CAN XL frames depends on the number of data bytes. For efficiency reasons the length of the struct canxl_frame is truncated to the needed size for read/write operations. This leads to a calculated size of CANXL_HDR_SIZE + canxl_frame::len which is enforced on write() operations and guaranteed on read() operations. NB: Valid length values are 1 .. 2048 (CANXL_MIN_DLEN .. CANXL_MAX_DLEN). Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-8-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-15can: dev: add CAN XL support to virtual CANOliver Hartkopp1-0/+5
Make use of new can_skb_get_data_len() helper. Add support for variable CANXL MTU using the new can_is_canxl_dev_mtu(). Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-7-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>