starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2024-08-22	net: ipv6: ioam6: new feature tunsrc	Justin Iurman	2	-8/+63
	This patch provides a new feature (i.e., "tunsrc") for the tunnel (i.e., "encap") mode of ioam6. Just like seg6 already does, except it is attached to a route. The "tunsrc" is optional: when not provided (by default), the automatic resolution is applied. Using "tunsrc" when possible has a benefit: performance. See the comparison: - before (= "encap" mode): https://ibb.co/bNCzvf7 - after (= "encap" mode with "tunsrc"): https://ibb.co/PT8L6yq Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-22	net: ipv6: ioam6: code alignment	Justin Iurman	1	-4/+5
	This patch prepares the next one by correcting the alignment of some lines. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-22	tools: ynl: lift an assumption about spec file name	Paolo Abeni	1	-2/+4
	Currently the parsing code generator assumes that the yaml specification file name and the main 'name' attribute carried inside correspond, that is the field is the c-name representation of the file basename. The above assumption held true within the current tree, but will be hopefully broken soon by the upcoming net shaper specification. Additionally, it makes the field 'name' itself useless. Lift the assumption, always computing the generated include file name from the generated c file name. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/24da5a3596d814beeb12bd7139a6b4f89756cc19.1724165948.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22	Merge branch 'net-xilinx-axienet-add-statistics-support'	Jakub Kicinski	2	-4/+408
	Sean Anderson says: ==================== net: xilinx: axienet: Add statistics support Add support for hardware statistics counters (if they are enabled) in the AXI Ethernet driver. Unfortunately, the implementation is complicated a bit since the hardware might only support 32-bit counters. ==================== Link: https://patch.msgid.link/20240820175343.760389-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22	net: xilinx: axienet: Add statistics support	Sean Anderson	2	-3/+407
	Add support for reading the statistics counters, if they are enabled. The counters may be 64-bit, but we can't detect this statically as there's no ability bit for it and the counters are read-only. Therefore, we assume the counters are 32-bits by default. To ensure we don't miss an overflow, we read all counters at 13-second intervals. This should be often enough to ensure the bytes counters don't wrap at 2.5 Gbit/s. Another complication is that the counters may be reset when the device is reset (depending on configuration). To ensure the counters persist across link up/down (including suspend/resume), we maintain our own versions along with the last counter value we saw. Because we might wait up to 100 ms for the reset to complete, we use a mutex to protect writing hw_stats. We can't sleep in ndo_get_stats64, so we use a seqlock to protect readers. We don't bother disabling the refresh work when we detect 64-bit counters. This is because the reset issue requires us to read hw_stat_base and reset_in_progress anyway, which would still require the seqcount. And I don't think skipping the task is worth the extra bookkeeping. We can't use the byte counters for either get_stats64 or get_eth_mac_stats. This is because the byte counters include everything in the frame (destination address to FCS, inclusive). But rtnl_link_stats64 wants bytes excluding the FCS, and ethtool_eth_mac_stats wants to exclude the L2 overhead (addresses and length/type). It might be possible to calculate the byte values Linux expects based on the frame counters, but I think it is simpler to use the existing software counters. get_ethtool_stats is implemented for nonstandard statistics. This includes the aforementioned byte counters, VLAN and PFC frame counters, and user-defined (e.g. with custom RTL) counters. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20240820175343.760389-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22	net: xilinx: axienet: Report RxRject as rx_dropped	Sean Anderson	1	-1/+1
	The Receive Frame Rejected interrupt is asserted whenever there was a receive error (bad FCS, bad length, etc.) or whenever the frame was dropped due to a mismatched address. So this is really a combination of rx_otherhost_dropped, rx_length_errors, rx_frame_errors, and rx_crc_errors. Mismatched addresses are common and aren't really errors at all (much like how fragments are normal on half-duplex links). To avoid confusion, report these events as rx_dropped. This better reflects what's going on: the packet was received by the MAC but dropped before being processed. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://patch.msgid.link/20240820175343.760389-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22	net: repack struct netdev_queue	Jakub Kicinski	1	-9/+14
	Adding the NAPI pointer to struct netdev_queue made it grow into another cacheline, even though there was 44 bytes of padding available. The struct was historically grouped as follows: /* read-mostly stuff (align) / / ... random control path fields ... / / write-mostly stuff (align) / / ... 40 byte hole ... / / struct dql (align) / It seems that people want to add control path fields after the read only fields. struct dql looks pretty innocent but it forces its own alignment and nothing indicates that there is a lot of empty space above it. Move dql above the xmit_lock. This shifts the empty space to the end of the struct rather than in the middle of it. Move two example fields there to set an example. Hopefully people will now add new fields at the end of the struct. A lot of the read-only stuff is also control path-only, but if we move it all we'll have another hole in the middle. Before: / size: 384, cachelines: 6, members: 16 / / sum members: 284, holes: 3, sum holes: 100 / After: / size: 320, cachelines: 5, members: 16 / / sum members: 284, holes: 1, sum holes: 8 */ Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240820205119.1321322-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22	ice: Fix a 32bit bug	Dan Carpenter	1	-3/+3
	BIT() is unsigned long but ->pu.flg_msk and ->pu.flg_val are u64 type. On 32 bit systems, unsigned long is a u32 and the mismatch between u32 and u64 will break things for the high 32 bits. Fixes: 9a4c07aaa0f5 ("ice: add parser execution main loop") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/ddc231a8-89c1-4ff4-8704-9198bcb41f8d@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22	ipv6: remove redundant check	Xi Huang	1	-6/+3
	err varibale will be set everytime,like -ENOBUFS and in if (err < 0), when code gets into this path. This check will just slowdown the execution and that's all. Signed-off-by: Xi Huang <xuiagnh@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20240820115442.49366-1-xuiagnh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22	net: dsa: sja1105: Simplify with scoped for each OF child loop	Jinjie Ruan	1	-8/+2
	Use scoped for_each_available_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patch.msgid.link/20240820075047.681223-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22	net: dsa: ocelot: Simplify with scoped for each OF child loop	Jinjie Ruan	1	-4/+1
	Use scoped for_each_available_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patch.msgid.link/20240820074805.680674-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-22	nfc: pn533: Avoid -Wflex-array-member-not-at-end warnings	Gustavo A. R. Silva	1	-1/+0
	-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Remove unnecessary flex-array member `data[]`, and with this fix the following warnings: drivers/nfc/pn533/usb.c:268:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/nfc/pn533/usb.c:275:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ZsPw7+6vNoS651Cb@elsanto Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	net: wwan: t7xx: PCIe reset rescan	Jinjian Song	7	-43/+105
	WWAN device is programmed to boot in normal mode or fastboot mode, when triggering a device reset through ACPI call or fastboot switch command. Maintain state machine synchronization and reprobe logic after a device reset. The PCIe device reset triggered by several ways. E.g.: - fastboot: echo "fastboot_switching" > /sys/bus/pci/devices/${bdf}/t7xx_mode. - reset: echo "reset" > /sys/bus/pci/devices/${bdf}/t7xx_mode. - IRQ: PCIe device request driver to reset itself by an interrupt request. Use pci_reset_function() as a generic way to reset device, save and restore the PCIe configuration before and after reset device to ensure the reprobe process. Suggestion from Bjorn: Link: https://lore.kernel.org/all/20230127133034.GA1364550@bhelgaas/ Signed-off-by: Jinjian Song <jinjian.song@fibocom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-21	net: dsa: b53: Use dev_err_probe()	Florian Fainelli	1	-4/+3
	Rather than print an error even when we get -EPROBE_DEFER, use dev_err_probe() to filter out those messages. Link: https://github.com/openwrt/openwrt/pull/11680 Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240820004436.224603-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	l2tp: use skb_queue_purge in l2tp_ip_destroy_sock	James Chapman	2	-4/+1
	Recent commit ed8ebee6def7 ("l2tp: have l2tp_ip_destroy_sock use ip_flush_pending_frames") was incorrect in that l2tp_ip does not use socket cork and ip_flush_pending_frames is for sockets that do. Use __skb_queue_purge instead and remove the unnecessary lock. Also unexport ip_flush_pending_frames since it was originally exported in commit 4ff8863419cd ("ipv4: export ip_flush_pending_frames") for l2tp and is not used by other modules. Suggested-by: xiyou.wangcong@gmail.com Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240819143333.3204957-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	af_unix: Don't call skb_get() for OOB skb.	Kuniyuki Iwashima	2	-36/+7
	Since introduced, OOB skb holds an additional reference count with no special reason and caused many issues. Also, kfree_skb() and consume_skb() are used to decrement the count, which is confusing. Let's drop the unnecessary skb_get() in queue_oob() and corresponding kfree_skb(), consume_skb(), and skb_unref(). Now unix_sk(sk)->oob_skb is just a pointer to skb in the receive queue, so special handing is no longer needed in GC. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240816233921.57800-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	dt-bindings: net: socionext,uniphier-ave4: add top-level constraints	Krzysztof Kozlowski	1	-2/+6
	Properties with variable number of items per each device are expected to have widest constraints in top-level "properties:" block and further customized (narrowed) in "if:then:". Add missing top-level constraints for clock-names and reset-names. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240818172905.121829-4-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	dt-bindings: net: renesas,etheravb: add top-level constraints	Krzysztof Kozlowski	1	-10/+19
	Properties with variable number of items per each device are expected to have widest constraints in top-level "properties:" block and further customized (narrowed) in "if:then:". Add missing top-level constraints for reg, clocks, clock-names, interrupts and interrupt-names. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240818172905.121829-3-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	dt-bindings: net: mediatek,net: add top-level constraints	Krzysztof Kozlowski	1	-2/+7
	Properties with variable number of items per each device are expected to have widest constraints in top-level "properties:" block and further customized (narrowed) in "if:then:". Add missing top-level constraints for clocks and clock-names. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240818172905.121829-2-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	dt-bindings: net: mediatek,net: narrow interrupts per variants	Krzysztof Kozlowski	1	-0/+3
	Each variable-length property like interrupts must have fixed constraints on number of items for given variant in binding. The clauses in "if:then:" block should define both limits: upper and lower. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240818172905.121829-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	net: Silence false field-spanning write warning in metadata_dst memcpy	Gal Pressman	1	-2/+5
	When metadata_dst struct is allocated (using metadata_dst_alloc()), it reserves room for options at the end of the struct. Change the memcpy() to unsafe_memcpy() as it is guaranteed that enough room (md_size bytes) was allocated and the field-spanning write is intentional. This resolves the following warning: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 104) of single field "&new_md->u.tun_info" at include/net/dst_metadata.h:166 (size 96) WARNING: CPU: 2 PID: 391470 at include/net/dst_metadata.h:166 tun_dst_unclone+0x114/0x138 [geneve] Modules linked in: act_tunnel_key geneve ip6_udp_tunnel udp_tunnel act_vlan act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress sbsa_gwdt ipmi_devintf ipmi_msghandler xfrm_interface xfrm6_tunnel tunnel6 tunnel4 xfrm_user xfrm_algo nvme_fabrics overlay optee openvswitch nsh nf_conncount ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser rdma_cm ib_umad iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm uio_pdrv_genirq uio mlxbf_pmc pwr_mlxbf mlxbf_bootctl bluefield_edac nft_chain_nat binfmt_misc xt_MASQUERADE nf_nat xt_tcpmss xt_NFLOG nfnetlink_log xt_recent xt_hashlimit xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_comment ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink sch_fq_codel dm_multipath fuse efi_pstore ip_tables btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 nvme nvme_core mlx5_ib ib_uverbs ib_core ipv6 crc_ccitt mlx5_core crct10dif_ce mlxfw psample i2c_mlxbf gpio_mlxbf2 mlxbf_gige mlxbf_tmfifo CPU: 2 PID: 391470 Comm: handler6 Not tainted 6.10.0-rc1 #1 Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.5.0.12993 Dec 6 2023 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : tun_dst_unclone+0x114/0x138 [geneve] lr : tun_dst_unclone+0x114/0x138 [geneve] sp : ffffffc0804533f0 x29: ffffffc0804533f0 x28: 000000000000024e x27: 0000000000000000 x26: ffffffdcfc0e8e40 x25: ffffff8086fa6600 x24: ffffff8096a0c000 x23: 0000000000000068 x22: 0000000000000008 x21: ffffff8092ad7000 x20: ffffff8081e17900 x19: ffffff8092ad7900 x18: 00000000fffffffd x17: 0000000000000000 x16: ffffffdcfa018488 x15: 695f6e75742e753e x14: 2d646d5f77656e26 x13: 6d5f77656e262220 x12: 646c65696620656c x11: ffffffdcfbe33ae8 x10: ffffffdcfbe1baa8 x9 : ffffffdcfa0a4c10 x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001 x5 : ffffff83fdeeb010 x4 : 0000000000000000 x3 : 0000000000000027 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80913f6780 Call trace: tun_dst_unclone+0x114/0x138 [geneve] geneve_xmit+0x214/0x10e0 [geneve] dev_hard_start_xmit+0xc0/0x220 __dev_queue_xmit+0xa14/0xd38 dev_queue_xmit+0x14/0x28 [openvswitch] ovs_vport_send+0x98/0x1c8 [openvswitch] do_output+0x80/0x1a0 [openvswitch] do_execute_actions+0x172c/0x1958 [openvswitch] ovs_execute_actions+0x64/0x1a8 [openvswitch] ovs_packet_cmd_execute+0x258/0x2d8 [openvswitch] genl_family_rcv_msg_doit+0xc8/0x138 genl_rcv_msg+0x1ec/0x280 netlink_rcv_skb+0x64/0x150 genl_rcv+0x40/0x60 netlink_unicast+0x2e4/0x348 netlink_sendmsg+0x1b0/0x400 __sock_sendmsg+0x64/0xc0 ____sys_sendmsg+0x284/0x308 ___sys_sendmsg+0x88/0xf0 __sys_sendmsg+0x70/0xd8 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x38/0x100 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 ---[ end trace 0000000000000000 ]--- Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240818114351.3612692-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	net: hns3: Use ARRAY_SIZE() to improve readability	Zhang Zekun	1	-4/+4
	There is a helper function ARRAY_SIZE() to help calculating the u32 array size, and we don't need to do it mannually. So, let's use ARRAY_SIZE() to calculate the array size, and improve the code readability. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Jijie Shao<shaojijie@huawei.com> Link: https://patch.msgid.link/20240818052518.45489-1-zhangzekun11@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21	selftests: net/forwarding: spawn sh inside vrf to speed up ping loop	Jakub Kicinski	3	-12/+12
	Looking at timestamped output of netdev CI reveals that most of the time in forwarding tests for custom route hashing is spent on a single case, namely the test which uses ping (mausezahn does not support flow labels). On a non-debug kernel we spend 714 of 730 total test runtime (97%) on this test case. While having flow label support in a traffic gen tool / mausezahn would be best, we can significantly speed up the loop by putting ip vrf exec outside of the iteration. In a test of 1000 pings using a normal loop takes 50 seconds to finish. While using: ip vrf exec $vrf sh -c "$loop-body" takes 12 seconds (1/4 of the time). Some of the slowness is likely due to our inefficient virtualization setup, but even on my laptop running "ip link help" 16k times takes 25-30 seconds, so I think it's worth optimizing even for fastest setups. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20240817203659.712085-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20	net: ethernet: ibm: Simpify code with for_each_child_of_node()	Zhang Zekun	1	-6/+4
	for_each_child_of_node can help to iterate through the device_node, and we don't need to use while loop. No functional change with this conversion. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816015837.109627-1-zhangzekun11@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20	Merge branch 'preparations-for-fib-rule-dscp-selector'	Paolo Abeni	7	-9/+13
	Ido Schimmel says: ==================== Preparations for FIB rule DSCP selector This patchset moves the masking of the upper DSCP bits in 'flowi4_tos' to the core instead of relying on callers of the FIB lookup API to do it. This will allow us to start changing users of the API to initialize the 'flowi4_tos' field with all six bits of the DSCP field. In turn, this will allow us to extend FIB rules with a new DSCP selector. By masking the upper DSCP bits in the core we are able to maintain the behavior of the TOS selector in FIB rules and routes to only match on the lower DSCP bits. While working on this I found two users of the API that do not mask the upper DSCP bits before performing the lookup. The first is an ancient netlink family that is unlikely to be used. It is adjusted in patch #1 to mask both the upper DSCP bits and the ECN bits before calling the API. The second user is a nftables module that differs in this regard from its equivalent iptables module. It is adjusted in patch #2 to invoke the API with the upper DSCP bits masked, like all other callers. The relevant selftest passed, but in the unlikely case that regressions are reported because of this change, we can restore the existing behavior using a new flow information flag as discussed here [1]. The last patch moves the masking of the upper DSCP bits to the core, making the first two patches redundant, but I wanted to post them separately to call attention to the behavior change for these two users of the FIB lookup API. Future patchsets (around 3) will start unmasking the upper DSCP bits throughout the networking stack before adding support for the new FIB rule DSCP selector. Changes from v1 [2]: Patch #3: Include <linux/ip.h> in <linux/in_route.h> instead of including it in net/ip_fib.h [1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/ [2] https://lore.kernel.org/netdev/20240725131729.1729103-1-idosch@nvidia.com/ ==================== Link: https://patch.msgid.link/20240814125224.972815-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20	ipv4: Centralize TOS matching	Ido Schimmel	5	-5/+11
	The TOS field in the IPv4 flow information structure ('flowi4_tos') is matched by the kernel against the TOS selector in IPv4 rules and routes. The field is initialized differently by different call sites. Some treat it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC 791 TOS and initialize it using IPTOS_RT_MASK. What is common to all these call sites is that they all initialize the lower three DSCP bits, which fits the TOS definition in the initial IPv4 specification (RFC 791). Therefore, the kernel only allows configuring IPv4 FIB rules that match on the lower three DSCP bits which are always guaranteed to be initialized by all call sites: # ip -4 rule add tos 0x1c table 100 # ip -4 rule add tos 0x3c table 100 Error: Invalid tos. While this works, it is unlikely to be very useful. RFC 791 that initially defined the TOS and IP precedence fields was updated by RFC 2474 over twenty five years ago where these fields were replaced by a single six bits DSCP field. Extending FIB rules to match on DSCP can be done by adding a new DSCP selector while maintaining the existing semantics of the TOS selector for applications that rely on that. A prerequisite for allowing FIB rules to match on DSCP is to adjust all the call sites to initialize the high order DSCP bits and remove their masking along the path to the core where the field is matched on. However, making this change alone will result in a behavior change. For example, a forwarded IPv4 packet with a DS field of 0xfc will no longer match a FIB rule that was configured with 'tos 0x1c'. This behavior change can be avoided by masking the upper three DSCP bits in 'flowi4_tos' before comparing it against the TOS selectors in FIB rules and routes. Implement the above by adding a new function that checks whether a given DSCP value matches the one specified in the IPv4 flow information structure and invoke it from the three places that currently match on 'flowi4_tos'. Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK since the latter is not uAPI and we should be able to remove it at some point. Include <linux/ip.h> in <linux/in_route.h> since the former defines IPTOS_TOS_MASK which is used in the definition of RT_TOS() in <linux/in_route.h>. No regressions in FIB tests: # ./fib_tests.sh [...] Tests passed: 218 Tests failed: 0 And FIB rule tests: # ./fib_rule_tests.sh [...] Tests passed: 116 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20	netfilter: nft_fib: Mask upper DSCP bits before FIB lookup	Ido Schimmel	1	-3/+1
	As part of its functionality, the nftables FIB expression module performs a FIB lookup, but unlike other users of the FIB lookup API, it does so without masking the upper DSCP bits. In particular, this differs from the equivalent iptables match ("rpfilter") that does mask the upper DSCP bits before the FIB lookup. Align the module to other users of the FIB lookup API and mask the upper DSCP bits using IPTOS_RT_MASK before the lookup. No regressions in nft_fib.sh: # ./nft_fib.sh PASS: fib expression did not cause unwanted packet drops PASS: fib expression did drop packets for 1.1.1.1 PASS: fib expression did drop packets for 1c3::c01d PASS: fib expression forward check with policy based routing Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20	ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family	Ido Schimmel	1	-1/+1
	The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB lookup according to user provided parameters and communicate the result back to user space. However, unlike other users of the FIB lookup API, the upper DSCP bits and the ECN bits of the DS field are not masked, which can result in the wrong result being returned. Solve this by masking the upper DSCP bits and the ECN bits using IPTOS_RT_MASK. The structure that communicates the request and the response is not exported to user space, so it is unlikely that this netlink family is actually in use [1]. [1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20	Merge branch 'net-smc-introduce-ringbufs-usage-statistics'	Paolo Abeni	5	-20/+90
	Wen Gu says: ==================== net/smc: introduce ringbufs usage statistics Currently, we have histograms that show the sizes of ringbufs that ever used by SMC connections. However, they are always incremental and since SMC allows the reuse of ringbufs, we cannot know the actual amount of ringbufs being allocated or actively used. So this patch set introduces statistics for the amount of ringbufs that actually allocated by link group and actively used by connections of a certain net namespace, so that we can react based on these memory usage information, e.g. active fallback to TCP. With appropriate adaptations of smc-tools, we can obtain these ringbufs usage information: $ smcr -d linkgroup LG-ID : 00000500 LG-Role : SERV LG-Type : ASYML VLAN : 0 PNET-ID : Version : 1 Conns : 0 Sndbuf : 12910592 B <- RMB : 12910592 B <- or $ smcr -d stats [...] RX Stats Data transmitted (Bytes) 869225943 (869.2M) Total requests 18494479 Buffer usage (Bytes) 12910592 (12.31M) <- [...] TX Stats Data transmitted (Bytes) 12760884405 (12.76G) Total requests 36988338 Buffer usage (Bytes) 12910592 (12.31M) <- [...] [...] Change log: v3->v2 - use new helper nla_put_uint() instead of nla_put_u64_64bit(). v2->v1 https://lore.kernel.org/r/20240807075939.57882-1-guwen@linux.alibaba.com/ - remove inline keyword in .c files. - use local variable in macros to avoid potential side effects. v1 https://lore.kernel.org/r/20240805090551.80786-1-guwen@linux.alibaba.com/ ==================== Link: https://patch.msgid.link/20240814130827.73321-1-guwen@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20	net/smc: introduce statistics for ringbufs usage of net namespace	Wen Gu	4	-16/+42
	The buffer size histograms in smc_stats, namely rx/tx_rmbsize, record the sizes of ringbufs for all connections that have ever appeared in the net namespace. They are incremental and we cannot know the actual ringbufs usage from these. So here introduces statistics for current ringbufs usage of existing smc connections in the net namespace into smc_stats, it will be incremented when new connection uses a ringbuf and decremented when the ringbuf is unused. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20	net/smc: introduce statistics for allocated ringbufs of link group	Wen Gu	3	-4/+48
	Currently we have the statistics on sndbuf/RMB sizes of all connections that have ever been on the link group, namely smc_stats_memsize. However these statistics are incremental and since the ringbufs of link group are allowed to be reused, we cannot know the actual allocated buffers through these. So here introduces the statistic on actual allocated ringbufs of the link group, it will be incremented when a new ringbuf is added into buf_list and decremented when it is deleted from buf_list. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20	net: remove redundant check in skb_shift()	Zhang Changzhong	1	-2/+1
	The check for '!to' is redundant here, since skb_can_coalesce() already contains this check. Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1723730983-22912-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20	mptcp: Remove unused declaration mptcp_sockopt_sync()	Yue Haibing	1	-1/+0
	Commit a1ab24e5fc4a ("mptcp: consolidate sockopt synchronization") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816100404.879598-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20	net/mlx5: E-Switch, Remove unused declarations	Yue Haibing	1	-3/+0
	These are never implenmented since commit b691b1116e82 ("net/mlx5: Implement devlink port function cmds to control ipsec_packet"). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816101550.881844-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20	igbvf: Remove two unused declarations	Yue Haibing	2	-2/+0
	There is no caller and implementations in tree. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816101638.882072-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20	gve: Remove unused declaration gve_rx_alloc_rings()	Yue Haibing	1	-1/+0
	Commit f13697cc7a19 ("gve: Switch to config-aware queue allocation") convert this function to gve_rx_alloc_rings_gqi(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816101906.882743-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20	tcp_metrics: use netlink policy for IPv6 addr len validation	Jakub Kicinski	1	-4/+6
	Use the netlink policy to validate IPv6 address length. Destination address currently has policy for max len set, and source has no policy validation. In both cases the code does the real check. With correct policy check the code can be removed. Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240816212245.467745-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-19	dt-binding: ptp: fsl,ptp: add pci1957,ee02 compatible string for fsl,enetc-ptp	Frank Li	1	-5/+17
	fsl,enetc-ptp is embedded pcie device. Add compatible string pci1957,ee02. Fix warning: arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-kbox-a-230-ls.dtb: ethernet@0,4: compatible:0: 'pci1957,ee02' is not one of ['fsl,etsec-ptp', 'fsl,fman-ptp-timer', 'fsl,dpaa2-ptp', 'fsl,enetc-ptp'] Signed-off-by: Frank Li <Frank.Li@nxp.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-17	bnx2x: Set ivi->vlan field as an integer	Simon Horman	1	-2/+2
	In bnx2x_get_vf_config(): * The vlan field of ivi is a 32-bit integer, it is used to store a vlan ID. * The vlan field of bulletin is a 16-bit integer, it is also used to store a vlan ID. In the current code, ivi->vlan is set using memset. But in the case of setting it to the value of bulletin->vlan, this involves reading 32 bits from a 16bit source. This is likely safe, as the following 6 bytes are padding in the same structure, but none the less, it seems undesirable. However, it is entirely unclear to me how this scheme works on big-endian systems. Resolve this by simply assigning integer values to ivi->vlan. Flagged by W=1 builds. f.e. gcc-14 reports: In function 'fortify_memcpy_chk', inlined from 'bnx2x_get_vf_config' at .../bnx2x_sriov.c:2655:4: .../fortify-string.h:580:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 580 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Link: https://patch.msgid.link/20240815-bnx2x-int-vlan-v1-1-5940b76e37ad@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-17	mpls: Reduce skb re-allocations due to skb_cow()	Christoph Paasch	1	-1/+1
	mpls_xmit() needs to prepend the MPLS-labels to the packet. That implies one needs to make sure there is enough space for it in the headers. Calling skb_cow() implies however that one wants to change even the playload part of the packet (which is not true for MPLS). Thus, call skb_cow_head() instead, which is what other tunnelling protocols do. Running a server with this comm it entirely removed the calls to pskb_expand_head() from the callstack in mpls_xmit() thus having significant CPU-reduction, especially at peak times. Cc: Roopa Prabhu <roopa@nvidia.com> Reported-by: Craig Taylor <cmtaylor@apple.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240815161201.22021-1-cpaasch@apple.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-17	net: txgbe: Remove unnecessary NULL check before free	Simon Horman	1	-2/+1
	Remove unnecessary NULL check before freeing using kvfree(). This function will ignore a NULL argument. Flagged by Coccinelle: .../txgbe_hw.c:187:2-8: WARNING: NULL check before some freeing functions is not needed. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240815-txgbe-kvfree-v1-1-5ecf8656f555@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-17	net: ethernet: lantiq_etop: remove unused variable	Aleksander Jan Bajkowski	1	-1/+0
	Remove a variable that has never been used. Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Link: https://patch.msgid.link/20240815074956.155224-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-17	docs: networking: Align documentation with behavior change	Tariq Toukan	1	-5/+5
	Following commit 9f7e8fbb91f8 ("net/mlx5: offset comp irq index in name by one"), which fixed the index in IRQ name to start once again from 0, we change the documentation accordingly. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://patch.msgid.link/20240815142343.2254247-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-17	dt-bindings: net: mdio: change nodename match pattern	Frank Li	1	-1/+1
	Change mdio.yaml nodename match pattern to '^mdio(-(bus\|external))?(@.+\|-([0-9]+))$' Fix mdio.yaml wrong parser mdio controller's address instead phy's address when mdio-mux exista. For example: mdio-mux-emi1@54 { compatible = "mdio-mux-mmioreg", "mdio-mux"; mdio@20 { reg = <0x20>; ^^^ This is mdio controller register ethernet-phy@2 { reg = <0x2>; ^^^ This phy's address }; }; }; Only phy's address is limited to 31 because MDIO bus definition. But CHECK_DTBS report below warning: arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: mdio-mux-emi1@54: mdio@20:reg:0:0: 32 is greater than the maximum of 31 The reason is that "mdio-mux-emi1@54" match "nodename: '^mdio(@.*)?'" in mdio.yaml. Change to '^mdio(-(bus\|external))?(@.+\|-([0-9]+))?$' to avoid wrong match mdio mux controller's node. Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240815163408.4184705-1-Frank.Li@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-16	Merge branch '40GbE' of ↵	Jakub Kicinski	22	-90/+4792
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: iavf: add support for TC U32 filters on VFs Ahmed Zaki says: The Intel Ethernet 800 Series is designed with a pipeline that has an on-chip programmable capability called Dynamic Device Personalization (DDP). A DDP package is loaded by the driver during probe time. The DDP package programs functionality in both the parser and switching blocks in the pipeline, allowing dynamic support for new and existing protocols. Once the pipeline is configured, the driver can identify the protocol and apply any HW action in different stages, for example, direct packets to desired hardware queues (flow director), queue groups or drop. Patches 1-8 introduce a DDP package parser API that enables different pipeline stages in the driver to learn the HW parser capabilities from the DDP package that is downloaded to HW. The parser library takes raw packet patterns and masks (in binary) indicating the packet protocol fields to be matched and generates the final HW profiles that can be applied at the required stage. With this API, raw flow filtering for FDIR or RSS could be done on new protocols or headers without any driver or Kernel updates (only need to update the DDP package). These patches were submitted before [1] but were not accepted mainly due to lack of a user. Patches 9-11 extend the virtchnl support to allow the VF to request raw flow director filters. Upon receiving the raw FDIR filter request, the PF driver allocates and runs a parser lib instance and generates the hardware profile definitions required to program the FDIR stage. These were also submitted before [2]. Finally, patches 12 and 13 add TC U32 filter support to the iavf driver. Using the parser API, the ice driver runs the raw patterns sent by the user and then adds a new profile to the FDIR stage associated with the VF's VSI. Refer to examples in patch 13 commit message. [1]: https://lore.kernel.org/netdev/20230904021455.3944605-1-junfeng.guo@intel.com/ [2]: https://lore.kernel.org/intel-wired-lan/20230818064703.154183-1-junfeng.guo@intel.com/ * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: add support for offloading tc U32 cls filters iavf: refactor add/del FDIR filters ice: enable FDIR filters from raw binary patterns for VFs ice: add method to disable FDIR SWAP option virtchnl: support raw packet in protocol header ice: add API for parser profile initialization ice: add UDP tunnels support to the parser ice: support turning on/off the parser's double vlan mode ice: add parser execution main loop ice: add parser internal helper functions ice: add debugging functions for the parser sections ice: parse and init various DDP parser sections ice: add parser create and destroy skeleton ==================== Link: https://patch.msgid.link/20240813222249.3708070-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-16	idpf: remove redundant 'req_vec_chunks' NULL check	Pavan Kumar Linga	1	-18/+5
	'req_vec_chunks' is used to store the vector info received from the device control plane. The memory for it is allocated in idpf_send_alloc_vectors_msg and returns an error if the memory allocation fails. 'req_vec_chunks' cannot be NULL in the later code flow. So remove the conditional check to extract the vector ids received from the device control plane. Smatch static checker warning: drivers/net/ethernet/intel/idpf/idpf_lib.c:417 idpf_intr_req() error: we previously assumed 'adapter->req_vec_chunks' could be null (see line 360) Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/intel-wired-lan/a355ae8a-9011-4a85-a4d1-5b2793bb5f7b@stanley.mountain/ Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240814175903.4166390-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-16	Merge branch 'use-more-devm-for-ag71xx'	Jakub Kicinski	1	-63/+11
	Rosen Penev says: ==================== use more devm for ag71xx Some of these were introduced after the driver got introduced. In any case, using more devm allows removal of the remove function and overall simplifies the code. All of these were tested on a TP-LINK Archer C7v2. ==================== Link: https://patch.msgid.link/20240813170516.7301-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-16	net: ag71xx: use devm for register_netdev	Rosen Penev	1	-13/+1
	Allows completely removing the remove function. Nothing is being done manually now. Tested on TP-LINK Archer C7v2. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240813170516.7301-4-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-16	net: ag71xx: use devm for of_mdiobus_register	Rosen Penev	1	-20/+3
	Allows removing ag71xx_mdio_remove. Removed ag.mii_bus variable. Local one can be used with devm. Easier to reason about as mii_bus is only used here now. Also shrinks the struct slightly. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240813170516.7301-3-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-16	net: ag71xx: devm_clk_get_enabled	Rosen Penev	1	-30/+7
	Allows removal of clk_prepare_enable to simplify the code slightly. Tested on a TP-LINK Archer C7v2. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240813170516.7301-2-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>