summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2026-01-20netkit: Add netkit notifier to check for unregistering devicesDaniel Borkmann1-0/+6
Add a netdevice notifier in netkit to watch for NETDEV_UNREGISTER events. If the target device is indeed NETREG_UNREGISTERING and previously leased a queue to a netkit device, then collect the related netkit devices and batch-unregister_netdevice_many() them. If this would not be done, then the netkit device would hold a reference on the physical device preventing it from going away. However, in case of both io_uring zero-copy as well as AF_XDP this situation is handled gracefully and the allocated resources are torn down. In the case where mentioned infra is used through netkit, the applications have a reference on netkit, and netkit in turn holds a reference on the physical device. In order to have netkit release the reference on the physical device, we need such watcher to then unregister the netkit ones. This is generally quite similar to the dependency handling in case of tunnels (e.g. vxlan bound to a underlying netdev) where the tunnel device gets removed along with the physical device. # ip a [...] 4: enp10s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff inet 10.0.0.2/24 scope global enp10s0f0np0 valid_lft forever preferred_lft forever [...] 8: nk@NONE: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff [...] # rmmod mlx5_ib # rmmod mlx5_core [ 309.261822] mlx5_core 0000:0a:00.0 mlx5_0: Port: 1 Link DOWN [ 344.235236] mlx5_core 0000:0a:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 344.246948] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 344.463754] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 344.770155] mlx5_core 0000:0a:00.1: E-Switch: cleanup [ 345.345709] mlx5_core 0000:0a:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 345.357524] mlx5_core 0000:0a:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 350.995989] mlx5_core 0000:0a:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 351.574396] mlx5_core 0000:0a:00.0: E-Switch: cleanup # ip a [...] [ both enp10s0f0np0 and nk gone ] [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-12-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20netkit: Add single device mode for netkitDaniel Borkmann1-0/+6
Add a single device mode for netkit instead of netkit pairs. The primary target for the paired devices is to connect network namespaces, of course, and support has been implemented in projects like Cilium [0]. For the rxq leasing the plan is to support two main scenarios related to single device mode: * For the use-case of io_uring zero-copy, the control plane can either set up a netkit pair where the peer device can perform rxq leasing which is then tied to the lifetime of the peer device, or the control plane can use a regular netkit pair to connect the hostns to a Pod/container and dynamically add/remove rxq leasing through a single device without having to interrupt the device pair. In the case of io_uring, the memory pool is used as skb non-linear pages, and thus the skb will go its way through the regular stack into netkit. Things like the netkit policy when no BPF is attached or skb scrubbing etc apply as-is in case the paired devices are used, or if the backend memory is tied to the single device and traffic goes through a paired device. * For the use-case of AF_XDP, the control plane needs to use netkit in the single device mode. The single device mode currently enforces only a pass policy when no BPF is attached, and does not yet support BPF link attachments for AF_XDP. skbs sent to that device get dropped at the moment. Given AF_XDP operates at a lower layer of the stack tying this to the netkit pair did not make sense. In future, the plan is to allow BPF at the XDP layer which can: i) process traffic coming from the AF_XDP application (e.g. QEMU with AF_XDP backend) to filter egress traffic or to push selected egress traffic up to the single netkit device to the local stack (e.g. DHCP requests), and ii) vice-versa skbs sent to the single netkit into the AF_XDP application (e.g. DHCP replies). Also, the control-plane can dynamically manage rxq leasing for the single netkit device without having to interrupt (e.g. down/up cycle) the main netkit pair for the Pod which has traffic going in and out. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jordan Rife <jordan@jrife.io> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://docs.cilium.io/en/stable/operations/performance/tuning/#netkit-device-mode [0] Link: https://patch.msgid.link/20260115082603.219152-10-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20net: Proxy net_mp_{open,close}_rxq for leased queuesDavid Wei2-2/+6
When a process in a container wants to setup a memory provider, it will use the virtual netdev and a leased rxq, and call net_mp_{open,close}_rxq to try and restart the queue. At this point, proxy the queue restart on the real rxq in the physical netdev. For memory providers (io_uring zero-copy rx and devmem), it causes the real rxq in the physical netdev to be filled from a memory provider that has DMA mapped memory from a process within a container. Signed-off-by: David Wei <dw@davidwei.uk> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260115082603.219152-6-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20net: Add lease info to queue-get responseDaniel Borkmann1-0/+10
Populate nested lease info to the queue-get response that returns the ifindex, queue id with type and optionally netns id if the device resides in a different netns. Example with ynl client: # ip a [...] 4: enp10s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:24 qdisc mq state UP group default qlen 1000 link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff inet 10.0.0.2/24 scope global enp10s0f0np0 valid_lft forever preferred_lft forever inet6 fe80::eaeb:d3ff:fea3:43f6/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ethtool -i enp10s0f0np0 driver: mlx5_core [...] # ./pyynl/cli.py \ --spec ~/netlink/specs/netdev.yaml \ --do queue-get \ --json '{"ifindex": 4, "id": 15, "type": "rx"}' {'id': 15, 'ifindex': 4, 'lease': {'ifindex': 8, 'netns-id': 0, 'queue': {'id': 1, 'type': 'rx'}}, 'napi-id': 8227, 'type': 'rx', 'xsk': {}} # ip netns list foo (id: 0) # ip netns exec foo ip a [...] 8: nk@NONE: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff inet6 fe80::200:ff:fe00:0/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ip netns exec foo ethtool -i nk driver: netkit [...] # ip netns exec foo ls /sys/class/net/nk/queues/ rx-0 rx-1 tx-0 # ip netns exec foo ./pyynl/cli.py \ --spec ~/netlink/specs/netdev.yaml \ --do queue-get \ --json '{"ifindex": 8, "id": 1, "type": "rx"}' {'id': 1, 'ifindex': 8, 'type': 'rx'} Note that the caller of netdev_nl_queue_fill_one() holds the netdevice lock. For the queue-get we do not lock both devices. When queues get {un,}leased, both devices are locked, thus if __netif_get_rx_queue_peer() returns true, the peer pointer points to a valid device. The netns-id is fetched via peernet2id_alloc() similarly as done in OVS. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260115082603.219152-4-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20net: Implement netdev_nl_queue_create_doitDaniel Borkmann3-6/+24
Implement netdev_nl_queue_create_doit which creates a new rx queue in a virtual netdev and then leases it to a rx queue in a physical netdev. Example with ynl client: # ./pyynl/cli.py \ --spec ~/netlink/specs/netdev.yaml \ --do queue-create \ --json '{"ifindex": 8, "type": "rx", "lease": {"ifindex": 4, "queue": {"type": "rx", "id": 15}}}' {'id': 1} Note that the netdevice locking order is always from the virtual to the physical device. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-3-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20net: Add queue-create operationDaniel Borkmann1-0/+11
Add a ynl netdev family operation called queue-create that creates a new queue on a netdevice: name: queue-create attribute-set: queue flags: [admin-perm] do: request: attributes: - ifindex - type - lease reply: &queue-create-op attributes: - id This is a generic operation such that it can be extended for various use cases in future. Right now it is mandatory to specify ifindex, the queue type which is enforced to rx and a lease. The newly created queue id is returned to the caller. A queue from a virtual device can have a lease which refers to another queue from a physical device. This is useful for memory providers and AF_XDP operations which take an ifindex and queue id to allow applications to bind against virtual devices in containers. The lease couples both queues together and allows to proxy the operations from a virtual device in a container to the physical device. In future, the nested lease attribute can be lifted and made optional for other use-cases such as dynamic queue creation for physical netdevs. The lack of lease and the specification of the physical device as an ifindex will imply that we need a real queue to be allocated. Similarly, the queue type enforcement to rx can then be lifted as well to support tx. An early implementation had only driver-specific integration [0], but in order for other virtual devices to reuse, it makes sense to have this as a generic API in core net. For leasing queues, the virtual netdev must have real_num_rx_queue less than num_rx_queues at the time of calling queue-create. The queue-type must be rx as only rx queues are supported for leasing for now. We also enforce that the queue-create ifindex must point to a virtual device, and that the nested lease attribute's ifindex must point to a physical device. The nested lease attribute set contains a netns-id attribute which is currently only intended for dumping as part of the queue-get operation. Also, it is modeled as an s32 type similarly as done elsewhere in the stack. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0] Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-19net: ethtool: Add support for 80Gbps speedMika Westerberg2-3/+5
USB4 v2 link used in peer-to-peer networking is symmetric 80Gbps so in order to support reading this link speed, add support for it to ethtool. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260115115646.328898-3-mika.westerberg@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19dpll: add dpll_device op to set working modeIvan Vecera1-0/+2
Currently, userspace can retrieve the DPLL working mode but cannot configure it. This prevents changing the device operation, such as switching from manual to automatic mode and vice versa. Add a new callback .mode_set() to struct dpll_device_ops. Extend the netlink policy and device-set command handling to process the DPLL_A_MODE attribute. Update the netlink YAML specification to include the mode attribute in the device-set operation. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260114122726.120303-3-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19dpll: add dpll_device op to get supported modesIvan Vecera1-0/+3
Currently, the DPLL subsystem assumes that the only supported mode is the one currently active on the device. When dpll_msg_add_mode_supported() is called, it relies on ops->mode_get() and reports that single mode to userspace. This prevents users from discovering other modes the device might be capable of. Add a new callback .supported_modes_get() to struct dpll_device_ops. This allows drivers to populate a bitmap indicating all modes supported by the hardware. Update dpll_msg_add_mode_supported() to utilize this new callback: * if ops->supported_modes_get is defined, use it to retrieve the full bitmap of supported modes. * if not defined, fall back to the existing behavior: retrieve the current mode via ops->mode_get and set the corresponding bit in the bitmap. Finally, iterate over the bitmap and add a DPLL_A_MODE_SUPPORTED netlink attribute for every set bit, accurately reporting the device's capabilities to userspace. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260114122726.120303-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19ipv6: annotate data-races in ip6_multipath_hash_{policy,fields}()Eric Dumazet1-2/+2
Add missing READ_ONCE() when reading sysctl values. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19ipv6: annotate date-race in ipv6_can_nonlocal_bind()Eric Dumazet1-3/+3
Add a missing READ_ONCE(), and add const qualifiers to the two parameters. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19ipv6: annotate data-races from ip6_make_flowlabel()Eric Dumazet1-10/+14
Use READ_ONCE() to read sysctl values in ip6_make_flowlabel() and ip6_make_flowlabel() Add a const qualifier to 'struct net' parameters. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19ipv6: add sysctl_ipv6_flowlabel groupEric Dumazet1-3/+7
Group together following struct netns_sysctl_ipv6 fields: - flowlabel_consistency - auto_flowlabels - flowlabel_state_ranges After this patch, ip6_make_flowlabel() uses a single cache line to fetch auto_flowlabels and flowlabel_state_ranges (instead of two before the patch). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-18tcp: move tcp_rate_skb_sent() to tcp_output.cEric Dumazet1-1/+0
It is only called from __tcp_transmit_skb() and __tcp_retransmit_skb(). Move it in tcp_output.c and make it static. clang compiler is now able to inline it from __tcp_transmit_skb(). gcc compiler inlines it in the two callers, which is also fine. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20260114165109.1747722-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-18netfilter: uapi: Use UAPI definition of INT_MAX and INT_MINThomas Weißschuh3-15/+10
Using <limits.h> to gain access to INT_MAX and INT_MIN introduces a dependency on a libc, which UAPI headers should not do. Use the equivalent UAPI constants. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260113-uapi-limits-v2-3-93c20f4b2c1a@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-18ethtool: uapi: Use UAPI definition of INT_MAXThomas Weißschuh1-5/+2
Using <limits.h> to gain access to INT_MAX introduces a dependency on a libc, which UAPI headers should not do. Use the equivalent UAPI constant. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260113-uapi-limits-v2-2-93c20f4b2c1a@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-18uapi: add INT_MAX and INT_MIN constantsThomas Weißschuh1-0/+8
Some UAPI headers use INT_MAX and INT_MIN. Currently they include <limits.h> for their definitions, which introduces a problematic dependency on libc. Add custom, namespaced definitions of INT_MAX and INT_MIN using the same values as the regular kernel code. These definitions are not added to uapi/linux/limits.h, as that header will conflict with libc definitions on some platforms. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260113-uapi-limits-v2-1-93c20f4b2c1a@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-16net: phy: remove unused fixup unregistering functionsHeiner Kallweit1-4/+0
No user of PHY fixups unregisters these. IOW: The fixup unregistering functions are unused and can be removed. Remove also documentation for these functions. Whilst at it, remove also mentioning of phy_register_fixup() from the Documentation, as this function has been static since ea47e70e476f ("net: phy: remove fixup-related definitions from phy.h which are not used outside phylib"). Fixup unregistering functions were added with f38e7a32ee4f ("phy: add phy fixup unregister functions") in 2016, and last user was removed with 6782d06a47ad ("net: usb: lan78xx: Remove KSZ9031 PHY fixup") in 2024. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/ff8ac321-435c-48d0-b376-fbca80c0c22e@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-16Merge tag 'phy_common_properties' of ↵Jakub Kicinski2-0/+36
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Vinod Koul says: ==================== phy common properties Introduce "rx-polarity" and "tx-polarity" device tree properties with Kunit tests (from Vladimir Oltean). * tag 'phy_common_properties' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: add phy_get_rx_polarity() and phy_get_tx_polarity() dt-bindings: phy-common-props: RX and TX lane polarity inversion dt-bindings: phy-common-props: ensure protocol-names are unique dt-bindings: phy-common-props: create a reusable "protocol-names" definition dt-bindings: phy: rename transmit-amplitude.yaml to phy-common-props.yaml ==================== Link: https://patch.msgid.link/aWeXvFcGNK5T6As9@vaman Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-16ethtool: Clarify len/n_stats fields in/out semanticsGal Pressman1-2/+16
Document that the 'len' field in ethtool_gstrings and 'n_stats' field in ethtool_stats optionally serve dual purposes: on entry they specify the number of items requested, and on return they indicate the number actually returned (which is not necessarily the same). Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://patch.msgid.link/20260115060544.481550-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski31-233/+191
Cross-merge networking fixes after downstream PR (net-6.19-rc6). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-15Merge tag 'net-6.19-rc6' of ↵Linus Torvalds6-10/+46
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth, can and IPsec. Current release - regressions: - net: add net.core.qdisc_max_burst - can: propagate CAN device capabilities via ml_priv Previous releases - regressions: - dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list() - ipv6: fix use-after-free in inet6_addr_del(). - xfrm: fix inner mode lookup in tunnel mode GSO segmentation - ip_tunnel: spread netdev_lockdep_set_classes() - ip6_tunnel: use skb_vlan_inet_prepare() in __ip6_tnl_rcv() - bluetooth: hci_sync: enable PA sync lost event - eth: virtio-net: - fix the deadlock when disabling rx NAPI - fix misalignment bug in struct virtnet_info Previous releases - always broken: - ipv4: ip_gre: make ipgre_header() robust - can: fix SSP_SRC in cases when bit-rate is higher than 1 MBit. - eth: - mlx5e: profile change fix - octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback - macvlan: fix possible UAF in macvlan_forward_source()" * tag 'net-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits) virtio_net: Fix misalignment bug in struct virtnet_info net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts can: raw: instantly reject disabled CAN frames can: propagate CAN device capabilities via ml_priv Revert "can: raw: instantly reject unsupported CAN frames" net/sched: sch_qfq: do not free existing class in qfq_change_class() selftests: drv-net: fix RPS mask handling for high CPU numbers selftests: drv-net: fix RPS mask handling in toeplitz test ipv6: Fix use-after-free in inet6_addr_del(). dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list() net: hv_netvsc: reject RSS hash key programming without RX indirection table tools: ynl: render event op docs correctly net: add net.core.qdisc_max_burst net: airoha: Fix typo in airoha_ppe_setup_tc_block_cb definition net: phy: motorcomm: fix duplex setting error for phy leds net: octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback net/mlx5e: Restore destroying state bit after profile cleanup net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of priv net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv net/mlx5e: Fix crash on profile change rollback failure ...
2026-01-15net: Introduce netif_xmit_timeout_ms() helperShahar Shitrit1-0/+11
Introduce a new helper function netif_xmit_timeout_ms() to check if a TX queue is stopped and has timed out and report the timeout duration. This makes the timeout logic reusable, and will be used in several places in subsequent patches. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768209383-1546791-2-git-send-email-tariqt@nvidia.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-15xsk: move cq_cached_prod_lock to avoid touching a cacheline in sending pathJason Xing1-5/+0
We (Paolo and I) noticed that in the sending path touching an extra cacheline due to cq_cached_prod_lock will impact the performance. After moving the lock from struct xsk_buff_pool to struct xsk_queue, the performance is increased by ~5% which can be observed by xdpsock. An alternative approach [1] can be using atomic_try_cmpxchg() to have the same effect. But unfortunately I don't have evident performance numbers to prove the atomic approach is better than the current patch. The advantage is to save the contention time among multiple xsks sharing the same pool while the disadvantage is losing good maintenance. The full discussion can be found at the following link. [1]: https://lore.kernel.org/all/20251128134601.54678-1-kerneljasonxing@gmail.com/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://patch.msgid.link/20260104012125.44003-3-kerneljasonxing@gmail.com Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-15can: propagate CAN device capabilities via ml_privOliver Hartkopp2-0/+25
Commit 1a620a723853 ("can: raw: instantly reject unsupported CAN frames") caused a sequence of dependency and linker fixes. Instead of accessing CAN device internal data structures which caused the dependency problems this patch introduces capability information into the CAN specific ml_priv data which is accessible from both sides. With this change the CAN network layer can check the required features and the decoupling of the driver layer and network layer is restored. Fixes: 1a620a723853 ("can: raw: instantly reject unsupported CAN frames") Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260109144135.8495-3-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2026-01-15Revert "can: raw: instantly reject unsupported CAN frames"Oliver Hartkopp1-7/+0
This reverts commit 1a620a723853a0f49703c317d52dc6b9602cbaa8 and its follow-up fixes for the introduced dependency issues. commit 1a620a723853 ("can: raw: instantly reject unsupported CAN frames") commit cb2dc6d2869a ("can: Kconfig: select CAN driver infrastructure by default") commit 6abd4577bccc ("can: fix build dependency") commit 5a5aff6338c0 ("can: fix build dependency") The entire problem was caused by the requirement that a new network layer feature needed to know about the protocol capabilities of the CAN devices. Instead of accessing CAN device internal data structures which caused the dependency problems a better approach has been developed which makes use of CAN specific ml_priv data which is accessible from both sides. Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20260109144135.8495-2-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2026-01-14Merge tag 'scsi-fixes' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Only one core change (and one in doc only) the rest are drivers. The one core fix is for some inline encrypting drives that can't handle encryption requests on non-data commands (like error handling ones); it saves the request level encryption parameters in the eh_save structure so they can be cleared for error handling and restored after it is completed" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: host: mediatek: Make read-only array scale_us static const scsi: bfa: Update outdated comment scsi: mpt3sas: Update maintainer list scsi: ufs: core: Configure MCQ after link startup scsi: core: Fix error handler encryption support scsi: core: Correct documentation for scsi_test_unit_ready() scsi: ufs: dt-bindings: Fix several grammar errors
2026-01-14Merge tag 'media/v6.19-3' of ↵Linus Torvalds1-9/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - ov02c10: some fixes related to preserving bayer pattern and horizontal control - ipu-bridge: Add quirks for some Dell XPS laptops with inverted sensors - mali-c55: Fix version identifier logic - rzg2l-cru: csi-2: fix RZ/V2H input sizes on some variants * tag 'media/v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: ov02c10: Remove unnecessary hflip and vflip pointers media: ipu-bridge: Add DMI quirk for Dell XPS laptops with upside down sensors media: ov02c10: Fix the horizontal flip control media: ov02c10: Adjust x-win/y-win when changing flipping to preserve bayer-pattern media: ov02c10: Fix bayer-pattern change after default vflip change media: rzg2l-cru: csi-2: Support RZ/V2H input sizes media: uapi: mali-c55-config: Remove version identifier media: mali-c55: Remove duplicated version check media: Documentation: mali-c55: Use v4l2-isp version identifier
2026-01-14phy: add phy_get_rx_polarity() and phy_get_tx_polarity()Vladimir Oltean1-0/+32
Add helpers in the generic PHY folder which can be used using 'select PHY_COMMON_PROPS' from Kconfig, without otherwise needing to enable GENERIC_PHY. These helpers need to deal with the slight messiness of the fact that the polarity properties are arrays per protocol, and with the fact that there is no default value mandated by the standard properties, all default values depend on driver and protocol (PHY_POL_NORMAL may be a good default for SGMII, whereas PHY_POL_AUTO may be a good default for PCIe). Push the supported mask of polarities to these helpers, to simplify drivers such that they don't need to validate what's in the device tree (or other firmware description). Add a KUnit test suite to make sure that the API produces the expected results. The fact that we use fwnode structures means we can validate with software nodes, and as opposed to the device_property API, we can bypass the need to have a device structure. Co-developed-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260111093940.975359-6-vladimir.oltean@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-01-14dt-bindings: phy-common-props: RX and TX lane polarity inversionVladimir Oltean1-0/+4
Differential signaling is a technique for high-speed protocols to be more resilient to noise. At the transmit side we have a positive and a negative signal which are mirror images of each other. At the receiver, if we subtract the negative signal (say of amplitude -A) from the positive signal (say +A), we recover the original single-ended signal at twice its original amplitude. But any noise, like one coming from EMI from outside sources, is supposed to have an almost equal impact upon the positive (A + E, E being for "error") and negative signal (-A + E). So (A + E) - (-A + E) eliminates this noise, and this is what makes differential signaling useful. Except that in order to work, there must be strict requirements observed during PCB design and layout, like the signal traces needing to have the same length and be physically close to each other, and many others. Sometimes it is not easy to fulfill all these requirements, a simple case to understand is when on chip A's pins, the positive pin is on the left and the negative is on the right, but on the chip B's pins (with which A tries to communicate), positive is on the right and negative on the left. The signals would need to cross, using vias and other ugly stuff that affects signal integrity (introduces impedance discontinuities which cause reflections, etc). So sometimes, board designers intentionally connect differential lanes the wrong way, and expect somebody else to invert that signal to recover useful data. This is where RX and TX polarity inversion comes in as a generic concept that applies to any high-speed serial protocol as long as it uses differential signaling. I've stopped two attempts to introduce more vendor-specific descriptions of this only in the past month: https://lore.kernel.org/linux-phy/20251110110536.2596490-1-horatiu.vultur@microchip.com/ https://lore.kernel.org/netdev/20251028000959.3kiac5kwo5pcl4ft@skbuf/ and in the kernel we already have merged: - "st,px_rx_pol_inv" - "st,pcie-tx-pol-inv" - "st,sata-tx-pol-inv" - "mediatek,pnswap" - "airoha,pnswap-rx" - "airoha,pnswap-tx" and maybe more. So it is pretty general. One additional element of complexity is introduced by the fact that for some protocols, receivers can automatically detect and correct for an inverted lane polarity (example: the PCIe LTSSM does this in the Polling.Configuration state; the USB 3.1 Link Layer Test Specification says that the detection and correction of the lane polarity inversion in SuperSpeed operation shall be enabled in Polling.RxEQ.). Whereas for other protocols (SGMII, SATA, 10GBase-R, etc etc), the polarity is all manual and there is no detection mechanism mandated by their respective standards. So why would one even describe rx-polarity and tx-polarity for protocols like PCIe, if it had to always be PHY_POL_AUTO? Related question: why would we define the polarity as an array per protocol? Isn't the physical PCB layout protocol-agnostic, and aren't we describing the same physical reality from the lens of different protocols? The answer to both questions is because multi-protocol PHYs exist (supporting e.g. USB2 and USB3, or SATA and PCIe, or PCIe and Ethernet over the same lane), one would need to manually set the polarity for SATA/Ethernet, while leaving it at auto for PCIe/USB 3.0+. I also investigated from another angle: what if polarity inversion in the PHY is one layer, and then the PCIe/USB3 LTSSM polarity detection is another layer on top? Then rx-polarity = <PHY_POL_AUTO> doesn't make sense, it can still be rx-polarity = <PHY_POL_NORMAL> or <PHY_POL_INVERT>, and the link training state machine figures things out on top of that. This would radically simplify the design, as the elimination of PHY_POL_AUTO inherently means that the need for a property array per protocol also goes away. I don't know how things are in the general case, but at least in the 10G and 28G Lynx SerDes blocks from NXP Layerscape devices, this isn't the case, and there's only a single level of RX polarity inversion: in the SerDes lane. In the case of PCIe, the controller is in charge of driving the RDAT_INV bit autonomously, and it is read-only to software. So the existence of this kind of SerDes lane proves the need for PHY_POL_AUTO to be a third state. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260111093940.975359-5-vladimir.oltean@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-01-14net: mana: Implement ndo_tx_timeout and serialize queue resets per port.Dipayaan Roy2-2/+8
Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected and a device-controlled port reset for all queues can be scheduled to a ordered workqueue. The reset for all queues on stall detection is recomended by hardware team. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260112130552.GA11785@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14net: phy: Only rely on phy_port for PHY-driven SFPMaxime Chevallier1-6/+0
Now that all PHY drivers that support downstream SFP have been converted to phy_port serdes handling, we can make the generic PHY SFP handling mandatory, thus making all phylib sfp helpers static. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260108080041.553250-14-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14net: phy: marvell10g: Support SFP through phy_portMaxime Chevallier1-0/+1
Convert the Marvell10G driver to use the generic SFP handling, through a dedicated .attach_port() handler to populate the port's supported interfaces. As the 88x3310 supports multiple MDI, the .attach_port() logic handles both SFP attach with 10GBaseR support, and support for the "regular" port that usually is a BaseT port. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260108080041.553250-11-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14net: phy: Introduce generic SFP handling for PHY driversMaxime Chevallier2-0/+4
There are currently 4 PHY drivers that can drive downstream SFPs: marvell.c, marvell10g.c, at803x.c and marvell-88x2222.c. Most of the logic is boilerplate, either calling into generic phylib helpers (for SFP PHY attach, bus attach, etc.) or performing the same tasks with a bit of validation : - Getting the module's expected interface mode - Making sure the PHY supports it - Optionaly perform some configuration to make sure the PHY outputs the right mode This can be made more generic by leveraging the phy_port, and its configure_mii() callback which allows setting a port's interfaces when the port is a serdes. Introduce a generic PHY SFP support. If a driver doesn't probe the SFP bus itself, but an SFP phandle is found in devicetree/firmware, then the generic PHY SFP support will be used, relying on port ops. PHY driver need to : - Register a .attach_port() callback - When a serdes port is registered to the PHY, drivers must set port->interfaces to the set of PHY_INTERFACE_MODE the port can output - If the port has limitations regarding speed, duplex and aneg, the port can also fine-tune the final linkmodes that can be supported - The port may register a set of ops, including .configure_mii(), that will be called at module_insert time to adjust the interface based on the module detected. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260108080041.553250-8-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14net: phy: Introduce PHY ports representationMaxime Chevallier3-0/+162
Ethernet provides a wide variety of layer 1 protocols and standards for data transmission. The front-facing ports of an interface have their own complexity and configurability. Introduce a representation of these front-facing ports. The current code is minimalistic and only support ports controlled by PHY devices, but the plan is to extend that to SFP as well as raw Ethernet MACs that don't use PHY devices. This minimal port representation allows describing the media and number of pairs of a BaseT port. From that information, we can derive the linkmodes usable on the port, which can be used to limit the capabilities of an interface. For now, the port pairs and medium is derived from devicetree, defined by the PHY driver, or populated with default values (as we assume that all PHYs expose at least one port). The typical example is 100M ethernet. 100BaseTX works using only 2 pairs on a Cat 5 cables. However, in the situation where a 10/100/1000 capable PHY is wired to its RJ45 port through 2 pairs only, we have no way of detecting that. The "max-speed" DT property can be used, but a more accurate representation can be used : mdi { connector-0 { media = "BaseT"; pairs = <2>; }; }; From that information, we can derive the max speed reachable on the port. Another benefit of having that is to avoid vendor-specific DT properties (micrel,fiber-mode or ti,fiber-mode). This basic representation is meant to be expanded, by the introduction of port ops, userspace listing of ports, and support for multi-port devices. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260108080041.553250-4-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14net: ethtool: Introduce ETHTOOL_LINK_MEDIUM_* valuesMaxime Chevallier1-3/+22
In an effort to have a better representation of Ethernet ports, introduce enumeration values representing the various ethernet Mediums. This is part of the 802.3 naming convention, for example : 1000 Base T 4 | | | | | | | \_ pairs (4) | | \___ Medium (T == Twisted Copper Pairs) | \_______ Baseband transmission \____________ Speed Other example : 10000 Base K X 4 | | \_ lanes (4) | \___ encoding (BaseX is 8b/10b while BaseR is 66b/64b) \_____ Medium (K is backplane ethernet) In the case of representing a physical port, only the medium and number of pairs should be relevant. One exception would be 1000BaseX, which is currently also used as a medium in what appears to be any of 1000BaseSX, 1000BaseCX, 1000BaseLX, 1000BaseEX, 1000BaseBX10 and some other. This was reflected in the mediums associated with the 1000BaseX linkmode. These mediums are set in the net/ethtool/common.c lookup table that maintains a list of all linkmodes with their number of pairs, medium, encoding, speed and duplex. One notable exception to this is 100BaseT Ethernet. It emcompasses 100BaseTX, which is a 2-pairs protocol but also 100BaseT4, that will also work on 4-pairs cables. As we don't make a disctinction between these, the lookup table contains 2 sets of pair numbers, indicating the min number of pairs for a protocol to work and the "nominal" number of pairs as well. Another set of exceptions are linkmodes such 100000baseLR4_ER4, where the same link mode seems to represent 100GBaseLR4 and 100GBaseER4. The macro __DEFINE_LINK_MODE_PARAMS_MEDIUMS is here used to populate the .mediums bitfield with all appropriate mediums. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260108080041.553250-3-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14Merge branch 'mlx5-next' of ↵Jakub Kicinski3-6/+17
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2026-01-13 * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Add IFC bits for extended ETS rate limit bandwidth value net/mlx5: Add support for querying bond speed net/mlx5: Handle port and vport speed change events in MPESW net/mlx5: Propagate LAG effective max_tx_speed to vports net/mlx5: Add max_tx_speed and its CAP bit to IFC ==================== Link: https://patch.msgid.link/1768299471-1603093-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13Merge tag 'hyperv-fixes-signed-20260112' of ↵Linus Torvalds1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Minor fixes and cleanups for the MSHV driver * tag 'hyperv-fixes-signed-20260112' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: mshv: release mutex on region invalidation failure hyperv: Avoid -Wflex-array-member-not-at-end warning mshv: hide x86-specific functions on arm64 mshv: Initialize local variables early upon region invalidation mshv: Use PMD_ORDER instead of HPAGE_PMD_ORDER when processing regions
2026-01-13net/sched: sch_cake: share shaper state across sub-instances of cake_mqJonas Köppeler1-0/+1
This commit adds shared shaper state across the cake instances beneath a cake_mq qdisc. It works by periodically tracking the number of active instances, and scaling the configured rate by the number of active queues. The scan is lockless and simply reads the qlen and the last_active state variable of each of the instances configured beneath the parent cake_mq instance. Locking is not required since the values are only updated by the owning instance, and eventual consistency is sufficient for the purpose of estimating the number of active queues. The interval for scanning the number of active queues is set to 200 us. We found this to be a good tradeoff between overhead and response time. For a detailed analysis of this aspect see the Netdevconf talk: https://netdevconf.info/0x19/docs/netdev-0x19-paper16-talk-paper.pdf Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-5-8d613fece5d8@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-13net/sched: Export mq functions for reuseToke Høiland-Jørgensen1-0/+27
To enable the cake_mq qdisc to reuse code from the mq qdisc, export a bunch of functions from sch_mq. Split common functionality out from some functions so it can be composed with other code, and export other functions wholesale. To discourage wanton reuse, put the symbols into a new NET_SCHED_INTERNAL namespace, and a sch_priv.h header file. No functional change intended. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20260109-mq-cake-sub-qdisc-v8-1-8d613fece5d8@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-13net: add net.core.qdisc_max_burstEric Dumazet2-0/+7
In blamed commit, I added a check against the temporary queue built in __dev_xmit_skb(). Idea was to drop packets early, before any spinlock was acquired. if (unlikely(defer_count > READ_ONCE(q->limit))) { kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP); return NET_XMIT_DROP; } It turned out that HTB Qdisc has a zero q->limit. HTB limits packets on a per-class basis. Some of our tests became flaky. Add a new sysctl : net.core.qdisc_max_burst to control how many packets can be stored in the temporary lockless queue. Also add a new QDISC_BURST_DROP drop reason to better diagnose future issues. Thanks Neal ! Fixes: 100dfa74cad9 ("net: dev_queue_xmit() llist adoption") Reported-and-bisected-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20260107104159.3669285-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-13net/mlx5: Add IFC bits for extended ETS rate limit bandwidth valueAlexei Lazar1-3/+4
Add hardware interface definitions to support extended bandwidth rate limiting in the QoS Enhanced Transmission Selection (ETS) configuration. The new fields include: - max_bw_value: extended from 8-bit to 16-bit in ets_tcn_config_reg, simplifying the implementation by using a single field instead of separate MSB/LSB fields. - qetcr_qshr_max_bw_val_msb: capability bit in qcam_qos_feature_cap_mask indicating device support for the extended 16-bit max_bw_value field. These interface additions are prerequisites for increasing the per-TC rate limit beyond 255 Gbps to support higher-bandwidth NICs. Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768200608-1543180-1-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13net: phy: realtek: add dummy PHY driver for RTL8127ATFHeiner Kallweit1-0/+7
RTL8127ATF supports a SFP+ port for fiber modules (10GBASE-SR/LR/ER/ZR and DAC). The list of supported modes was provided by Realtek. According to the r8127 vendor driver also 1G modules are supported, but this needs some more complexity in the driver, and only 10G mode has been tested so far. Therefore mainline support will be limited to 10G for now. The SFP port signals are hidden in the chip IP and driven by firmware. Therefore mainline SFP support can't be used here. This PHY driver is used by the RTL8127ATF support in r8169. RTL8127ATF reports the same PHY ID as the TP version. Therefore use a dummy PHY ID. This PHY driver is used by the RTL8127ATF support in r8169. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/e3d55162-210a-4fab-9abf-99c6954eee10@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13net: airoha: Fix typo in airoha_ppe_setup_tc_block_cb definitionLorenzo Bianconi1-2/+2
Fix Typo in airoha_ppe_dev_setup_tc_block_cb routine definition when CONFIG_NET_AIROHA is not enabled. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601090517.Fj6v501r-lkp@intel.com/ Fixes: f45fc18b6de04 ("net: airoha: Add airoha_ppe_dev struct definition") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260109-airoha_ppe_dev_setup_tc_block_cb-typo-v1-1-282e8834a9f9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13Merge tag 'wireless-next-2026-01-12' of ↵Jakub Kicinski3-12/+85
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== First set of changes for the current -next cycle, of note: - ath12k gets an overhaul to support multi-wiphy device wiphy and pave the way for future device support in the same driver (rather than splitting to ath13k) - mac80211 gets some better iteration macros * tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (120 commits) wifi: mac80211: remove width argument from ieee80211_parse_bitrates wifi: mac80211_hwsim: remove NAN by default wifi: mac80211: improve station iteration ergonomics wifi: mac80211: improve interface iteration ergonomics wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channel wifi: mac80211: unexport ieee80211_get_bssid() wl1251: Replace strncpy with strscpy in wl1251_acx_fw_version wifi: iwlegacy: 3945-rs: remove redundant pointer check in il3945_rs_tx_status() and il3945_rs_get_rate() wifi: mac80211: don't send an unused argument to ieee80211_check_combinations wifi: libertas: fix WARNING in usb_tx_block wifi: mwifiex: Allocate dev name earlier for interface workqueue name wifi: wlcore: sdio: Use pm_ptr instead of #ifdef CONFIG_PM wifi: cfg80211: Fix use_for flag update on BSS refresh wifi: brcmfmac: rename function that frees vif wifi: brcmfmac: fix/add kernel-doc comments wifi: mac80211: Update csa_finalize to use link_id wifi: cfg80211: add cfg80211_stop_link() for per-link teardown wifi: ath12k: Skip DP peer creation for scan vdev wifi: ath12k: move firmware stats request outside of atomic context wifi: ath12k: add the missing RCU lock in ath12k_dp_tx_free_txbuf() ... ==================== Link: https://patch.msgid.link/20260112185836.378736-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-12Merge tag 'cgroup-for-6.19-rc5-fixes' of ↵Linus Torvalds1-11/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: - Fix -Wflex-array-member-not-at-end warnings in cgroup_root * tag 'cgroup-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Eliminate cgrp_ancestor_storage in cgroup_root
2026-01-12wifi: mac80211: improve station iteration ergonomicsJohannes Berg1-4/+25
Right now, the only way to iterate stations is to declare an iterator function, possibly data structure to use, and pass all that to the iteration helper function. This is annoying, and there's really no inherent need for it. Add a new for_each_station() macro that does the iteration in a more ergonomic way. To avoid even more exported functions, do the old ieee80211_iterate_stations_mtx() as an inline using the new way, which may also let the compiler optimise it a bit more, e.g. via inlining the iterator function. Link: https://patch.msgid.link/20260108143431.d2b641f6f6af.I4470024f7404446052564b15bcf8b3f1ada33655@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-12wifi: mac80211: improve interface iteration ergonomicsJohannes Berg1-6/+36
Right now, the only way to iterate interfaces is to declare an iterator function, possibly data structure to use, and pass all that to the iteration helper function. This is annoying, and there's really no inherent need for it, except it was easier to implement with the iflist mutex, but that's not used much now. Add a new for_each_interface() macro that does the iteration in a more ergonomic way. To avoid even more exported functions, do the old ieee80211_iterate_active_interfaces_mtx() as an inline using the new way, which may also let the compiler optimise it a bit more, e.g. via inlining the iterator function. Also provide for_each_active_interface() for the common case of just iterating active interfaces. Link: https://patch.msgid.link/20260108143431.f2581e0c381a.Ie387227504c975c109c125b3c57f0bb3fdab2835@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-12wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channelLachlan Hodges1-0/+4
When sending a channel ensure we include the IEEE80211_CHAN_S1G_NO_PRIMARY flag. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20260109081439.3168-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-11treewide: Update email addressThomas Gleixner13-13/+13
In a vain attempt to consolidate the email zoo switch everything to the kernel.org account. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>