author Jakub Kicinski <kuba@kernel.org> 2026-05-19 04:49:09 +0300
committer Jakub Kicinski <kuba@kernel.org> 2026-05-19 04:49:09 +0300
commit 7a348a95f696d20f15c776de4df8b4415bcf3d77
tree f1d36a4bbc9559749742d7ba41987bbaf8a128cc /include/linux
parent f79a0466a46f81e5b42458fcbec033280c841293
parent 28c1cc999fbb882745130e450e5109bc4b8869a4
Merge branch 'net-devmem-support-devmem-with-netkit-devices'
Bobby Eshleman says:

====================
net: devmem: support devmem with netkit devices

This series enables TCP devmem TX through netkit devices.

Netkit now supports queue leasing. A physical NIC's RX queue can be
leased to a netkit guest interface inside a container namespace. This
gives the container a devmem-capable data path on the RX side (bind-rx,
etc.). On the TX side, the container process binds to its netkit guest
interface and sends traffic that netkit redirects (via BPF or IP
forwarding) to the physical NIC for DMA.

Two things in the existing devmem TX path prevent this from working:

1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it
   will forward a dmabuf-backed (unreadable) skb. This protects skbs
   from landing on devices that don't have the IOMMU mappings for the
   backing dmabuf or that don't speak netmem. Netkit, however, does not
   support DMA and doesn't attempt to read unreadable skb pages, so it
   doesn't break netmem (it is pure skb routing and redirection). It is
   functionally capable of routing unreadable skbs, but the TX
   validation path has no way to distinguish a device that will
   actually attempt to DMA the skb from a device (like netkit) that
   does not DMA but also does not break netmem.

2. bind_tx_doit uses the bound device as the DMA device. When the user
   binds devmem TX to the netkit guest, the bind handler attempts to
   create DMA mappings against netkit, which has no DMA capability and
   no IOMMU mappings.

This series solves these problems as follows:

1. Extend netmem_tx to two bits, assigned to one of three values:

   NETMEM_TX_NONE   - netmem not supported
   NETMEM_TX_DMA    - netmem supported and performs DMA
   NETMEM_TX_NO_DMA - netmem supported, but does not DMA

   With these bits, physical devices can set NETMEM_TX_DMA and devices
   like netkit set NETMEM_TX_NO_DMA. The TX validation path ensures
   that any DMA-capable netdev exactly matches the bound device,
   guaranteeing the correct mapping of the bound dmabuf.

   The TX validation path also allows devices with NETMEM_TX_NO_DMA to
   pass, knowing these devices will not misuse netmem or run into IOMMU
   faults. After redirection or routing, once the skb finally makes its
   way through the stack to a physical device's TX path, the
   NETMEM_TX_DMA check above is performed again to guarantee that the
   device has the appropriate binding and mappings.

2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices,
   finds the physical TX device, and binds to that instead. For the
   netkit case, if it has already been leased a queue from a
   DMA-capable device, then the bind action is performed on the
   DMA-capable device instead and the dmabuf is mapped correctly.
====================

Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-0-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
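The validation and bind rules described in the cover letter can be sketched as a small userspace model. This is an illustration only, not the kernel code from the series: struct model_dev, its lessor field, model_xmit_allowed(), model_bind_target() and model_demo() are hypothetical names, and the real validate_xmit_unreadable_skb() and bind handler operate on struct net_device and the devmem binding rather than on bare pointers. Only the three mode names come from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three modes listed in the cover letter. */
enum netmem_tx_mode {
	NETMEM_TX_NONE,   /* netmem not supported */
	NETMEM_TX_DMA,    /* netmem supported and performs DMA */
	NETMEM_TX_NO_DMA, /* netmem supported, but does not DMA */
};

/* Hypothetical stand-in for struct net_device; not the kernel layout. */
struct model_dev {
	enum netmem_tx_mode netmem_tx;
	struct model_dev *lessor; /* hypothetical: device that leased us a queue */
};

/*
 * TX validation model: may an unreadable skb whose dmabuf is bound to
 * 'bound' be transmitted on 'dev'?
 */
static bool model_xmit_allowed(const struct model_dev *dev,
			       const struct model_dev *bound)
{
	switch (dev->netmem_tx) {
	case NETMEM_TX_DMA:
		/* A DMA-capable device must be exactly the bound device. */
		return dev == bound;
	case NETMEM_TX_NO_DMA:
		/* Pass-through devices never touch the payload. */
		return true;
	case NETMEM_TX_NONE:
	default:
		return false;
	}
}

/*
 * Bind model: pick the device that DMA mappings are created against.
 * Returns NULL when no DMA-capable device can be found.
 */
static struct model_dev *model_bind_target(struct model_dev *dev)
{
	if (dev->netmem_tx == NETMEM_TX_DMA)
		return dev;
	if (dev->netmem_tx == NETMEM_TX_NO_DMA && dev->lessor &&
	    dev->lessor->netmem_tx == NETMEM_TX_DMA)
		return dev->lessor;
	return NULL;
}

/* Walk one skb through the netkit case: bind on netkit, send via the NIC. */
static void model_demo(void)
{
	struct model_dev nic = { NETMEM_TX_DMA, NULL };
	struct model_dev other_nic = { NETMEM_TX_DMA, NULL };
	struct model_dev netkit = { NETMEM_TX_NO_DMA, &nic };
	struct model_dev *bound = model_bind_target(&netkit);

	assert(bound == &nic);                          /* bind lands on the NIC */
	assert(model_xmit_allowed(&netkit, bound));     /* first check: netkit */
	assert(model_xmit_allowed(&nic, bound));        /* second check: bound NIC */
	assert(!model_xmit_allowed(&other_nic, bound)); /* wrong NIC is rejected */
}
```

In this model the same check runs twice for one skb, first on the netkit device and again on the physical device after redirection, which mirrors the two validation passes the cover letter describes.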
Diffstat (limited to 'include/linux')
-rw-r--r-- include/linux/netdevice.h 10
1 files changed, 8 insertions, 2 deletions
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e7af71491a47..bf3dd9b2c1a7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1794,6 +1794,12 @@ enum netdev_stat_type {
 	NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
 };
 
+enum netmem_tx_mode {
+	NETMEM_TX_NONE,		/* no netmem TX support */
+	NETMEM_TX_DMA,		/* DMA-capable netmem TX (real HW) */
+	NETMEM_TX_NO_DMA,	/* no DMA, e.g. passthrough for virtual devs */
+};
+
 enum netdev_reg_state {
 	NETREG_UNINITIALIZED = 0,
 	NETREG_REGISTERED,	/* completed register_netdevice */
@@ -1815,7 +1821,7 @@ enum netdev_reg_state {
  *	@lltx:		device supports lockless Tx. Deprecated for real HW
  *			drivers. Mainly used by logical interfaces, such as
  *			bonding and tunnels
- *	@netmem_tx:	device support netmem_tx.
+ *	@netmem_tx:	device netmem TX mode
  *
  *	@name:	This is the first field of the "visible" part of this structure
  *		(i.e. as seen by users in the "Space.c" file). It is the name
@@ -2138,7 +2144,7 @@ struct net_device {
 	struct_group(priv_flags_fast,
 		unsigned long		priv_flags:32;
 		unsigned long		lltx:1;
-		unsigned long		netmem_tx:1;
+		unsigned long		netmem_tx:2;
 	);
 	const struct net_device_ops *netdev_ops;
 	const struct header_ops *header_ops;
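
The widening from netmem_tx:1 to netmem_tx:2 is what lets the field hold the third enum value. A minimal userspace sketch of why, assuming hypothetical structs flags_1bit and flags_2bit that only model the two flag bits (the kernel declares these fields as unsigned long inside struct_group; unsigned int is used here for portability):

```c
#include <assert.h>

enum netmem_tx_mode {
	NETMEM_TX_NONE,   /* 0 */
	NETMEM_TX_DMA,    /* 1 */
	NETMEM_TX_NO_DMA, /* 2 */
};

/* Hypothetical model of the flag layout before the change. */
struct flags_1bit {
	unsigned int lltx:1;
	unsigned int netmem_tx:1;
};

/* Hypothetical model of the flag layout after the change. */
struct flags_2bit {
	unsigned int lltx:1;
	unsigned int netmem_tx:2;
};

/* Store a mode in the 1-bit field and read it back. */
static unsigned int roundtrip_1bit(unsigned int mode)
{
	struct flags_1bit f = { 0, 0 };

	f.netmem_tx = mode; /* unsigned bit-field: value is reduced modulo 2 */
	return f.netmem_tx;
}

/* Store a mode in the 2-bit field and read it back. */
static unsigned int roundtrip_2bit(unsigned int mode)
{
	struct flags_2bit f = { 0, 0 };

	f.netmem_tx = mode; /* values 0..3 fit unchanged */
	return f.netmem_tx;
}

/* A 1-bit field silently turns NO_DMA into NONE; a 2-bit field keeps it. */
static void width_demo(void)
{
	assert(roundtrip_1bit(NETMEM_TX_DMA) == NETMEM_TX_DMA);
	assert(roundtrip_1bit(NETMEM_TX_NO_DMA) == NETMEM_TX_NONE);
	assert(roundtrip_2bit(NETMEM_TX_NO_DMA) == NETMEM_TX_NO_DMA);
}
```

Two bits leave one encoding (3) unused by the enum as shown in this diff.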