<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/Documentation/admin-guide/sysctl/net.rst, branch v7.1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-12T09:59:36+00:00</updated>
<entry>
<title>vsock: add G2H fallback for CIDs not owned by H2G transport</title>
<updated>2026-03-12T09:59:36+00:00</updated>
<author>
<name>Alexander Graf</name>
<email>graf@amazon.com</email>
</author>
<published>2026-03-04T23:00:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0de607dc4fd80ede3b2a35e8a72f99c7a0bbc321'/>
<id>urn:sha1:0de607dc4fd80ede3b2a35e8a72f99c7a0bbc321</id>
<content type='text'>
When no H2G transport is loaded, vsock currently routes all CIDs to the
G2H transport (commit 65b422d9b61b ("vsock: forward all packets to the
host when no H2G is registered"). Extend that existing behavior: when
an H2G transport is loaded but does not claim a given CID, the
connection falls back to G2H in the same way.

This matters in environments like Nitro Enclaves, where an instance may
run nested VMs via vhost-vsock (H2G) while also needing to reach sibling
enclaves at higher CIDs through virtio-vsock-pci (G2H). With the old
code, any CID &gt; 2 was unconditionally routed to H2G when vhost was
loaded, making those enclaves unreachable without setting
VMADDR_FLAG_TO_HOST explicitly on every connect.

Requiring every application to set VMADDR_FLAG_TO_HOST creates friction:
tools like socat, iperf, and others would all need to learn about it.
The flag was introduced 6 years ago and I am still not aware of any tool
that supports it. Even if there was support, it would be cumbersome to
use. The most natural experience is a single CID address space where H2G
only wins for CIDs it actually owns, and everything else falls through to
G2H, extending the behavior that already exists when H2G is absent.

To give user space at least a hint that the kernel applied this logic,
automatically set the VMADDR_FLAG_TO_HOST on the remote address so it
can determine the path taken via getpeername().

Add a per-network namespace sysctl net.vsock.g2h_fallback (default 1).
At 0 it forces strict routing: H2G always wins for CID &gt; VMADDR_CID_HOST,
or ENODEV if H2G is not loaded.

Signed-off-by: Alexander Graf &lt;graf@amazon.com&gt;
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Link: https://patch.msgid.link/20260304230027.59857-1-graf@amazon.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>vsock: document write-once behavior of the child_ns_mode sysctl</title>
<updated>2026-02-26T10:10:03+00:00</updated>
<author>
<name>Bobby Eshleman</name>
<email>bobbyeshleman@meta.com</email>
</author>
<published>2026-02-23T22:38:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b6302e057fdc8f199ddae736ecdf45029f892e5c'/>
<id>urn:sha1:b6302e057fdc8f199ddae736ecdf45029f892e5c</id>
<content type='text'>
Update the vsock child_ns_mode documentation to include the new
write-once semantics of setting child_ns_mode. The semantics are
implemented in a preceding patch in this series.

Signed-off-by: Bobby Eshleman &lt;bobbyeshleman@meta.com&gt;
Reviewed-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-3-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>vsock: document namespace mode sysctls</title>
<updated>2026-02-18T01:20:21+00:00</updated>
<author>
<name>Stefano Garzarella</name>
<email>sgarzare@redhat.com</email>
</author>
<published>2026-02-16T16:31:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a07c33c6f2fc693bf9c67514fcc15d9d417f390d'/>
<id>urn:sha1:a07c33c6f2fc693bf9c67514fcc15d9d417f390d</id>
<content type='text'>
Add documentation for the vsock per-namespace sysctls (`ns_mode` and
`child_ns_mode`) to Documentation/admin-guide/sysctl/net.rst.
These sysctls were introduced by commit eafb64f40ca4 ("vsock: add
netns to vsock core").

Document the two namespace modes (`global` and `local`), the
inheritance behavior of `child_ns_mode`, and the restriction preventing
local namespaces from setting `child_ns_mode` to `global`.

Signed-off-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Tested-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Acked-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Link: https://patch.msgid.link/20260216163147.236844-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: expand NETDEV_RSS_KEY_LEN to 256 bytes</title>
<updated>2026-01-25T21:20:46+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2026-01-22T19:03:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=37b0ea8fef56ccc29bd5a07f235ac218f7f98379'/>
<id>urn:sha1:37b0ea8fef56ccc29bd5a07f235ac218f7f98379</id>
<content type='text'>
NETDEV_RSS_KEY_LEN has been set to 52 bytes in 2014, until now.

Jakub suggested we bump the size to 128 bytes or more.

Some drivers (like idpf) were already working around the core limit.

Since this change might cause some issues in admin scripts,
bump it directly to 256 in one go.

tjbp26:~# cat /proc/sys/net/core/netdev_rss_key | wc -c
768

tjbp26:~# ethtool -x eth1
RX flow hash indirection table for eth1 with 32 RX ring(s):
...
RSS hash key:
fe:16:5b:2f:93:85:c2:c9:c1:ef:bd:60:c6:e0:2b:99:4d:bf:b7:14:c8:1e:8d:cb:31:17:51:da:55:eb:91:d9:9e:f9:89:9b:44:a1:dc:08:72:3a:b3:d6:31:86:9a:fe:02:3a:0d:eb:a1:7c:f5:a3:51:3b:08:56:c9:3f:71:69:01:ba:70:38
RSS hash function:
    toeplitz: on
    xor: off
    crc32: off

Suggested-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/netdev/20260122075206.504ec591@kernel.org/
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Link: https://patch.msgid.link/20260122190349.2771064-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: add net.core.qdisc_max_burst</title>
<updated>2026-01-13T09:12:11+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2026-01-07T10:41:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ffe4ccd359d006eba559cb1a3c6113144b7fb38c'/>
<id>urn:sha1:ffe4ccd359d006eba559cb1a3c6113144b7fb38c</id>
<content type='text'>
In blamed commit, I added a check against the temporary queue
built in __dev_xmit_skb(). Idea was to drop packets early,
before any spinlock was acquired.

if (unlikely(defer_count &gt; READ_ONCE(q-&gt;limit))) {
	kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
	return NET_XMIT_DROP;
}

It turned out that HTB Qdisc has a zero q-&gt;limit.
HTB limits packets on a per-class basis.
Some of our tests became flaky.

Add a new sysctl : net.core.qdisc_max_burst to control
how many packets can be stored in the temporary lockless queue.

Also add a new QDISC_BURST_DROP drop reason to better diagnose
future issues.

Thanks Neal !

Fixes: 100dfa74cad9 ("net: dev_queue_xmit() llist adoption")
Reported-and-bisected-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Link: https://patch.msgid.link/20260107104159.3669285-1-edumazet@google.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>net: increase skb_defer_max default to 128</title>
<updated>2025-11-08T03:02:40+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-11-06T20:29:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b61785852ed0a0e7dc16b606157e4a0228cd76cf'/>
<id>urn:sha1:b61785852ed0a0e7dc16b606157e4a0228cd76cf</id>
<content type='text'>
skb_defer_max value is very conservative, and can be increased
to avoid too many calls to kick_defer_list_purge().

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Link: https://patch.msgid.link/20251106202935.1776179-4-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: Introduce net.core.bypass_prot_mem sysctl.</title>
<updated>2025-10-16T19:04:47+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2025-10-14T23:54:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b46ab63181ff973ddce44ebc9ac24b269d42f481'/>
<id>urn:sha1:b46ab63181ff973ddce44ebc9ac24b269d42f481</id>
<content type='text'>
If a socket has sk-&gt;sk_bypass_prot_mem flagged, the socket opts out
of the global protocol memory accounting.

Let's control the flag by a new sysctl knob.

The flag is written once during socket(2) and is inherited to child
sockets.

Tested with a script that creates local socket pairs and send()s a
bunch of data without recv()ing.

Setup:

  # mkdir /sys/fs/cgroup/test
  # echo $$ &gt;&gt; /sys/fs/cgroup/test/cgroup.procs
  # sysctl -q net.ipv4.tcp_mem="1000 1000 1000"
  # ulimit -n 524288

Without net.core.bypass_prot_mem, charged to tcp_mem &amp; memcg

  # python3 pressure.py &amp;
  # cat /sys/fs/cgroup/test/memory.stat | grep sock
  sock 22642688 &lt;-------------------------------------- charged to memcg
  # cat /proc/net/sockstat| grep TCP
  TCP: inuse 2006 orphan 0 tw 0 alloc 2008 mem 5376 &lt;-- charged to tcp_mem
  # ss -tn | head -n 5
  State Recv-Q Send-Q Local Address:Port  Peer Address:Port
  ESTAB 2000   0          127.0.0.1:34479    127.0.0.1:53188
  ESTAB 2000   0          127.0.0.1:34479    127.0.0.1:49972
  ESTAB 2000   0          127.0.0.1:34479    127.0.0.1:53868
  ESTAB 2000   0          127.0.0.1:34479    127.0.0.1:53554
  # nstat | grep Pressure || echo no pressure
  TcpExtTCPMemoryPressures        1                  0.0

With net.core.bypass_prot_mem=1, charged to memcg only:

  # sysctl -q net.core.bypass_prot_mem=1
  # python3 pressure.py &amp;
  # cat /sys/fs/cgroup/test/memory.stat | grep sock
  sock 2757468160 &lt;------------------------------------ charged to memcg
  # cat /proc/net/sockstat | grep TCP
  TCP: inuse 2006 orphan 0 tw 0 alloc 2008 mem 0 &lt;- NOT charged to tcp_mem
  # ss -tn | head -n 5
  State Recv-Q Send-Q  Local Address:Port  Peer Address:Port
  ESTAB 111000 0           127.0.0.1:36019    127.0.0.1:49026
  ESTAB 110000 0           127.0.0.1:36019    127.0.0.1:45630
  ESTAB 110000 0           127.0.0.1:36019    127.0.0.1:44870
  ESTAB 111000 0           127.0.0.1:36019    127.0.0.1:45274
  # nstat | grep Pressure || echo no pressure
  no pressure

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Link: https://patch.msgid.link/20251014235604.3057003-4-kuniyu@google.com
</content>
</entry>
<entry>
<title>net: add /proc/sys/net/core/txq_reselection_ms control</title>
<updated>2025-10-15T16:04:21+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-10-13T15:22:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2ddef3462b3a5d62e5485e22ce128a5c02276438'/>
<id>urn:sha1:2ddef3462b3a5d62e5485e22ce128a5c02276438</id>
<content type='text'>
Add a new sysctl to control how often a queue reselection
can happen even if a flow has a persistent queue of skbs
in a Qdisc or NIC queue.

A value of zero means the feature is disabled.

Default is 1000 (1 second).

This sysctl is used in the following patch.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20251013152234.842065-4-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: set net.core.rmem_max and net.core.wmem_max to 4 MB</title>
<updated>2025-08-21T02:35:00+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-08-19T17:40:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a6d4f25888b83b8300aef28d9ee22765c1cc9b34'/>
<id>urn:sha1:a6d4f25888b83b8300aef28d9ee22765c1cc9b34</id>
<content type='text'>
SO_RCVBUF and SO_SNDBUF have limited range today, unless
distros or system admins change rmem_max and wmem_max.

Even iproute2 uses 1 MB SO_RCVBUF which is capped by
the kernel.

Decouple [rw]mem_max and [rw]mem_default and increase
[rw]mem_max to 4 MB.

Before:

$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.wmem_default = 212992
net.core.wmem_max = 212992

After:

$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 4194304
net.core.wmem_default = 212992
net.core.wmem_max = 4194304

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Link: https://patch.msgid.link/20250819174030.1986278-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ARC: Add eBPF JIT support</title>
<updated>2024-05-12T23:51:36+00:00</updated>
<author>
<name>Shahab Vahedi</name>
<email>shahab@synopsys.com</email>
</author>
<published>2024-04-30T14:56:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f122668ddcce450c2585f0be4bf4478d6fd6176b'/>
<id>urn:sha1:f122668ddcce450c2585f0be4bf4478d6fd6176b</id>
<content type='text'>
This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.

The test_bpf.ko reports 2-10 fold improvements in execution time of its
tests. For instance:

test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120  224  260 PASS

test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1  23 PASS

test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS

Deployment and structure
------------------------
The related codes are added to "arch/arc/net":

- bpf_jit.h       -- The interface that a back-end translator must provide
- bpf_jit_core.c  -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic

The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".

1. Normal pass       # The necessary pass
     1a. Dry run       # Get the whole JIT length, epilogue offset, etc.
     1b. Emit phase    # Allocate memory and start emitting instructions
2. Extra pass        # Only needed if there are relocations to be fixed
     2a. Patch relocations

Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:

- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)

The result of "test_bpf" test suite on an HSDK board is:

hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf

  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

All the failing test cases are due to the ones that were not JIT'ed.
Categorically, they can be represented as:

  .-----------.------------.-------------.
  | test type |   opcodes  | # of cases  |
  |-----------+------------+-------------|
  | atomic    | 0xC3, 0xDB |         149 |
  | div64     | 0x37, 0x3F |          22 |
  | mod64     | 0x97, 0x9F |          15 |
  `-----------^------------+-------------|
                           | (total) 186 |
                           `-------------'

Setup: build config
-------------------
The following configs must be set to have a working JIT test:

  CONFIG_BPF_JIT=y
  CONFIG_BPF_JIT_ALWAYS_ON=y
  CONFIG_TEST_BPF=m

The following options are not necessary for the tests module,
but are good to have:

  CONFIG_DEBUG_INFO=y             # prerequisite for below
  CONFIG_DEBUG_INFO_BTF=y         # so bpftool can generate vmlinux.h

  CONFIG_FTRACE=y                 #
  CONFIG_BPF_SYSCALL=y            # all these options lead to
  CONFIG_KPROBE_EVENTS=y          # having CONFIG_BPF_EVENTS=y
  CONFIG_PERF_EVENTS=y            #

Some BPF programs provide data through /sys/kernel/debug:
  CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug

Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.

[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7

Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:

pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats

Or else, the build will fail:

$ make V=1
  ...
  BTF     .btf.vmlinux.bin.o
pahole -J --btf_gen_floats                    \
       -j --lang_exclude=rust                 \
       --skip_encoding_btf_inconsistent_proto \
       --btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
  ...
  BTFIDS  vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available

This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.

Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
      ...
      test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
      test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support

Signed-off-by: Shahab Vahedi &lt;shahab@synopsys.com&gt;
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
