Age | Commit message (Collapse) | Author | Files | Lines |
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-05-28
We've added 23 non-merge commits during the last 11 day(s) which contain
a total of 45 files changed, 696 insertions(+), 277 deletions(-).
The main changes are:
1) Rename skb's mono_delivery_time to tstamp_type for extensibility
and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(),
from Abhishek Chauhan.
2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary
CT zones can be used from XDP/tc BPF netfilter CT helper functions,
from Brad Cowie.
3) Several tweaks to the instruction-set.rst IETF doc to address
the Last Call review comments, from Dave Thaler.
4) Small batch of riscv64 BPF JIT optimizations in order to emit more
compressed instructions to the JITed image for better icache efficiency,
from Xiao Wang.
5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h
diffing and forcing more natural type definitions ordering,
from Mykyta Yatsenko.
6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence
a syzbot/KCSAN race report for the tx_errors counter,
from Jiang Yunshui.
7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+
which started treating const structs as constants and thus breaking
full BTF program name resolution, from Ivan Babrou.
8) Fix up BPF program numbers in test_sockmap selftest in order to reduce
some of the test-internal array sizes, from Geliang Tang.
9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only
pahole, from Alan Maguire.
10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless
rebuilds in some corner cases, from Artem Savkov.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
bpf, net: Use DEV_STAT_INC()
bpf, docs: Fix instruction.rst indentation
bpf, docs: Clarify call local offset
bpf, docs: Add table captions
bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
bpf, docs: Use RFC 2119 language for ISA requirements
bpf, docs: Move sentence about returning R0 to abi.rst
bpf: constify member bpf_sysctl_kern:: Table
riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
riscv, bpf: Use STACK_ALIGN macro for size rounding up
riscv, bpf: Optimize zextw insn with Zba extension
selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets
net: Add additional bit to support clockid_t timestamp type
net: Rename mono_delivery_time to tstamp_type for scalabilty
selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs
net: netfilter: Make ct zone opts configurable for bpf ct helpers
selftests/bpf: Fix prog numbers in test_sockmap
bpf: Remove unused variable "prev_state"
bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer
bpf: Fix order of args in call to bpf_map_kvcalloc
...
====================
Link: https://lore.kernel.org/r/20240528105924.30905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In commit cdfbabfb2f0c ("net: Work around lockdep limitation in
sockets that use sockets"), it introduces 'af_kern_callback_keys'
to lockdep-init of sk_callback_lock according to 'sk_kern_sock',
it modifies sock_init_data() only, and sk_clone_lock() calls
sk_init_common() to initialize sk_callback_lock too, so the
lockdep-init of sk_callback_lock should be moved to sk_init_common().
Signed-off-by: Gou Hao <gouhao@uniontech.com>
Link: https://lore.kernel.org/r/20240526145718.9542-2-gouhao@uniontech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
sk_callback_lock has already been initialized in sk_init_common().
Signed-off-by: Gou Hao <gouhao@uniontech.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240526145718.9542-1-gouhao@uniontech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzbot/KCSAN reported that races happen when multiple CPUs updating
dev->stats.tx_error concurrently. Adopt SMP safe DEV_STATS_INC() to
update the dev->stats fields.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: yunshui <jiangyunshui@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240523033520.4029314-1-jiangyunshui@kylinos.cn
|
|
tstamp_type is now set based on actual clockid_t compressed
into 2 bits.
To make the design scalable for future needs this commit bring in
the change to extend the tstamp_type:1 to tstamp_type:2 to support
other clockid_t timestamp.
We now support CLOCK_TAI as part of tstamp_type as part of this
commit with existing support CLOCK_MONOTONIC and CLOCK_REALTIME.
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509211834.3235191-3-quic_abchauha@quicinc.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
mono_delivery_time was added to check if skb->tstamp has delivery
time in mono clock base (i.e. EDT) otherwise skb->tstamp has
timestamp in ingress and delivery_time at egress.
Renaming the bitfield from mono_delivery_time to tstamp_type is for
extensibilty for other timestamps such as userspace timestamp
(i.e. SO_TXTIME) set via sock opts.
As we are renaming the mono_delivery_time to tstamp_type, it makes
sense to start assigning tstamp_type based on enum defined
in this commit.
Earlier we used bool arg flag to check if the tstamp is mono in
function skb_set_delivery_time, Now the signature of the functions
accepts tstamp_type to distinguish between mono and real time.
Also skb_set_delivery_type_by_clockid is a new function which accepts
clockid to determine the tstamp_type.
In future tstamp_type:1 can be extended to support userspace timestamp
by increasing the bitfield.
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509211834.3235191-2-quic_abchauha@quicinc.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Quite smaller than usual. Notably it includes the fix for the unix
regression from the past weeks. The TCP window fix will require some
follow-up, already queued.
Current release - regressions:
- af_unix: fix garbage collection of embryos
Previous releases - regressions:
- af_unix: fix race between GC and receive path
- ipv6: sr: fix missing sk_buff release in seg6_input_core
- tcp: remove 64 KByte limit for initial tp->rcv_wnd value
- eth: r8169: fix rx hangup
- eth: lan966x: remove ptp traps in case the ptp is not enabled
- eth: ixgbe: fix link breakage vs cisco switches
- eth: ice: prevent ethtool from corrupting the channels
Previous releases - always broken:
- openvswitch: set the skbuff pkt_type for proper pmtud support
- tcp: Fix shift-out-of-bounds in dctcp_update_alpha()
Misc:
- a bunch of selftests stabilization patches"
* tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits)
r8169: Fix possible ring buffer corruption on fragmented Tx packets.
idpf: Interpret .set_channels() input differently
ice: Interpret .set_channels() input differently
nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()
net: relax socket state check at accept time.
tcp: remove 64 KByte limit for initial tp->rcv_wnd value
net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()
tls: fix missing memory barrier in tls_init
net: fec: avoid lock evasion when reading pps_enable
Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI"
testing: net-drv: use stats64 for testing
net: mana: Fix the extra HZ in mana_hwc_send_request
net: lan966x: Remove ptp traps in case the ptp is not enabled.
openvswitch: Set the skbuff pkt_type for proper pmtud support.
selftest: af_unix: Make SCM_RIGHTS into OOB data.
af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
tcp: Fix shift-out-of-bounds in dctcp_update_alpha().
selftests/net: use tc rule to filter the na packet
ipv6: sr: fix memleak in seg6_hmac_init_algo
af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing cleanup from Steven Rostedt:
"Remove second argument of __assign_str()
The __assign_str() macro logic of the TRACE_EVENT() macro was
optimized so that it no longer needs the second argument. The
__assign_str() is always matched with __string() field that takes a
field name and the source for that field:
__string(field, source)
The TRACE_EVENT() macro logic will save off the source value and then
use that value to copy into the ring buffer via the __assign_str().
Before commit c1fa617caeb0 ("tracing: Rework __assign_str() and
__string() to not duplicate getting the string"), the __assign_str()
needed the second argument which would perform the same logic as the
__string() source parameter did. Not only would this add overhead, but
it was error prone as if the __assign_str() source produced something
different, it may not have allocated enough for the string in the ring
buffer (as the __string() source was used to determine how much to
allocate)
Now that the __assign_str() just uses the same string that was used in
__string() it no longer needs the source parameter. It can now be
removed"
* tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/treewide: Remove second parameter of __assign_str()
|
|
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- virtio-net is finally supported in vduse
- virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster
And fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
virtio-pci: Check if is_avq is NULL
virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
MAINTAINERS: add Eugenio Pérez as reviewer
vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
vp_vdpa: don't allocate unused msix vectors
sound: virtio: drop owner assignment
fuse: virtio: drop owner assignment
scsi: virtio: drop owner assignment
rpmsg: virtio: drop owner assignment
nvdimm: virtio_pmem: drop owner assignment
wifi: mac80211_hwsim: drop owner assignment
vsock/virtio: drop owner assignment
net: 9p: virtio: drop owner assignment
net: virtio: drop owner assignment
net: caif: virtio: drop owner assignment
misc: nsm: drop owner assignment
iommu: virtio: drop owner assignment
drm/virtio: drop owner assignment
gpio: virtio: drop owner assignment
firmware: arm_scmi: virtio: drop owner assignment
...
|
|
When nci_rx_work() receives a zero-length payload packet, it should not
discard the packet and exit the loop. Instead, it should continue
processing subsequent packets.
Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240521153444.535399-1-ryasuoka@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Christoph reported the following splat:
WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0
Modules linked in:
CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759
Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80
RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293
RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64
R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000
R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800
FS: 000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786
do_accept+0x435/0x620 net/socket.c:1929
__sys_accept4_file net/socket.c:1969 [inline]
__sys_accept4+0x9b/0x110 net/socket.c:1999
__do_sys_accept net/socket.c:2016 [inline]
__se_sys_accept net/socket.c:2013 [inline]
__x64_sys_accept+0x7d/0x90 net/socket.c:2013
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x4315f9
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300
R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055
</TASK>
The reproducer invokes shutdown() before entering the listener status.
After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
TCP_SYN_RECV sockets"), the above causes the child to reach the accept
syscall in FIN_WAIT1 status.
Eric noted we can relax the existing assertion in __inet_accept()
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Recently, we had some servers upgraded to the latest kernel and noticed
the indicator from the user side showed worse results than before. It is
caused by the limitation of tp->rcv_wnd.
In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
CDN teams would not benefit from this change because they cannot have a
large window to receive a big packet, which will be slowed down especially
in long RTT. Small rcv_wnd means slow transfer speed, to some extent. It's
the side effect for the latency/time-sensitive users.
To avoid future confusion, current change doesn't affect the initial
receive window on the wire in a SYN or SYN+ACK packet which are set within
65535 bytes according to RFC 7323 also due to the limit in
__tcp_transmit_skb():
th->window = htons(min(tp->rcv_wnd, 65535U));
In one word, __tcp_transmit_skb() already ensures that constraint is
respected, no matter how large tp->rcv_wnd is. The change doesn't violate
RFC.
Let me provide one example if with or without the patch:
Before:
client --- SYN: rwindow=65535 ---> server
client <--- SYN+ACK: rwindow=65535 ---- server
client --- ACK: rwindow=65536 ---> server
Note: for the last ACK, the calculation is 512 << 7.
After:
client --- SYN: rwindow=65535 ---> server
client <--- SYN+ACK: rwindow=65535 ---- server
client --- ACK: rwindow=175232 ---> server
Note: I use the following command to make it work:
ip route change default via [ip] dev eth0 metric 100 initrwnd 120
For the last ACK, the calculation is 1369 << 7.
When we apply such a patch, having a large rcv_wnd if the user tweak this
knob can help transfer data more rapidly and save some rtts.
Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20240521134220.12510-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In tls_init(), a write memory barrier is missing, and store-store
reordering may cause NULL dereference in tls_{setsockopt,getsockopt}.
CPU0 CPU1
----- -----
// In tls_init()
// In tls_ctx_create()
ctx = kzalloc()
ctx->sk_proto = READ_ONCE(sk->sk_prot) -(1)
// In update_sk_prot()
WRITE_ONCE(sk->sk_prot, tls_prots) -(2)
// In sock_common_setsockopt()
READ_ONCE(sk->sk_prot)->setsockopt()
// In tls_{setsockopt,getsockopt}()
ctx->sk_proto->setsockopt() -(3)
In the above scenario, when (1) and (2) are reordered, (3) can observe
the NULL value of ctx->sk_proto, causing NULL dereference.
To fix it, we rely on rcu_assign_pointer() which implies the release
barrier semantic. By moving rcu_assign_pointer() after ctx->sk_proto is
initialized, we can ensure that ctx->sk_proto are visible when
changing sk->sk_prot.
Fixes: d5bee7374b68 ("net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE")
Signed-off-by: Yewon Choi <woni9911@gmail.com>
Signed-off-by: Dae R. Jeong <threeearcat@gmail.com>
Link: https://lore.kernel.org/netdev/ZU4OJG56g2V9z_H7@dragonet/T/
Link: https://lore.kernel.org/r/Zkx4vjSFp0mfpjQ2@libra05
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.
This means that with:
__string(field, mystring)
Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.
There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:
git grep -l __assign_str | while read a ; do
sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
mv /tmp/test-file $a;
done
I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.
Note, the same updates will need to be done for:
__assign_str_len()
__assign_rel_str()
__assign_rel_str_len()
I tested this with both an allmodconfig and an allyesconfig (build only for both).
[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
|
|
Add ct zone id and direction to bpf_ct_opts so that arbitrary ct zones
can be used for xdp/tc bpf ct helper functions bpf_{xdp,skb}_ct_alloc
and bpf_{xdp,skb}_ct_lookup.
Signed-off-by: Brad Cowie <brad@faucet.nz>
Link: https://lore.kernel.org/r/20240522050712.732558-1-brad@faucet.nz
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
virtio core already sets the .owner, so driver does not need to.
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-19-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-18-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:
- Switch read and write software bits for PUDs
- Add missing hardware bits for PUDs and PMDs
- Generate unwind information for C modules to fix GDB unwind error for
vDSO functions
- Create .build-id links for unstripped vDSO files to enable vDSO
debugging with symbols
- Use standard stack frame layout for vDSO generated stack frames to
manually walk stack frames without DWARF information
- Rework perf_callchain_user() and arch_stack_walk_user() functions to
reduce code duplication
- Skip first stack frame when walking user stack
- Add basic checks to identify invalid instruction pointers when
walking stack frames
- Introduce and use struct stack_frame_vdso_wrapper within vDSO user
wrapper code to automatically generate an asm-offset define. Also use
STACK_FRAME_USER_OVERHEAD instead of STACK_FRAME_OVERHEAD to document
that the code works with user space stack
- Clear the backchain of the extra stack frame added by the vDSO user
wrapper code. This allows the user stack walker to detect and skip
the non-standard stack frame. Without this an incorrect instruction
pointer would be added to stack traces.
- Rewrite psw_idle() function in C to ease maintenance and further
enhancements
- Remove get_vtimer() function and use get_cpu_timer() instead
- Mark psw variable in __load_psw_mask() as __unitialized to avoid
superfluous clearing of PSW
- Remove obsolete and superfluous comment about removed TIF_FPU flag
- Replace memzero_explicit() and kfree() with kfree_sensitive() to fix
warnings reported by Coccinelle
- Wipe sensitive data and all copies of protected- or secure-keys from
stack when an IOCTL fails
- Both do_airq_interrupt() and do_io_interrupt() functions set
CIF_NOHZ_DELAY flag. Move it in do_io_irq() to simplify the code
- Provide iucv_alloc_device() and iucv_release_device() helpers, which
can be used to deduplicate more or less identical IUCV device
allocation and release code in four different drivers
- Make use of iucv_alloc_device() and iucv_release_device() helpers to
get rid of quite some code and also remove a cast to an incompatible
function (clang W=1)
- There is no user of iucv_root outside of the core IUCV code left.
Therefore remove the EXPORT_SYMBOL
- __apply_alternatives() contains a runtime check which verifies that
the size of the to be patched code area is even. Convert this to a
compile time check
- Increase size of buffers for sending z/VM CP DIAGNOSE X'008' commands
from 128 to 240
- Do not accept z/VM CP DIAGNOSE X'008' commands longer than maximally
allowed
- Use correct defines IPL_BP_NVME_LEN and IPL_BP0_NVME_LEN instead of
IPL_BP_FCP_LEN and IPL_BP0_FCP_LEN ones to initialize NVMe reIPL
block on 'scp_data' sysfs attribute update
- Initialize the correct fields of the NVMe dump block, which were
confused with FCP fields
- Refactor macros for 'scp_data' (re-)IPL sysfs attribute to reduce
code duplication
- Introduce 'scp_data' sysfs attribute for dump IPL to allow tools such
as dumpconf passing additional kernel command line parameters to a
stand-alone dumper
- Rework the CPACF query functions to use the correct RRE or RRF
instruction formats and set instruction register fields correctly
- Instead of calling BUG() at runtime force a link error during compile
when a unsupported opcode is used with __cpacf_query() or
__cpacf_check_opcode() functions
- Fix a crash in ap_parse_bitmap_str() function on /sys/bus/ap/apmask
or /sys/bus/ap/aqmask sysfs file update with a relative mask value
- Fix "bindings complete" udev event which should be sent once all AP
devices have been bound to device drivers and again when unbind/bind
actions take place and all AP devices are bound again
- Facility list alt_stfle_fac_list is nowhere used in the decompressor,
therefore remove it there
- Remove custom kprobes insn slot allocator in favour of the standard
module_alloc() one, since kernel image and module areas are located
within 4GB
- Use kvcalloc() instead of kvmalloc_array() in zcrypt driver to avoid
calling memset() with a large byte count and get rid of the sparse
warning as result
* tag 's390-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
s390/zcrypt: Use kvcalloc() instead of kvmalloc_array()
s390/kprobes: Remove custom insn slot allocator
s390/boot: Remove alt_stfle_fac_list from decompressor
s390/ap: Fix bind complete udev event sent after each AP bus scan
s390/ap: Fix crash in AP internal function modify_bitmap()
s390/cpacf: Make use of invalid opcode produce a link error
s390/cpacf: Split and rework cpacf query functions
s390/ipl: Introduce sysfs attribute 'scp_data' for dump ipl
s390/ipl: Introduce macros for (re)ipl sysfs attribute 'scp_data'
s390/ipl: Fix incorrect initialization of nvme dump block
s390/ipl: Fix incorrect initialization of len fields in nvme reipl block
s390/ipl: Do not accept z/VM CP diag X'008' cmds longer than max length
s390/ipl: Fix size of vmcmd buffers for sending z/VM CP diag X'008' cmds
s390/alternatives: Convert runtime sanity check into compile time check
s390/iucv: Unexport iucv_root
tty: hvc-iucv: Make use of iucv_alloc_device()
s390/smsgiucv_app: Make use of iucv_alloc_device()
s390/netiucv: Make use of iucv_alloc_device()
s390/vmlogrdr: Make use of iucv_alloc_device()
s390/iucv: Provide iucv_alloc_device() / iucv_release_device()
...
|
|
Open vSwitch is originally intended to switch at layer 2, only dealing with
Ethernet frames. With the introduction of l3 tunnels support, it crossed
into the realm of needing to care a bit about some routing details when
making forwarding decisions. If an oversized packet would need to be
fragmented during this forwarding decision, there is a chance for pmtu
to get involved and generate a routing exception. This is gated by the
skbuff->pkt_type field.
When a flow is already loaded into the openvswitch module this field is
set up and transitioned properly as a packet moves from one port to
another. In the case that a packet execute is invoked after a flow is
newly installed this field is not properly initialized. This causes the
pmtud mechanism to omit sending the required exception messages across
the tunnel boundary and a second attempt needs to be made to make sure
that the routing exception is properly setup. To fix this, we set the
outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get
to the openvswitch module via a port device or packet command.
Even for bridge ports as users, the pkt_type needs to be reset when
doing the transmit as the packet is truly outgoing and routing needs
to get involved post packet transformations, in the case of
VXLAN/GENEVE/udp-tunnel packets. In general, the pkt_type on output
gets ignored, since we go straight to the driver, but in the case of
tunnel ports they go through IP routing layer.
This issue is periodically encountered in complex setups, such as large
openshift deployments, where multiple sets of tunnel traversal occurs.
A way to recreate this is with the ovn-heater project that can setup
a networking environment which mimics such large deployments. We need
larger environments for this because we need to ensure that flow
misses occur. In these environment, without this patch, we can see:
./ovn_cluster.sh start
podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache
podman exec ovn-chassis-1 ip netns exec sw01p1 \
ping 21.0.0.3 -M do -s 1300 -c2
PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)
--- 21.0.0.3 ping statistics ---
...
Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not
sent into the server.
With this patch, setting the pkt_type, we see the following:
podman exec ovn-chassis-1 ip netns exec sw01p1 \
ping 21.0.0.3 -M do -s 1300 -c2
PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
ping: local error: message too long, mtu=1222
--- 21.0.0.3 ping statistics ---
...
In this case, the first ping request receives the FRAG_NEEDED message and
a local routing exception is created.
Tested-by: Jaime Caamano <jcaamano@redhat.com>
Reported-at: https://issues.redhat.com/browse/FDP-164
Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20240516200941.16152-1-aconole@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
GC attempts to explicitly drop oob_skb's reference before purging the hit
list.
The problem is with embryos: kfree_skb(u->oob_skb) is never called on an
embryo socket.
The python script below [0] sends a listener's fd to its embryo as OOB
data. While GC does collect the embryo's queue, it fails to drop the OOB
skb's refcount. The skb which was in embryo's receive queue stays as
unix_sk(sk)->oob_skb and keeps the listener's refcount [1].
Tell GC to dispose embryo's oob_skb.
[0]:
from array import array
from socket import *
addr = '\x00unix-oob'
lis = socket(AF_UNIX, SOCK_STREAM)
lis.bind(addr)
lis.listen(1)
s = socket(AF_UNIX, SOCK_STREAM)
s.connect(addr)
scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()]))
s.sendmsg([b'x'], [scm], MSG_OOB)
lis.close()
[1]
$ grep unix-oob /proc/net/unix
$ ./unix-oob.py
$ grep unix-oob /proc/net/unix
0000000000000000: 00000002 00000000 00000000 0001 02 0 @unix-oob
0000000000000000: 00000002 00000000 00010000 0001 01 6072 @unix-oob
Fixes: 4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In dctcp_update_alpha(), we use a module parameter dctcp_shift_g
as follows:
alpha -= min_not_zero(alpha, alpha >> dctcp_shift_g);
...
delivered_ce <<= (10 - dctcp_shift_g);
It seems syzkaller started fuzzing module parameters and triggered
shift-out-of-bounds [0] by setting 100 to dctcp_shift_g:
memcpy((void*)0x20000080,
"/sys/module/tcp_dctcp/parameters/dctcp_shift_g\000", 47);
res = syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000080ul,
/*flags=*/2ul, /*mode=*/0ul);
memcpy((void*)0x20000000, "100\000", 4);
syscall(__NR_write, /*fd=*/r[0], /*val=*/0x20000000ul, /*len=*/4ul);
Let's limit the max value of dctcp_shift_g by param_set_uint_minmax().
With this patch:
# echo 10 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g
# cat /sys/module/tcp_dctcp/parameters/dctcp_shift_g
10
# echo 11 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g
-bash: echo: write error: Invalid argument
[0]:
UBSAN: shift-out-of-bounds in net/ipv4/tcp_dctcp.c:143:12
shift exponent 100 is too large for 32-bit type 'u32' (aka 'unsigned int')
CPU: 0 PID: 8083 Comm: syz-executor345 Not tainted 6.9.0-05151-g1b294a1f3561 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x201/0x300 lib/dump_stack.c:114
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_shift_out_of_bounds+0x346/0x3a0 lib/ubsan.c:468
dctcp_update_alpha+0x540/0x570 net/ipv4/tcp_dctcp.c:143
tcp_in_ack_event net/ipv4/tcp_input.c:3802 [inline]
tcp_ack+0x17b1/0x3bc0 net/ipv4/tcp_input.c:3948
tcp_rcv_state_process+0x57a/0x2290 net/ipv4/tcp_input.c:6711
tcp_v4_do_rcv+0x764/0xc40 net/ipv4/tcp_ipv4.c:1937
sk_backlog_rcv include/net/sock.h:1106 [inline]
__release_sock+0x20f/0x350 net/core/sock.c:2983
release_sock+0x61/0x1f0 net/core/sock.c:3549
mptcp_subflow_shutdown+0x3d0/0x620 net/mptcp/protocol.c:2907
mptcp_check_send_data_fin+0x225/0x410 net/mptcp/protocol.c:2976
__mptcp_close+0x238/0xad0 net/mptcp/protocol.c:3072
mptcp_close+0x2a/0x1a0 net/mptcp/protocol.c:3127
inet_release+0x190/0x1f0 net/ipv4/af_inet.c:437
__sock_release net/socket.c:659 [inline]
sock_close+0xc0/0x240 net/socket.c:1421
__fput+0x41b/0x890 fs/file_table.c:422
task_work_run+0x23b/0x300 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x9c8/0x2540 kernel/exit.c:878
do_group_exit+0x201/0x2b0 kernel/exit.c:1027
__do_sys_exit_group kernel/exit.c:1038 [inline]
__se_sys_exit_group kernel/exit.c:1036 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xe4/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x67/0x6f
RIP: 0033:0x7f6c2b5005b6
Code: Unable to access opcode bytes at 0x7f6c2b50058c.
RSP: 002b:00007ffe883eb948 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f6c2b5862f0 RCX: 00007f6c2b5005b6
RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0
R10: 0000000000000006 R11: 0000000000000246 R12: 00007f6c2b5862f0
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
</TASK>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: Yue Sun <samsun1006219@gmail.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/netdev/CAEkJfYNJM=cw-8x7_Vmj1J6uYVCWMbbvD=EFmDPVBGpTsqOxEA@mail.gmail.com/
Fixes: e3118e8359bb ("net: tcp: add DCTCP congestion control algorithm")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240517091626.32772-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
seg6_hmac_init_algo returns without cleaning up the previous allocations
if one fails, so it's going to leak all that memory and the crypto tfms.
Update seg6_hmac_exit to only free the memory when allocated, so we can
reuse the code directly.
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Closes: https://lore.kernel.org/netdev/Zj3bh-gE7eT6V6aH@hog/
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20240517005435.2600277-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Billy Jheng Bing-Jhong reported a race between __unix_gc() and
queue_oob().
__unix_gc() tries to garbage-collect close()d inflight sockets,
and then if the socket has MSG_OOB in unix_sk(sk)->oob_skb, GC
will drop the reference and set NULL to it locklessly.
However, the peer socket still can send MSG_OOB message and
queue_oob() can update unix_sk(sk)->oob_skb concurrently, leading
NULL pointer dereference. [0]
To fix the issue, let's update unix_sk(sk)->oob_skb under the
sk_receive_queue's lock and take it everywhere we touch oob_skb.
Note that we defer kfree_skb() in manage_oob() to silence lockdep
false-positive (See [1]).
[0]:
BUG: kernel NULL pointer dereference, address: 0000000000000008
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
PGD 8000000009f5e067 P4D 8000000009f5e067 PUD 9f5d067 PMD 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc5-00191-gd091e579b864 #110
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events delayed_fput
RIP: 0010:skb_dequeue (./include/linux/skbuff.h:2386 ./include/linux/skbuff.h:2402 net/core/skbuff.c:3847)
Code: 39 e3 74 3e 8b 43 10 48 89 ef 83 e8 01 89 43 10 49 8b 44 24 08 49 c7 44 24 08 00 00 00 00 49 8b 14 24 49 c7 04 24 00 00 00 00 <48> 89 42 08 48 89 10 e8 e7 c5 42 00 4c 89 e0 5b 5d 41 5c c3 cc cc
RSP: 0018:ffffc900001bfd48 EFLAGS: 00000002
RAX: 0000000000000000 RBX: ffff8880088f5ae8 RCX: 00000000361289f9
RDX: 0000000000000000 RSI: 0000000000000206 RDI: ffff8880088f5b00
RBP: ffff8880088f5b00 R08: 0000000000080000 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000001 R12: ffff8880056b6a00
R13: ffff8880088f5280 R14: 0000000000000001 R15: ffff8880088f5a80
FS: 0000000000000000(0000) GS:ffff88807dd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000006314000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
unix_release_sock (net/unix/af_unix.c:654)
unix_release (net/unix/af_unix.c:1050)
__sock_release (net/socket.c:660)
sock_close (net/socket.c:1423)
__fput (fs/file_table.c:423)
delayed_fput (fs/file_table.c:444 (discriminator 3))
process_one_work (kernel/workqueue.c:3259)
worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
kthread (kernel/kthread.c:388)
ret_from_fork (arch/x86/kernel/process.c:153)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>
Modules linked in:
CR2: 0000000000000008
Link: https://lore.kernel.org/netdev/a00d3993-c461-43f2-be6d-07259c98509a@rbox.co/ [1]
Fixes: 1279f9d9dec2 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")
Reported-by: Billy Jheng Bing-Jhong <billy@starlabs.sg>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240516134835.8332-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- optimize DMA sync calls when they are no-ops (Alexander Lobakin)
- fix swiotlb padding for untrusted devices (Michael Kelley)
- add documentation for swiotb (Michael Kelley)
* tag 'dma-mapping-6.10-2024-05-20' of git://git.infradead.org/users/hch/dma-mapping:
dma: fix DMA sync for drivers not calling dma_set_mask*()
xsk: use generic DMA sync shortcut instead of a custom one
page_pool: check for DMA sync shortcut earlier
page_pool: don't use driver-set flags field directly
page_pool: make sure frag API fields don't span between cachelines
iommu/dma: avoid expensive indirect calls for sync operations
dma: avoid redundant calls for sync operations
dma: compile-out DMA sync op calls when not used
iommu/dma: fix zeroing of bounce buffer padding used by untrusted devices
swiotlb: remove alloc_size argument to swiotlb_tbl_map_single()
Documentation/core-api: add swiotlb documentation
|
|
syzbot reported the following uninit-value access issue [1]
nci_rx_work() parses received packet from ndev->rx_q. It should be
validated header size, payload size and total packet size before
processing the packet. If an invalid packet is detected, it should be
silently discarded.
Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
Reported-and-tested-by: syzbot+d7b4dc6cd50410152534@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d7b4dc6cd50410152534 [1]
Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The seg6_input() function is responsible for adding the SRH into a
packet, delegating the operation to the seg6_input_core(). This function
uses the skb_cow_head() to ensure that there is sufficient headroom in
the sk_buff for accommodating the link-layer header.
In the event that the skb_cow_header() function fails, the
seg6_input_core() catches the error but it does not release the sk_buff,
which will result in a memory leak.
This issue was introduced in commit af3b5158b89d ("ipv6: sr: fix BUG due
to headroom too small after SRH push") and persists even after commit
7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane"),
where the entire seg6_input() code was refactored to deal with netfilter
hooks.
The proposed patch addresses the identified memory leak by requiring the
seg6_input_core() function to release the sk_buff in the event that
skb_cow_head() fails.
Fixes: af3b5158b89d ("ipv6: sr: fix BUG due to headroom too small after SRH push")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
|
|
Pull nfsd updates from Chuck Lever:
"This is a light release containing mostly optimizations, code clean-
ups, and minor bug fixes. This development cycle has focused on non-
upstream kernel work:
1. Continuing to build upstream CI for NFSD, based on kdevops
2. Backporting NFSD filecache-related fixes to selected LTS kernels
One notable new feature in v6.10 NFSD is the addition of a new netlink
protocol dedicated to configuring NFSD. A new user space tool,
nfsdctl, is to be added to nfs-utils. Lots more to come here.
As always I am very grateful to NFSD contributors, reviewers, testers,
and bug reporters who participated during this cycle"
* tag 'nfsd-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (29 commits)
NFSD: Force all NFSv4.2 COPY requests to be synchronous
SUNRPC: Fix gss_free_in_token_pages()
NFS/knfsd: Remove the invalid NFS error 'NFSERR_OPNOTSUPP'
knfsd: LOOKUP can return an illegal error value
nfsd: set security label during create operations
NFSD: Add COPY status code to OFFLOAD_STATUS response
NFSD: Record status of async copy operation in struct nfsd4_copy
SUNRPC: Remove comment for sp_lock
NFSD: add listener-{set,get} netlink command
SUNRPC: add a new svc_find_listener helper
SUNRPC: introduce svc_xprt_create_from_sa utility routine
NFSD: add write_version to netlink command
NFSD: convert write_threads to netlink command
NFSD: allow callers to pass in scope string to nfsd_svc
NFSD: move nfsd_mutex handling into nfsd_svc callers
lockd: host: Remove unnecessary statements'host = NULL;'
nfsd: don't create nfsv4recoverydir in nfsdfs when not used.
nfsd: optimise recalculate_deny_mode() for a common case
nfsd: add tracepoint in mark_client_expired_locked
nfsd: new tracepoint for check_slot_seqid
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Avoid 'constexpr', which is a keyword in C23
- Allow 'dtbs_check' and 'dt_compatible_check' run independently of
'dt_binding_check'
- Fix weak references to avoid GOT entries in position-independent code
generation
- Convert the last use of 'optional' property in arch/sh/Kconfig
- Remove support for the 'optional' property in Kconfig
- Remove support for Clang's ThinLTO caching, which does not work with
the .incbin directive
- Change the semantics of $(src) so it always points to the source
directory, which fixes Makefile inconsistencies between upstream and
downstream
- Fix 'make tar-pkg' for RISC-V to produce a consistent package
- Provide reasonable default coverage for objtool, sanitizers, and
profilers
- Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
- Remove the last use of tristate choice in drivers/rapidio/Kconfig
- Various cleanups and fixes in Kconfig
* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
kconfig: use sym_get_choice_menu() in sym_check_prop()
rapidio: remove choice for enumeration
kconfig: lxdialog: remove initialization with A_NORMAL
kconfig: m/nconf: merge two item_add_str() calls
kconfig: m/nconf: remove dead code to display value of bool choice
kconfig: m/nconf: remove dead code to display children of choice members
kconfig: gconf: show checkbox for choice correctly
kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
Makefile: remove redundant tool coverage variables
kbuild: provide reasonable defaults for tool coverage
modules: Drop the .export_symbol section from the final modules
kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
kconfig: use sym_get_choice_menu() in conf_write_defconfig()
kconfig: add sym_get_choice_menu() helper
kconfig: turn defaults and additional prompt for choice members into error
kconfig: turn missing prompt for choice members into error
kconfig: turn conf_choice() into void function
kconfig: use linked list in sym_set_changed()
kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
kconfig: gconf: remove debug code
...
|
|
Pull more io_uring updates from Jens Axboe:
"This adds support for IORING_CQE_F_SOCK_NONEMPTY for io_uring accept
requests.
This is very similar to previous work that enabled the same hint for
doing receives on sockets. By far the majority of the work here is
refactoring to enable the networking side to pass back whether or not
the socket had more pending requests after accepting the current one,
the last patch just wires it up for io_uring.
Not only does this enable applications to know whether there are more
connections to accept right now, it also enables smarter logic for
io_uring multishot accept on whether to retry immediately or wait for
a poll trigger"
* tag 'net-accept-more-20240515' of git://git.kernel.dk/linux:
io_uring/net: wire up IORING_CQE_F_SOCK_NONEMPTY for accept
net: pass back whether socket was empty post accept
net: have do_accept() take a struct proto_accept_arg argument
net: change proto and proto_ops accept type
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- virtio_net: fix missed error path rtnl_unlock after control queue
locking rework
Current release - new code bugs:
- bpf: fix KASAN slab-out-of-bounds in percpu_array_map_gen_lookup,
caused by missing nested map handling
- drv: dsa: correct initialization order for KSZ88x3 ports
Previous releases - regressions:
- af_packet: do not call packet_read_pending() from
tpacket_destruct_skb() fix performance regression
- ipv6: fix route deleting failure when metric equals 0, don't assume
0 means not set / default in this case
Previous releases - always broken:
- bridge: couple of syzbot-driven fixes"
* tag 'net-6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
selftests: net: local_termination: annotate the expected failures
net: dsa: microchip: Correct initialization order for KSZ88x3 ports
MAINTAINERS: net: Update reviewers for TI's Ethernet drivers
dt-bindings: net: ti: Update maintainers list
l2tp: fix ICMP error handling for UDP-encap sockets
net: txgbe: fix to control VLAN strip
net: wangxun: match VLAN CTAG and STAG features
net: wangxun: fix to change Rx features
af_packet: do not call packet_read_pending() from tpacket_destruct_skb()
virtio_net: Fix missed rtnl_unlock
netrom: fix possible dead-lock in nr_rt_ioctl()
idpf: don't skip over ethtool tcp-data-split setting
dt-bindings: net: qcom: ethernet: Allow dma-coherent
bonding: fix oops during rmmod
net/ipv6: Fix route deleting failure when metric equals 0
selftests/net: reduce xfrm_policy test time
selftests/bpf: Adjust btf_dump test to reflect recent change in file_operations
selftests/bpf: Adjust test_access_variable_array after a kernel function name change
selftests/net/lib: no need to record ns name if it already exist
net: qrtr: ns: Fix module refcnt
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Remove sentinel elements from ctl_table structs in kernel/*
Removing sentinels in ctl_table arrays reduces the build time size
and runtime memory consumed by ~64 bytes per array. Removals for
net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline
through their respective subsystems making the next release the most
likely place where the final series that removes the check for
proc_name == NULL will land.
This adds to removals already in arch/, drivers/ and fs/.
- Adjust ctl_table definitions and references to allow constification
- Remove unused ctl_table function arguments
- Move non-const elements from ctl_table to ctl_table_header
- Make ctl_table pointers const in ctl_table_root structure
Making the static ctl_table structs const will increase safety by
keeping the pointers to proc_handler functions in .rodata. Though no
ctl_tables where made const in this PR, the ground work for making
that possible has started with these changes sent by Thomas
Weißschuh.
* tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: drop now unnecessary out-of-bounds check
sysctl: move sysctl type to ctl_table_header
sysctl: drop sysctl_is_perm_empty_ctl_table
sysctl: treewide: constify argument ctl_table_root::permissions(table)
sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table)
bpf: Remove the now superfluous sentinel elements from ctl_table array
delayacct: Remove the now superfluous sentinel elements from ctl_table array
kprobes: Remove the now superfluous sentinel elements from ctl_table array
printk: Remove the now superfluous sentinel elements from ctl_table array
scheduler: Remove the now superfluous sentinel elements from ctl_table array
seccomp: Remove the now superfluous sentinel elements from ctl_table array
timekeeping: Remove the now superfluous sentinel elements from ctl_table array
ftrace: Remove the now superfluous sentinel elements from ctl_table array
umh: Remove the now superfluous sentinel elements from ctl_table array
kernel misc: Remove the now superfluous sentinel elements from ctl_table array
|
|
Since commit a36e185e8c85
("udp: Handle ICMP errors for tunnels with same destination port on both endpoints")
UDP's handling of ICMP errors has allowed for UDP-encap tunnels to
determine socket associations in scenarios where the UDP hash lookup
could not.
Subsequently, commit d26796ae58940
("udp: check udp sock encap_type in __udp_lib_err")
subtly tweaked the approach such that UDP ICMP error handling would be
skipped for any UDP socket which has encapsulation enabled.
In the case of L2TP tunnel sockets using UDP-encap, this latter
modification effectively broke ICMP error reporting for the L2TP
control plane.
To a degree this isn't catastrophic inasmuch as the L2TP control
protocol defines a reliable transport on top of the underlying packet
switching network which will eventually detect errors and time out.
However, paying attention to the ICMP error reporting allows for more
timely detection of errors in L2TP userspace, and aids in debugging
connectivity issues.
Reinstate ICMP error handling for UDP encap L2TP tunnels:
* implement struct udp_tunnel_sock_cfg .encap_err_rcv in order to allow
the L2TP code to handle ICMP errors;
* only implement error-handling for tunnels which have a managed
socket: unmanaged tunnels using a kernel socket have no userspace to
report errors back to;
* flag the error on the socket, which allows for userspace to get an
error such as -ECONNREFUSED back from sendmsg/recvmsg;
* pass the error into ip[v6]_icmp_error() which allows for userspace to
get extended error information via. MSG_ERRQUEUE.
Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err")
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Link: https://lore.kernel.org/r/20240513172248.623261-1-tparkin@katalix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
trafgen performance considerably sank on hosts with many cores
after the blamed commit.
packet_read_pending() is very expensive, and calling it
in af_packet fast path defeats Daniel intent in commit
b013840810c2 ("packet: use percpu mmap tx frame pending refcount")
tpacket_destruct_skb() makes room for one packet, we can immediately
wakeup a producer, no need to completely drain the tx ring.
Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240515163358.4105915-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot loves netrom, and found a possible deadlock in nr_rt_ioctl [1]
Make sure we always acquire nr_node_list_lock before nr_node_lock(nr_node)
[1]
WARNING: possible circular locking dependency detected
6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Not tainted
------------------------------------------------------
syz-executor350/5129 is trying to acquire lock:
ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_node_lock include/net/netrom.h:152 [inline]
ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:464 [inline]
ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697
but task is already holding lock:
ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline]
ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (nr_node_list_lock){+...}-{2:2}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
nr_remove_node net/netrom/nr_route.c:299 [inline]
nr_del_node+0x4b4/0x820 net/netrom/nr_route.c:355
nr_rt_ioctl+0xa95/0x1090 net/netrom/nr_route.c:683
sock_do_ioctl+0x158/0x460 net/socket.c:1222
sock_ioctl+0x629/0x8e0 net/socket.c:1341
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&nr_node->node_lock){+...}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
nr_node_lock include/net/netrom.h:152 [inline]
nr_dec_obs net/netrom/nr_route.c:464 [inline]
nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697
sock_do_ioctl+0x158/0x460 net/socket.c:1222
sock_ioctl+0x629/0x8e0 net/socket.c:1341
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(nr_node_list_lock);
lock(&nr_node->node_lock);
lock(nr_node_list_lock);
lock(&nr_node->node_lock);
*** DEADLOCK ***
1 lock held by syz-executor350/5129:
#0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
#0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline]
#0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697
stack backtrace:
CPU: 0 PID: 5129 Comm: syz-executor350 Not tainted 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
nr_node_lock include/net/netrom.h:152 [inline]
nr_dec_obs net/netrom/nr_route.c:464 [inline]
nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697
sock_do_ioctl+0x158/0x460 net/socket.c:1222
sock_ioctl+0x629/0x8e0 net/socket.c:1341
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240515142934.3708038-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Problem
=========
After commit 67f695134703 ("ipv6: Move setting default metric for routes"),
we noticed that the logic of assigning the default value of fc_metirc
changed in the ioctl process. That is, when users use ioctl(fd, SIOCADDRT,
rt) with a non-zero metric to add a route, then they may fail to delete a
route with passing in a metric value of 0 to the kernel by ioctl(fd,
SIOCDELRT, rt). But iproute can succeed in deleting it.
As a reference, when using iproute tools by netlink to delete routes with
a metric parameter equals 0, like the command as follows:
ip -6 route del fe80::/64 via fe81::5054:ff:fe11:3451 dev eth0 metric 0
the user can still succeed in deleting the route entry with the smallest
metric.
Root Reason
===========
After commit 67f695134703 ("ipv6: Move setting default metric for routes"),
When ioctl() pass in SIOCDELRT with a zero metric, rtmsg_to_fib6_config()
will set a defalut value (1024) to cfg->fc_metric in kernel, and in
ip6_route_del() and the line 4074 at net/ipv3/route.c, it will check by
if (cfg->fc_metric && cfg->fc_metric != rt->fib6_metric)
continue;
and the condition is true and skip the later procedure (deleting route)
because cfg->fc_metric != rt->fib6_metric. But before that commit,
cfg->fc_metric is still zero there, so the condition is false and it
will do the following procedure (deleting).
Solution
========
In order to keep a consistent behaviour across netlink() and ioctl(), we
should allow to delete a route with a metric value of 0. So we only do
the default setting of fc_metric in route adding.
CC: stable@vger.kernel.org # 5.4+
Fixes: 67f695134703 ("ipv6: Move setting default metric for routes")
Co-developed-by: Fan Yu <fan.yu9@zte.com.cn>
Signed-off-by: Fan Yu <fan.yu9@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240514201102055dD2Ba45qKbLlUMxu_DTHP@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The qrtr protocol core logic and the qrtr nameservice are combined into
a single module. Neither the core logic or nameservice provide much
functionality by themselves; combining the two into a single module also
prevents any possible issues that may stem from client modules loading
inbetween qrtr and the ns.
Creating a socket takes two references to the module that owns the
socket protocol. Since the ns needs to create the control socket, this
creates a scenario where there are always two references to the qrtr
module. This prevents the execution of 'rmmod' for qrtr.
To resolve this, forcefully put the module refcount for the socket
opened by the nameservice.
Fixes: a365023a76f2 ("net: qrtr: combine nameservice into main module")
Reported-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Tested-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Chris Lew <quic_clew@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross merge.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported a suspicious rcu usage[1] in bridge's mst code. While
fixing it I noticed that nothing prevents a vlan to be freed while
walking the list from the same path (br forward delay timer). Fix the rcu
usage and also make sure we are not accessing freed memory by making
br_mst_vlan_set_state use rcu read lock.
[1]
WARNING: suspicious RCU usage
6.9.0-rc6-syzkaller #0 Not tainted
-----------------------------
net/bridge/br_private.h:1599 suspicious rcu_dereference_protected() usage!
...
stack backtrace:
CPU: 1 PID: 8017 Comm: syz-executor.1 Not tainted 6.9.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712
nbp_vlan_group net/bridge/br_private.h:1599 [inline]
br_mst_set_state+0x1ea/0x650 net/bridge/br_mst.c:105
br_set_state+0x28a/0x7b0 net/bridge/br_stp.c:47
br_forward_delay_timer_expired+0x176/0x440 net/bridge/br_stp_timer.c:88
call_timer_fn+0x18e/0x650 kernel/time/timer.c:1793
expire_timers kernel/time/timer.c:1844 [inline]
__run_timers kernel/time/timer.c:2418 [inline]
__run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2429
run_timer_base kernel/time/timer.c:2438 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2448
__do_softirq+0x2c6/0x980 kernel/softirq.c:554
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:lock_acquire+0x264/0x550 kernel/locking/lockdep.c:5758
Code: 2b 00 74 08 4c 89 f7 e8 ba d1 84 00 f6 44 24 61 02 0f 85 85 01 00 00 41 f7 c7 00 02 00 00 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 25 00 00 00 00 00 43 c7 44 25 09 00 00 00 00 43 c7 44 25
RSP: 0018:ffffc90013657100 EFLAGS: 00000206
RAX: 0000000000000001 RBX: 1ffff920026cae2c RCX: 0000000000000001
RDX: dffffc0000000000 RSI: ffffffff8bcaca00 RDI: ffffffff8c1eaa60
RBP: ffffc90013657260 R08: ffffffff92efe507 R09: 1ffffffff25dfca0
R10: dffffc0000000000 R11: fffffbfff25dfca1 R12: 1ffff920026cae28
R13: dffffc0000000000 R14: ffffc90013657160 R15: 0000000000000246
Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode")
Reported-by: syzbot+fa04eb8a56fd923fc5d8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa04eb8a56fd923fc5d8
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot triggered an uninit value[1] error in bridge device's xmit path
by sending a short (less than ETH_HLEN bytes) skb. To fix it check if
we can actually pull that amount instead of assuming.
Tested with dropwatch:
drop at: br_dev_xmit+0xb93/0x12d0 [bridge] (0xffffffffc06739b3)
origin: software
timestamp: Mon May 13 11:31:53 2024 778214037 nsec
protocol: 0x88a8
length: 2
original length: 2
drop reason: PKT_TOO_SMALL
[1]
BUG: KMSAN: uninit-value in br_dev_xmit+0x61d/0x1cb0 net/bridge/br_device.c:65
br_dev_xmit+0x61d/0x1cb0 net/bridge/br_device.c:65
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547
__dev_queue_xmit+0x34db/0x5350 net/core/dev.c:4341
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
__bpf_tx_skb net/core/filter.c:2136 [inline]
__bpf_redirect_common net/core/filter.c:2180 [inline]
__bpf_redirect+0x14a6/0x1620 net/core/filter.c:2187
____bpf_clone_redirect net/core/filter.c:2460 [inline]
bpf_clone_redirect+0x328/0x470 net/core/filter.c:2432
___bpf_prog_run+0x13fe/0xe0f0 kernel/bpf/core.c:1997
__bpf_prog_run512+0xb5/0xe0 kernel/bpf/core.c:2238
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
bpf_test_run+0x499/0xc30 net/bpf/test_run.c:425
bpf_prog_test_run_skb+0x14ea/0x1f20 net/bpf/test_run.c:1058
bpf_prog_test_run+0x6b7/0xad0 kernel/bpf/syscall.c:4269
__sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5678
__do_sys_bpf kernel/bpf/syscall.c:5767 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5765 [inline]
__x64_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5765
x64_sys_call+0x96b/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:322
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+a63a1f6a062033cf0f40@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a63a1f6a062033cf0f40
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd
passing functionality. New method based on Tarjan's Strongly
Connected Components algorithm should be both faster and remove a
lot of workarounds we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP
packets and forwarding them together. Useful for small switches /
routers which lack basic checksum offload in some scenarios (e.g.
PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't use
NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
address labels, netdev threaded NAPI sysfs files, bonding driver's
sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
of the link information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory
accounting, RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2%
PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states. State can be
used either for input or output packet processing.
Things we sprinkled into general kernel code:
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter:
- Use GFP_KERNEL to clone elements, to deal better with OOM
situations and avoid failures in the .commit step.
BPF:
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function
entry and return, the entry program can decide if the return
program gets executed and the entry program can share u64 cookie
value with return program. "Session mode" is a common use-case for
tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw
tracepoint programs in order to ease migration from classic to raw
tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V
JITs. This allows inlining functions which need to access per-CPU
state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86
instruction. Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor
process-context bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto
APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API:
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by
rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line)
config.
- Initial bits of a queue control API, for now allowing a single
queue to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling:
- Remove the need to create a config file to run the net forwarding
tests so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test
machine). Add a few such tests.
- Create a shared code library for writing Python tests. Expose the
YAML Netlink library from tools/ to the tests for easy Netlink
access.
- Move netfilter tests under net/, extend them, separate performance
tests from correctness tests, and iron out issues found by running
them "on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF
info, TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers:
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application
rather than dmesg (large number of driver changes from Asbjørn
Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it
messes up TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the
MII bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API
cleanup
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
drivers. Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support"
* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
selftests: netfilter: fix packetdrill conntrack testcase
net: gro: fix napi_gro_cb zeroed alignment
Bluetooth: btintel_pcie: Refactor and code cleanup
Bluetooth: btintel_pcie: Fix warning reported by sparse
Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
Bluetooth: btintel_pcie: Fix compiler warnings
Bluetooth: btintel_pcie: Add *setup* function to download firmware
Bluetooth: btintel_pcie: Add support for PCIe transport
Bluetooth: btintel: Export few static functions
Bluetooth: HCI: Remove HCI_AMP support
Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
Bluetooth: qca: Fix error code in qca_read_fw_build_info()
Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
Bluetooth: btintel: Add support for BlazarI
LE Create Connection command timeout increased to 20 secs
dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
Bluetooth: compute LE flow credits based on recvbuf space
Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
...
|
|
There is no user of iucv_root outside of the core IUCV code left.
Therefore remove the EXPORT_SYMBOL.
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240506194454.1160315-7-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Provide iucv_alloc_device() and iucv_release_device() helper functions,
which can be used to deduplicate more or less identical IUCV device
allocation and release code in four different drivers.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240506194454.1160315-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Merge in late fixes to prepare for the 6.10 net-next PR.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If hdev->le_num_of_adv_sets is set to 1 it means that only handle 0x00
can be used, but since the MGMT interface instances start from 1
(instance 0 means all instances in case of MGMT_OP_REMOVE_ADVERTISING)
the code needs to map the instance to handle otherwise users will not be
able to advertise as instance 1 would attempt to use handle 0x01.
Fixes: 1d0fac2c38ed ("Bluetooth: Use controller sets when available")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Since BT_HS has been remove HCI_AMP controllers no longer has any use so
remove it along with the capability of creating AMP controllers.
Since we no longer need to differentiate between AMP and Primary
controllers, as only HCI_PRIMARY is left, this also remove
hdev->dev_type altogether.
Fixes: e7b02296fb40 ("Bluetooth: Remove BT_HS")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
l2cap_le_flowctl_init() can cause both div-by-zero and an integer
overflow since hdev->le_mtu may not fall in the valid range.
Move MTU from hci_dev to hci_conn to validate MTU and stop the connection
process earlier if MTU is invalid.
Also, add a missing validation in read_buffer_size() and make it return
an error value if the validation fails.
Now hci_conn_add() returns ERR_PTR() as it can fail due to the both a
kzalloc failure and invalid MTU value.
divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 67 Comm: kworker/u5:0 Tainted: G W 6.9.0-rc5+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: hci0 hci_rx_work
RIP: 0010:l2cap_le_flowctl_init+0x19e/0x3f0 net/bluetooth/l2cap_core.c:547
Code: e8 17 17 0c 00 66 41 89 9f 84 00 00 00 bf 01 00 00 00 41 b8 02 00 00 00 4c
89 fe 4c 89 e2 89 d9 e8 27 17 0c 00 44 89 f0 31 d2 <66> f7 f3 89 c3 ff c3 4d 8d
b7 88 00 00 00 4c 89 f0 48 c1 e8 03 42
RSP: 0018:ffff88810bc0f858 EFLAGS: 00010246
RAX: 00000000000002a0 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffff88810bc0f7c0 RDI: ffffc90002dcb66f
RBP: ffff88810bc0f880 R08: aa69db2dda70ff01 R09: 0000ffaaaaaaaaaa
R10: 0084000000ffaaaa R11: 0000000000000000 R12: ffff88810d65a084
R13: dffffc0000000000 R14: 00000000000002a0 R15: ffff88810d65a000
FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000100 CR3: 0000000103268003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
l2cap_le_connect_req net/bluetooth/l2cap_core.c:4902 [inline]
l2cap_le_sig_cmd net/bluetooth/l2cap_core.c:5420 [inline]
l2cap_le_sig_channel net/bluetooth/l2cap_core.c:5486 [inline]
l2cap_recv_frame+0xe59d/0x11710 net/bluetooth/l2cap_core.c:6809
l2cap_recv_acldata+0x544/0x10a0 net/bluetooth/l2cap_core.c:7506
hci_acldata_packet net/bluetooth/hci_core.c:3939 [inline]
hci_rx_work+0x5e5/0xb20 net/bluetooth/hci_core.c:4176
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0x90f/0x1530 kernel/workqueue.c:3335
worker_thread+0x926/0xe70 kernel/workqueue.c:3416
kthread+0x2e3/0x380 kernel/kthread.c:388
ret_from_fork+0x5c/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
Fixes: 6ed58ec520ad ("Bluetooth: Use LE buffers for LE traffic")
Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Prepare for the coming implementation by GCC and Clang of the
__counted_by attribute. Flexible array members annotated with
__counted_by can have their accesses bounds-checked at run-time
via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE
(for strcpy/memcpy-family functions).
Also, -Wflex-array-member-not-at-end is coming in GCC-14, and we are
getting ready to enable it globally.
So, use the `DEFINE_FLEX()` helper for an on-stack definition of
a flexible structure where the size of the flexible-array member
is known at compile-time, and refactor the rest of the code,
accordingly.
With these changes, fix the following warning:
net/bluetooth/hci_conn.c:669:41: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
Link: https://github.com/KSPP/linux/issues/202
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
On our DUT, we can see that the host issues create connection cancel
command after 4-sec if there is no connection complete event for
LE create connection cmd.
As per core spec v5.3 section 7.8.5, advertisement interval range is-
Advertising_Interval_Min
Default : 0x0800(1.28s)
Time Range: 20ms to 10.24s
Advertising_Interval_Max
Default : 0x0800(1.28s)
Time Range: 20ms to 10.24s
If the remote device is using adv interval of > 4 sec, it is
difficult to make a connection with the current timeout value.
Also, with the default interval of 1.28 sec, we will get only
3 chances to capture the adv packets with the 4 sec window.
Hence we want to increase this timeout to 20sec.
Signed-off-by: Mahesh Talewad <mahesh.talewad@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Previously LE flow credits were returned to the
sender even if the socket's receive buffer was
full. This meant that no back-pressure
was applied to the sender, thus it continued to
send data, resulting in data loss without any
error being reported. Furthermore, the amount
of credits was essentially fixed to a small
amount, leading to reduced performance.
This is fixed by computing the number of returned
LE flow credits based on the estimated available
space in the receive buffer of an L2CAP socket.
Consequently, if the receive buffer is full, no
credits are returned until the buffer is read and
thus cleared by user-space.
Since the computation of available receive buffer
space can only be performed approximately (due to
sk_buff overhead) and the receive buffer size may
be changed by user-space after flow credits have
been sent, superfluous received data is temporary
stored within l2cap_pinfo. This is necessary
because Bluetooth LE provides no retransmission
mechanism once the data has been acked by the
physical layer.
If receive buffer space estimation is not possible
at the moment, we fall back to providing credits
for one full packet as before. This is currently
the case during connection setup, when MPS is not
yet available.
Fixes: b1c325c23d75 ("Bluetooth: Implement returning of LE L2CAP credits")
Signed-off-by: Sebastian Urban <surban@surban.net>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|