Age | Commit message (Collapse) | Author | Files | Lines |
|
https://github.com/openbmc/linux into openbmc/dev-5.15-intel-bump_v5.15.36
Signed-off-by: Sujoy Ray <sujoy.ray@intel.com>
|
|
This is the 5.15.36 stable release
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
commit 34243b9ec856309339172b1507379074156947e8 upstream.
The conversion erroneously removed the refcount increment.
In case we can use the percpu template, we need to increment
the refcount, else it will be released when the skb gets freed.
In case the slowpath is taken, the new template already has a
refcount of 1.
Fixes: 719774377622 ("netfilter: conntrack: convert to refcount_t api")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6ae7989c9af0d98ab64196f4f4c6f6499454bd23 upstream.
nf_ct_put() results in a usesless indirection:
nf_ct_put -> nf_conntrack_put -> nf_conntrack_destroy -> rcu readlock +
indirect call of ct_hooks->destroy().
There are two _put helpers:
nf_ct_put and nf_conntrack_put. The latter is what should be used in
code that MUST NOT cause a linker dependency on the conntrack module
(e.g. calls from core network stack).
Everyone else should call nf_ct_put() instead.
A followup patch will convert a few nf_conntrack_put() calls to
nf_ct_put(), in particular from modules that already have a conntrack
dependency such as act_ct or even nf_conntrack itself.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 719774377622bc4025d2a74f551b5dc2158c6c30 upstream.
Convert nf_conn reference counting from atomic_t to refcount_t based api.
refcount_t api provides more runtime sanity checks and will warn on
certain constructs, e.g. refcount_inc() on a zero reference count, which
usually indicates use-after-free.
For this reason template allocation is changed to init the refcount to
1, the subsequenct add operations are removed.
Likewise, init_conntrack() is changed to set the initial refcount to 1
instead refcount_inc().
This is safe because the new entry is not (yet) visible to other cpus.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cefa91b2332d7009bc0be5d951d6cbbf349f90f8 upstream.
Given a sufficiently large number of actions, while copying and
reserving memory for a new action of a new flow, if next_offset is
greater than MAX_ACTIONS_BUFSIZE, the function reserve_sfa_size() does
not return -EMSGSIZE as expected, but it allocates MAX_ACTIONS_BUFSIZE
bytes increasing actions_len by req_size. This can then lead to an OOB
write access, especially when further actions need to be copied.
Fix it by rearranging the flow action size check.
KASAN splat below:
==================================================================
BUG: KASAN: slab-out-of-bounds in reserve_sfa_size+0x1ba/0x380 [openvswitch]
Write of size 65360 at addr ffff888147e4001c by task handler15/836
CPU: 1 PID: 836 Comm: handler15 Not tainted 5.18.0-rc1+ #27
...
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x5a
print_report.cold+0x5e/0x5db
? __lock_text_start+0x8/0x8
? reserve_sfa_size+0x1ba/0x380 [openvswitch]
kasan_report+0xb5/0x130
? reserve_sfa_size+0x1ba/0x380 [openvswitch]
kasan_check_range+0xf5/0x1d0
memcpy+0x39/0x60
reserve_sfa_size+0x1ba/0x380 [openvswitch]
__add_action+0x24/0x120 [openvswitch]
ovs_nla_add_action+0xe/0x20 [openvswitch]
ovs_ct_copy_action+0x29d/0x1130 [openvswitch]
? __kernel_text_address+0xe/0x30
? unwind_get_return_address+0x56/0xa0
? create_prof_cpu_mask+0x20/0x20
? ovs_ct_verify+0xf0/0xf0 [openvswitch]
? prep_compound_page+0x198/0x2a0
? __kasan_check_byte+0x10/0x40
? kasan_unpoison+0x40/0x70
? ksize+0x44/0x60
? reserve_sfa_size+0x75/0x380 [openvswitch]
__ovs_nla_copy_actions+0xc26/0x2070 [openvswitch]
? __zone_watermark_ok+0x420/0x420
? validate_set.constprop.0+0xc90/0xc90 [openvswitch]
? __alloc_pages+0x1a9/0x3e0
? __alloc_pages_slowpath.constprop.0+0x1da0/0x1da0
? unwind_next_frame+0x991/0x1e40
? __mod_node_page_state+0x99/0x120
? __mod_lruvec_page_state+0x2e3/0x470
? __kasan_kmalloc_large+0x90/0xe0
ovs_nla_copy_actions+0x1b4/0x2c0 [openvswitch]
ovs_flow_cmd_new+0x3cd/0xb10 [openvswitch]
...
Cc: stable@vger.kernel.org
Fixes: f28cd2af22a0 ("openvswitch: fix flow actions reallocation")
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 99c07327ae11e24886d552dddbe4537bfca2765d ]
netlink_dump() is allocating an skb, reserves space in it
but forgets to reset network header.
This allows a BPF program, invoked later from sk_filter()
to access uninitialized kernel memory from the reserved
space.
Theorically mac header reset could be omitted, because
it is set to a special initial value.
bpf_internal_load_pointer_neg_helper calls skb_mac_header()
without checking skb_mac_header_was_set().
Relying on skb->len not being too big seems fragile.
We also could add a sanity check in bpf_internal_load_pointer_neg_helper()
to avoid surprises in the future.
syzbot report was:
BUG: KMSAN: uninit-value in ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
__bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
__bpf_prog_run include/linux/filter.h:626 [inline]
bpf_prog_run include/linux/filter.h:633 [inline]
__bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
sk_filter include/linux/filter.h:905 [inline]
netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
sock_read_iter+0x5a9/0x630 net/socket.c:1039
do_iter_readv_writev+0xa7f/0xc70
do_iter_read+0x52c/0x14c0 fs/read_write.c:786
vfs_readv fs/read_write.c:906 [inline]
do_readv+0x432/0x800 fs/read_write.c:943
__do_sys_readv fs/read_write.c:1034 [inline]
__se_sys_readv fs/read_write.c:1031 [inline]
__x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x44/0xae
Uninit was stored to memory at:
___bpf_prog_run+0x96c/0xb420 kernel/bpf/core.c:1558
__bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
__bpf_prog_run include/linux/filter.h:626 [inline]
bpf_prog_run include/linux/filter.h:633 [inline]
__bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
sk_filter include/linux/filter.h:905 [inline]
netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
sock_read_iter+0x5a9/0x630 net/socket.c:1039
do_iter_readv_writev+0xa7f/0xc70
do_iter_read+0x52c/0x14c0 fs/read_write.c:786
vfs_readv fs/read_write.c:906 [inline]
do_readv+0x432/0x800 fs/read_write.c:943
__do_sys_readv fs/read_write.c:1034 [inline]
__se_sys_readv fs/read_write.c:1031 [inline]
__x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x44/0xae
Uninit was created at:
slab_post_alloc_hook mm/slab.h:737 [inline]
slab_alloc_node mm/slub.c:3244 [inline]
__kmalloc_node_track_caller+0xde3/0x14f0 mm/slub.c:4972
kmalloc_reserve net/core/skbuff.c:354 [inline]
__alloc_skb+0x545/0xf90 net/core/skbuff.c:426
alloc_skb include/linux/skbuff.h:1158 [inline]
netlink_dump+0x30f/0x16c0 net/netlink/af_netlink.c:2242
netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
sock_read_iter+0x5a9/0x630 net/socket.c:1039
do_iter_readv_writev+0xa7f/0xc70
do_iter_read+0x52c/0x14c0 fs/read_write.c:786
vfs_readv fs/read_write.c:906 [inline]
do_readv+0x432/0x800 fs/read_write.c:943
__do_sys_readv fs/read_write.c:1034 [inline]
__se_sys_readv fs/read_write.c:1031 [inline]
__x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x44/0xae
CPU: 0 PID: 3470 Comm: syz-executor751 Not tainted 5.17.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: db65a3aaf29e ("netlink: Trim skb to alloc size to avoid MSG_TRUNC")
Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220415181442.551228-1-eric.dumazet@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0763120b090418a5257402754e22a34227ae5f12 ]
In case the checksum calculation is offloaded to the DSA master network
interface, it will include the switch trailing tag. As soon as the switch strips
that tag on egress, the calculated checksum is wrong.
Therefore, add the checksum calculation to the tagger (if required) before
adding the switch tag. This way, the hellcreek code works with all DSA master
interfaces regardless of their declared feature set.
Fixes: 01ef09caad66 ("net: dsa: Add tag handling for Hirschmann Hellcreek switches")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220415103320.90657-1-kurt@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d73497081710c876c3c61444445512989e102152 ]
The first attempt to fix a the 'impossible' WARN_ON_ONCE(1) in
isotp_tx_timer_handler() focussed on the identical CAN IDs created by
the syzbot reproducer and lead to upstream fix/commit 3ea566422cbd
("can: isotp: sanitize CAN ID checks in isotp_bind()"). But this did
not catch the root cause of the wrong tx.state in the tx_timer handler.
In the isotp 'first frame' case a timeout monitoring needs to be started
before the 'first frame' is send. But when this sending failed the timeout
monitoring for this specific frame has to be disabled too.
Otherwise the tx_timer is fired with the 'warn me' tx.state of ISOTP_IDLE.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20220405175112.2682-1-socketcan@hartkopp.net
Reported-by: syzbot+2339c27f5c66c652843e@syzkaller.appspotmail.com
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9cb7c013420f98fa6fd12fc6a5dc055170c108db ]
Reads and Writes to ip6_rt_gc_expire always have been racy,
as syzbot reported lately [1]
There is a possible risk of under-flow, leading
to unexpected high value passed to fib6_run_gc(),
although I have not observed this in the field.
Hosts hitting ip6_dst_gc() very hard are under pretty bad
state anyway.
[1]
BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc
read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1:
ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
dst_alloc+0x9b/0x160 net/core/dst.c:86
ip6_dst_alloc net/ipv6/route.c:344 [inline]
icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
mld_send_cr net/ipv6/mcast.c:2119 [inline]
mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30
read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0:
ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
dst_alloc+0x9b/0x160 net/core/dst.c:86
ip6_dst_alloc net/ipv6/route.c:344 [inline]
icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
mld_send_cr net/ipv6/mcast.c:2119 [inline]
mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30
value changed: 0x00000bb3 -> 0x00000ba9
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d3bad-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: mld mld_ifc_work
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
netdev_master_upper_dev_get_rcu
[ Upstream commit 83daab06252ee5d0e1f4373ff28b79304945fc19 ]
Next patch uses l3mdev_master_upper_ifindex_by_index_rcu which throws
a splat with debug kernels:
[13783.087570] ------------[ cut here ]------------
[13783.093974] RTNL: assertion failed at net/core/dev.c (6702)
[13783.100761] WARNING: CPU: 3 PID: 51132 at net/core/dev.c:6702 netdev_master_upper_dev_get+0x16a/0x1a0
[13783.184226] CPU: 3 PID: 51132 Comm: kworker/3:3 Not tainted 5.17.0-custom-100090-g6f963aafb1cc #682
[13783.194788] Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017
[13783.204755] Workqueue: mld mld_ifc_work [ipv6]
[13783.210338] RIP: 0010:netdev_master_upper_dev_get+0x16a/0x1a0
[13783.217209] Code: 0f 85 e3 fe ff ff e8 65 ac ec fe ba 2e 1a 00 00 48 c7 c6 60 6f 38 83 48 c7 c7 c0 70 38 83 c6 05 5e b5 d7 01 01 e8 c6 29 52 00 <0f> 0b e9 b8 fe ff ff e8 5a 6c 35 ff e9 1c ff ff ff 48 89 ef e8 7d
[13783.238659] RSP: 0018:ffffc9000b37f5a8 EFLAGS: 00010286
[13783.244995] RAX: 0000000000000000 RBX: ffff88812ee5c000 RCX: 0000000000000000
[13783.253379] RDX: ffff88811ce09d40 RSI: ffffffff812d0fcd RDI: fffff5200166fea7
[13783.261769] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8882375f4287
[13783.270138] R10: ffffed1046ebe850 R11: 0000000000000001 R12: dffffc0000000000
[13783.278510] R13: 0000000000000275 R14: ffffc9000b37f688 R15: ffff8881273b4af8
[13783.286870] FS: 0000000000000000(0000) GS:ffff888237400000(0000) knlGS:0000000000000000
[13783.296352] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13783.303177] CR2: 00007ff25fc9b2e8 CR3: 0000000174d23000 CR4: 00000000001006e0
[13783.311546] Call Trace:
[13783.314660] <TASK>
[13783.317553] l3mdev_master_upper_ifindex_by_index_rcu+0x43/0xe0
...
Change l3mdev_master_upper_ifindex_by_index_rcu to use
netdev_master_upper_dev_get_rcu.
Fixes: 6a6d6681ac1a ("l3mdev: add function to retreive upper master")
Signed-off-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Cc: Alexis Bauvin <abauvin@scaleway.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ec5b0f605b105457f257f2870acad4a5d463984b ]
While investigating a related syzbot report,
I found that whenever call to tcf_exts_init()
from u32_init_knode() is failing, we end up
with an elevated refcount on ht->refcnt
To avoid that, only increase the refcount after
all possible errors have been evaluated.
Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ab198e1d0dd8dc4bc7575fb50758e2cbd51e14e1 ]
Feng reported an skb_under_panic BUG triggered by running
test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:
[ 82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
<...>
[ 82.607380] Call Trace:
[ 82.609389] <TASK>
[ 82.611136] skb_push.cold.109+0x10/0x10
[ 82.614289] __gre6_xmit+0x41e/0x590
[ 82.617169] ip6gre_tunnel_xmit+0x344/0x3f0
[ 82.620526] dev_hard_start_xmit+0xf1/0x330
[ 82.623882] sch_direct_xmit+0xe4/0x250
[ 82.626961] __dev_queue_xmit+0x720/0xfe0
<...>
[ 82.633431] packet_sendmsg+0x96a/0x1cb0
[ 82.636568] sock_sendmsg+0x30/0x40
<...>
The following sequence of events caused the BUG:
1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
calculated based on old flags (see ip6gre_calc_hlen());
2. packet_snd() reserves header room for skb A, assuming
tunnel->tun_hlen is 4;
3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
4. __gre6_xmit() detects the new tunnel key, and recalculates
"tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
TUNNEL_SEQ);
5. gre_build_header() calls skb_push() with insufficient reserved header
room, triggering the BUG.
As sugguested by Cong, fix it by moving the call to skb_cow_head() after
the recalculation of tun_hlen.
Reproducer:
OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o
ip netns add at_ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns at_ns0
ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip link set dev veth1 up mtu 1500
ip addr add dev veth1 172.16.1.200/24
ip netns exec at_ns0 ip addr add ::11/96 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip addr add dev veth1 ::22/96
ip link set dev veth1 up
ip netns exec at_ns0 \
ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
local ::11 remote ::22
ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
ip netns exec at_ns0 ip link set dev ip6gretap00 up
ip link add dev ip6gretap11 type ip6gretap external
ip addr add dev ip6gretap11 10.1.1.200/24
ip addr add dev ip6gretap11 fc80::200/24
ip link set dev ip6gretap11 up
tc qdisc add dev ip6gretap11 clsact
tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel
ping6 -c 3 -w 10 -q ::11
Fixes: 6712abc168eb ("ip6_gre: add ip6 gre and gretap collect_md mode")
Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f40c064e933d7787ca7411b699504d7a2664c1f5 ]
Do not update tunnel->tun_hlen in data plane code. Use a local variable
instead, just like "tunnel_hlen" in net/ipv4/ip_gre.c:gre_fb_xmit().
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 29e8e659f984be00d75ec5fef4e37c88def72712 ]
packet_sock xmit could be dev_queue_xmit, which also returns negative
errors. So only checking positive errors is not enough, or userspace
sendmsg may return success while packet is not send out.
Move the net_xmit_errno() assignment in the braces as checkpatch.pl said
do not use assignment in if condition.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1a74e99323746353bba11562a2f2d0aa8102f402 ]
Since commit e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown
and fallback"), for a fallback connection, __smc_release() does not call
sock_put() if its state is already SMC_CLOSED.
When calling smc_shutdown() after falling back, its state is set to
SMC_CLOSED but does not call sock_put(), so this patch calls it.
Reported-and-tested-by: syzbot+6e29a053eb165bd50de5@syzkaller.appspotmail.com
Fixes: e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown and fallback")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ee3b0826b4764f6c13ad6db67495c5a1c38e9025 ]
A recent patch[1] from Eric Dumazet flipped the order in which the
keepalive timer and the keepalive worker were cancelled in order to fix a
syzbot reported issue[2]. Unfortunately, this enables the mirror image bug
whereby the timer races with rxrpc_exit_net(), restarting the worker after
it has been cancelled:
CPU 1 CPU 2
=============== =====================
if (rxnet->live)
<INTERRUPT>
rxnet->live = false;
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_queue_work(&rxnet->peer_keepalive_work);
del_timer_sync(&rxnet->peer_keepalive_timer);
Fix this by restoring the removed del_timer_sync() so that we try to remove
the timer twice. If the timer runs again, it should see ->live == false
and not restart the worker.
Fixes: 1946014ca3b1 ("rxrpc: fix a race in rxrpc_exit_net()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1]
Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2]
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5bd8baab087dff657e05387aee802e70304cc813 ]
Commit ebe48d368e97 ("esp: Fix possible buffer overflow in ESP
transformation") tried to fix skb_page_frag_refill usage in ESP by
capping allocsize to 32k, but that doesn't completely solve the issue,
as skb_page_frag_refill may return a single page. If that happens, we
will write out of bounds, despite the check introduced in the previous
patch.
This patch forces COW in cases where we would end up calling
skb_page_frag_refill with a size larger than a page (first in
esp_output_head with tailen, then in esp_output_tail with
skb->data_len).
Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 3db09e762dc79584a69c10d74a6b98f89a9979f8 upstream.
We are now able to detect extra put_net() at the moment
they happen, instead of much later in correct code paths.
u32_init_knode() / tcf_exts_init() populates the ->exts.net
pointer, but as mentioned in tcf_exts_init(),
the refcount on netns has not been elevated yet.
The refcount is taken only once tcf_exts_get_net()
is called.
So the two u32_destroy_key() calls from u32_change()
are attempting to release an invalid reference on the netns.
syzbot report:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 21708 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 21708 Comm: syz-executor.5 Not tainted 5.18.0-rc2-next-20220412-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Code: 1d 14 b6 b2 09 31 ff 89 de e8 6d e9 89 fd 84 db 75 e0 e8 84 e5 89 fd 48 c7 c7 40 aa 26 8a c6 05 f4 b5 b2 09 01 e8 e5 81 2e 05 <0f> 0b eb c4 e8 68 e5 89 fd 0f b6 1d e3 b5 b2 09 31 ff 89 de e8 38
RSP: 0018:ffffc900051af1b0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000040000 RSI: ffffffff8160a0c8 RDI: fffff52000a35e28
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81604a9e R11: 0000000000000000 R12: 1ffff92000a35e3b
R13: 00000000ffffffef R14: ffff8880211a0194 R15: ffff8880577d0a00
FS: 00007f25d183e700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f19c859c028 CR3: 0000000051009000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
ref_tracker_free+0x535/0x6b0 lib/ref_tracker.c:118
netns_tracker_free include/net/net_namespace.h:327 [inline]
put_net_track include/net/net_namespace.h:341 [inline]
tcf_exts_put_net include/net/pkt_cls.h:255 [inline]
u32_destroy_key.isra.0+0xa7/0x2b0 net/sched/cls_u32.c:394
u32_change+0xe01/0x3140 net/sched/cls_u32.c:909
tc_new_tfilter+0x98d/0x2200 net/sched/cls_api.c:2148
rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:6016
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2495
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e2/0x800 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f25d0689049
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f25d183e168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f25d079c030 RCX: 00007f25d0689049
RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000005
RBP: 00007f25d06e308d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd0b752e3f R14: 00007f25d183e300 R15: 0000000000022000
</TASK>
Fixes: 35c55fc156d8 ("cls_u32: use tcf_exts_get_net() before call_rcu()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This is the 5.15.35 stable release
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
commit 82e31755e55fbcea6a9dfaae5fe4860ade17cbc0 upstream.
There are race conditions that may lead to UAF bugs in
ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call
ax25_release() to deallocate ax25_dev.
One of the UAF bugs caused by ax25_release() is shown below:
(Thread 1) | (Thread 2)
ax25_dev_device_up() //(1) |
... | ax25_kill_by_device()
ax25_bind() //(2) |
ax25_connect() | ...
ax25_std_establish_data_link() |
ax25_start_t1timer() | ax25_dev_device_down() //(3)
mod_timer(&ax25->t1timer,..) |
| ax25_release()
(wait a time) | ...
| ax25_dev_put(ax25_dev) //(4)FREE
ax25_t1timer_expiry() |
ax25->ax25_dev->values[..] //USE| ...
... |
We increase the refcount of ax25_dev in position (1) and (2), and
decrease the refcount of ax25_dev in position (3) and (4).
The ax25_dev will be freed in position (4) and be used in
ax25_t1timer_expiry().
The fail log is shown below:
==============================================================
[ 106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60
[ 106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0
[ 106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574
[ 106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14
[ 106.116942] Call Trace:
...
[ 106.116942] ax25_t1timer_expiry+0x1c/0x60
[ 106.116942] call_timer_fn+0x122/0x3d0
[ 106.116942] __run_timers.part.0+0x3f6/0x520
[ 106.116942] run_timer_softirq+0x4f/0xb0
[ 106.116942] __do_softirq+0x1c2/0x651
...
This patch adds del_timer_sync() in ax25_release(), which could ensure
that all timers stop before we deallocate ax25_dev.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[OP: backport to 5.15: adjust context]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc6d01ff9ef03b66d4a3a23b46fc3c3d8cf92009 upstream.
The previous commit 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect")
move ax25_disconnect into lock_sock() in order to prevent NPD bugs. But
there are race conditions that may lead to null pointer dereferences in
ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we use
ax25_kill_by_device() to detach the ax25 device.
One of the race conditions that cause null pointer dereferences can be
shown as below:
(Thread 1) | (Thread 2)
ax25_connect() |
ax25_std_establish_data_link() |
ax25_start_t1timer() |
mod_timer(&ax25->t1timer,..) |
| ax25_kill_by_device()
(wait a time) | ...
| s->ax25_dev = NULL; //(1)
ax25_t1timer_expiry() |
ax25->ax25_dev->values[..] //(2)| ...
... |
We set null to ax25_cb->ax25_dev in position (1) and dereference
the null pointer in position (2).
The corresponding fail log is shown below:
===============================================================
BUG: kernel NULL pointer dereference, address: 0000000000000050
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc6-00794-g45690b7d0
RIP: 0010:ax25_t1timer_expiry+0x12/0x40
...
Call Trace:
call_timer_fn+0x21/0x120
__run_timers.part.0+0x1ca/0x250
run_timer_softirq+0x2c/0x60
__do_softirq+0xef/0x2f3
irq_exit_rcu+0xb6/0x100
sysvec_apic_timer_interrupt+0xa2/0xd0
...
This patch moves ax25_disconnect() before s->ax25_dev = NULL
and uses del_timer_sync() to delete timers in ax25_disconnect().
If ax25_disconnect() is called by ax25_kill_by_device() or
ax25->ax25_dev is NULL, the reason in ax25_disconnect() will be
equal to ENETUNREACH, it will wait all timers to stop before we
set null to s->ax25_dev in ax25_kill_by_device().
Fixes: 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
[OP: backport to 5.15: adjust context]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7ec02f5ac8a5be5a3f20611731243dc5e1d9ba10 upstream.
The ax25_disconnect() in ax25_kill_by_device() is not
protected by any locks, thus there is a race condition
between ax25_disconnect() and ax25_destroy_socket().
when ax25->sk is assigned as NULL by ax25_destroy_socket(),
a NULL pointer dereference bug will occur if site (1) or (2)
dereferences ax25->sk.
ax25_kill_by_device() | ax25_release()
ax25_disconnect() | ax25_destroy_socket()
... |
if(ax25->sk != NULL) | ...
... | ax25->sk = NULL;
bh_lock_sock(ax25->sk); //(1) | ...
... |
bh_unlock_sock(ax25->sk); //(2)|
This patch moves ax25_disconnect() into lock_sock(), which can
synchronize with ax25_destroy_socket() in ax25_release().
Fail log:
===============================================================
BUG: kernel NULL pointer dereference, address: 0000000000000088
...
RIP: 0010:_raw_spin_lock+0x7e/0xd0
...
Call Trace:
ax25_disconnect+0xf6/0x220
ax25_device_event+0x187/0x250
raw_notifier_call_chain+0x5e/0x70
dev_close_many+0x17d/0x230
rollback_registered_many+0x1f1/0x950
unregister_netdevice_queue+0x133/0x200
unregister_netdev+0x13/0x20
...
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
[OP: backport to 5.15: adjust context]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5352a761308397a0e6250fdc629bb3f615b94747 upstream.
There are UAF bugs in ax25_send_control(), when we call ax25_release()
to deallocate ax25_dev. The possible race condition is shown below:
(Thread 1) | (Thread 2)
ax25_dev_device_up() //(1) |
| ax25_kill_by_device()
ax25_bind() //(2) |
ax25_connect() | ...
ax25->state = AX25_STATE_1 |
... | ax25_dev_device_down() //(3)
(Thread 3)
ax25_release() |
ax25_dev_put() //(4) FREE |
case AX25_STATE_1: |
ax25_send_control() |
alloc_skb() //USE |
The refcount of ax25_dev increases in position (1) and (2), and
decreases in position (3) and (4). The ax25_dev will be freed
before dereference sites in ax25_send_control().
The following is part of the report:
[ 102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210
[ 102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602
[ 102.297448] Call Trace:
[ 102.303751] ax25_send_control+0x33/0x210
[ 102.303751] ax25_release+0x356/0x450
[ 102.305431] __sock_release+0x6d/0x120
[ 102.305431] sock_close+0xf/0x20
[ 102.305431] __fput+0x11f/0x420
[ 102.305431] task_work_run+0x86/0xd0
[ 102.307130] get_signal+0x1075/0x1220
[ 102.308253] arch_do_signal_or_restart+0x1df/0xc00
[ 102.308253] exit_to_user_mode_prepare+0x150/0x1e0
[ 102.308253] syscall_exit_to_user_mode+0x19/0x50
[ 102.308253] do_syscall_64+0x48/0x90
[ 102.308253] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 102.308253] RIP: 0033:0x405ae7
This patch defers the free operation of ax25_dev and net_device after
all corresponding dereference sites in ax25_release() to avoid UAF.
Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[OP: backport to 5.15: adjust dev_put_track()->dev_put()]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9fd75b66b8f68498454d685dc4ba13192ae069b0 upstream.
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to
avoid UAF bugs") and commit feef318c855a ("ax25: fix UAF bugs of
net_device caused by rebinding operation") increase the refcounts of
ax25_dev and net_device in ax25_bind() and decrease the matching refcounts
in ax25_kill_by_device() in order to prevent UAF bugs, but there are
reference count leaks.
The root cause of refcount leaks is shown below:
(Thread 1) | (Thread 2)
ax25_bind() |
... |
ax25_addr_ax25dev() |
ax25_dev_hold() //(1) |
... |
dev_hold_track() //(2) |
... | ax25_destroy_socket()
| ax25_cb_del()
| ...
| hlist_del_init() //(3)
|
|
(Thread 3) |
ax25_kill_by_device() |
... |
ax25_for_each(s, &ax25_list) { |
if (s->ax25_dev == ax25_dev) //(4) |
... |
Firstly, we use ax25_bind() to increase the refcount of ax25_dev in
position (1) and increase the refcount of net_device in position (2).
Then, we use ax25_cb_del() invoked by ax25_destroy_socket() to delete
ax25_cb in hlist in position (3) before calling ax25_kill_by_device().
Finally, the decrements of refcounts in ax25_kill_by_device() will not
be executed, because no s->ax25_dev equals to ax25_dev in position (4).
This patch adds decrements of refcounts in ax25_release() and use
lock_sock() to do synchronization. If refcounts decrease in ax25_release(),
the decrements of refcounts in ax25_kill_by_device() will not be
executed and vice versa.
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Fixes: 87563a043cef ("ax25: fix reference count leaks of ax25_dev")
Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
[OP: backport to 5.15: adjust dev_put_track()->dev_put()]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit feef318c855a361a1eccd880f33e88c460eb63b4 upstream.
The ax25_kill_by_device() will set s->ax25_dev = NULL and
call ax25_disconnect() to change states of ax25_cb and
sock, if we call ax25_bind() before ax25_kill_by_device().
However, if we call ax25_bind() again between the window of
ax25_kill_by_device() and ax25_dev_device_down(), the values
and states changed by ax25_kill_by_device() will be reassigned.
Finally, ax25_dev_device_down() will deallocate net_device.
If we dereference net_device in syscall functions such as
ax25_release(), ax25_sendmsg(), ax25_getsockopt(), ax25_getname()
and ax25_info_show(), a UAF bug will occur.
One of the possible race conditions is shown below:
(USE) | (FREE)
ax25_bind() |
| ax25_kill_by_device()
ax25_bind() |
ax25_connect() | ...
| ax25_dev_device_down()
| ...
| dev_put_track(dev, ...) //FREE
ax25_release() | ...
ax25_send_control() |
alloc_skb() //USE |
the corresponding fail log is shown below:
===============================================================
BUG: KASAN: use-after-free in ax25_send_control+0x43/0x210
...
Call Trace:
...
ax25_send_control+0x43/0x210
ax25_release+0x2db/0x3b0
__sock_release+0x6d/0x120
sock_close+0xf/0x20
__fput+0x11f/0x420
...
Allocated by task 1283:
...
__kasan_kmalloc+0x81/0xa0
alloc_netdev_mqs+0x5a/0x680
mkiss_open+0x6c/0x380
tty_ldisc_open+0x55/0x90
...
Freed by task 1969:
...
kfree+0xa3/0x2c0
device_release+0x54/0xe0
kobject_put+0xa5/0x120
tty_ldisc_kill+0x3e/0x80
...
In order to fix these UAF bugs caused by rebinding operation,
this patch adds dev_hold_track() into ax25_bind() and
corresponding dev_put_track() into ax25_kill_by_device().
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
[OP: backport to 5.15: adjust dev_put_track()->dev_put() and
dev_hold_track()->dev_hold()]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 87563a043cef044fed5db7967a75741cc16ad2b1 upstream.
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev
to avoid UAF bugs") introduces refcount into ax25_dev, but there
are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(),
ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().
This patch uses ax25_dev_put() and adjusts the position of
ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev.
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[OP: backport to 5.15: adjust context]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d01ffb9eee4af165d83b08dd73ebdf9fe94a519b upstream.
If we dereference ax25_dev after we call kfree(ax25_dev) in
ax25_dev_device_down(), it will lead to concurrency UAF bugs.
There are eight syscall functions suffer from UAF bugs, include
ax25_bind(), ax25_release(), ax25_connect(), ax25_ioctl(),
ax25_getname(), ax25_sendmsg(), ax25_getsockopt() and
ax25_info_show().
One of the concurrency UAF can be shown as below:
(USE) | (FREE)
| ax25_device_event
| ax25_dev_device_down
ax25_bind | ...
... | kfree(ax25_dev)
ax25_fillin_cb() | ...
ax25_fillin_cb_from_dev() |
... |
The root cause of UAF bugs is that kfree(ax25_dev) in
ax25_dev_device_down() is not protected by any locks.
When ax25_dev, which there are still pointers point to,
is released, the concurrency UAF bug will happen.
This patch introduces refcount into ax25_dev in order to
guarantee that there are no pointers point to it when ax25_dev
is released.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
[OP: backport to 5.15: adjusted context]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e3fa461d8b0e185b7da8a101fe94dfe6dd500ac0 upstream.
kongweibin reported a kernel panic in ip6_forward() when input interface
has no in6 dev associated.
The following tc commands were used to reproduce this panic:
tc qdisc del dev vxlan100 root
tc qdisc add dev vxlan100 root netem corrupt 5%
CC: stable@vger.kernel.org
Fixes: ccd27f05ae7b ("ipv6: fix 'disable_policy' for fwd packets")
Reported-by: kongweibin <kongweibin2@huawei.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6624bb34b4eb19f715db9908cca00122748765d7 upstream.
We need this to be at least two bytes, so we can access
alpha2[0] and alpha2[1]. It may be three in case some
userspace used NUL-termination since it was NLA_STRING
(and we also push it out with NUL-termination).
Cc: stable@vger.kernel.org
Reported-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220411114201.fd4a31f06541.Ie7ff4be2cf348d8cc28ed0d626fc54becf7ea799@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 773f91b2cf3f52df0d7508fdbf60f37567cdaee4 upstream.
Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further
investigation shows that the crash occurred while NFSD was handling
a deferred request.
This patch addresses two inter-related issues that prevent request
deferral from working correctly for RPC/RDMA requests:
1. Prevent the crash by ensuring that the original
svc_rqst::rq_xprt_ctxt value is available when the request is
revisited. Otherwise svc_rdma_sendto() does not have a Receive
context available with which to construct its reply.
2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly
compute .len and .buflen for received RPC Calls"),
svc_rdma_recvfrom() did not include the transport header in the
returned xdr_buf. There should have been no need for svc_defer()
and friends to save and restore that header, as of that commit.
This issue is addressed in a backport-friendly way by simply
having svc_rdma_recvfrom() set rq_xprt_hlen to zero
unconditionally, just as svc_tcp_recvfrom() does. This enables
svc_deferred_recv() to correctly reconstruct an RPC message
received via RPC/RDMA.
Reported-by: Trond Myklebust <trondmy@hammerspace.com>
Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ef27324e2cb7bb24542d6cb2571740eefe6b00dc ]
Our detector found a concurrent use-after-free bug when detaching an
NCI device. The main reason for this bug is the unexpected scheduling
between the used delayed mechanism (timer and workqueue).
The race can be demonstrated below:
Thread-1 Thread-2
| nci_dev_up()
| nci_open_device()
| __nci_request(nci_reset_req)
| nci_send_cmd
| queue_work(cmd_work)
nci_unregister_device() |
nci_close_device() | ...
del_timer_sync(cmd_timer)[1] |
... | Worker
nci_free_device() | nci_cmd_work()
kfree(ndev)[3] | mod_timer(cmd_timer)[2]
In short, the cleanup routine thought that the cmd_timer has already
been detached by [1] but the mod_timer can re-attach the timer [2], even
it is already released [3], resulting in UAF.
This UAF is easy to trigger, crash trace by POC is like below
[ 66.703713] ==================================================================
[ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
[ 66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
[ 66.703974]
[ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
[ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
[ 66.703974] Call Trace:
[ 66.703974] <TASK>
[ 66.703974] dump_stack_lvl+0x57/0x7d
[ 66.703974] print_report.cold+0x5e/0x5db
[ 66.703974] ? enqueue_timer+0x448/0x490
[ 66.703974] kasan_report+0xbe/0x1c0
[ 66.703974] ? enqueue_timer+0x448/0x490
[ 66.703974] enqueue_timer+0x448/0x490
[ 66.703974] __mod_timer+0x5e6/0xb80
[ 66.703974] ? mark_held_locks+0x9e/0xe0
[ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0
[ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410
[ 66.703974] ? queue_work_on+0x61/0x80
[ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130
[ 66.703974] process_one_work+0x8bb/0x1510
[ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230
[ 66.703974] ? rwlock_bug.part.0+0x90/0x90
[ 66.703974] ? _raw_spin_lock_irq+0x41/0x50
[ 66.703974] worker_thread+0x575/0x1190
[ 66.703974] ? process_one_work+0x1510/0x1510
[ 66.703974] kthread+0x2a0/0x340
[ 66.703974] ? kthread_complete_and_exit+0x20/0x20
[ 66.703974] ret_from_fork+0x22/0x30
[ 66.703974] </TASK>
[ 66.703974]
[ 66.703974] Allocated by task 267:
[ 66.703974] kasan_save_stack+0x1e/0x40
[ 66.703974] __kasan_kmalloc+0x81/0xa0
[ 66.703974] nci_allocate_device+0xd3/0x390
[ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0
[ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd
[ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0
[ 66.703974] tty_ioctl+0x764/0x1310
[ 66.703974] __x64_sys_ioctl+0x122/0x190
[ 66.703974] do_syscall_64+0x3b/0x90
[ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 66.703974]
[ 66.703974] Freed by task 406:
[ 66.703974] kasan_save_stack+0x1e/0x40
[ 66.703974] kasan_set_track+0x21/0x30
[ 66.703974] kasan_set_free_info+0x20/0x30
[ 66.703974] __kasan_slab_free+0x108/0x170
[ 66.703974] kfree+0xb0/0x330
[ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0
[ 66.703974] nci_uart_tty_close+0xdf/0x180
[ 66.703974] tty_ldisc_kill+0x73/0x110
[ 66.703974] tty_ldisc_hangup+0x281/0x5b0
[ 66.703974] __tty_hangup.part.0+0x431/0x890
[ 66.703974] tty_release+0x3a8/0xc80
[ 66.703974] __fput+0x1f0/0x8c0
[ 66.703974] task_work_run+0xc9/0x170
[ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0
[ 66.703974] syscall_exit_to_user_mode+0x19/0x50
[ 66.703974] do_syscall_64+0x48/0x90
[ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
To fix the UAF, this patch adds flush_workqueue() to ensure the
nci_cmd_work is finished before the following del_timer_sync.
This combination will promise the timer is actually detached.
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6c6f9f31ecd47dce1d0dafca4bec8805f9bc97cd ]
Since commit 6e1acfa387b9 ("netfilter: nf_tables: validate registers
coming from userspace.") nft_parse_register can return a negative value,
but the function prototype is still returning an unsigned int.
Fixes: 6e1acfa387b9 ("netfilter: nf_tables: validate registers coming from userspace.")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8467dda0c26583547731e7f3ea73fc3856bae3bf ]
Function sctp_do_peeloff() wrongly initializes daddr of the original
socket instead of the peeled off socket, which makes getpeername()
return zeroes instead of the primary address. Initialize the new socket
instead.
Fixes: d570ee490fb1 ("[SCTP]: Correctly set daddr for IPv6 sockets during peeloff")
Signed-off-by: Petr Malat <oss@malat.biz>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20220409063611.673193-1-oss@malat.biz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d22f4f977236f97e01255a80bca2ea93a8094fc8 ]
dev_name() was called with dev.parent as argument but without to
NULL-check it before.
Solve this by checking the pointer before the call to dev_name().
Fixes: af5f60c7e3d5 ("net/smc: allow PCI IDs as ib device names in the pnet table")
Reported-by: syzbot+03e3e228510223dabd34@syzkaller.appspotmail.com
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 05ae2fba821c4d122ab4ba3e52144e21586c4010 ]
cgroupv2 helper function ignores the already-looked up sk
and uses skb->sk instead.
Just pass sk from the calling function instead; this will
make cgroup matching work for udp and tcp in input even when
edemux did not set skb->sk already.
Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a5199b5626cd6913cf8776a835bc63d40e0686ad ]
Synchronize additions to nontrans_list of transmitting BSS with
bss_lock to avoid races. Also when cfg80211_add_nontrans_list() fails
__cfg80211_unlink_bss() needs bss_lock to be held (has lockdep assert
on bss_lock). So protect the whole block with bss_lock to avoid
races and warnings. Found during code review.
Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com>
Link: https://lore.kernel.org/r/1649668071-9370-1-git-send-email-quic_ramess@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e8a64bbaaad1f6548cec5508297bc6d45e8ab69e ]
A user may set the SO_TXTIME socket option to ensure a packet is send
at a given time. The taprio scheduler has to confirm, that it is allowed
to send a packet at that given time, by a check against the packet time
schedule. The scheduler drop the packet, if the gates are closed at the
given send time.
The check, if SO_TXTIME is set, may fail since sk_flags are part of an
union and the union is used otherwise. This happen, if a socket is not
a full socket, like a request socket for example.
Add a check to verify, if the union is used for sk_flags.
Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e65812fd22eba32f11abe28cb377cbd64cfb1ba0 ]
Currently, when inserting a new filter that needs to sit at the head
of chain 0, it will first update the heads pointer on all devices using
the (shared) block, and only then complete the initialization of the new
element so that it has a "next" element.
This can lead to a situation that the chain 0 head is propagated to
another CPU before the "next" initialization is done. When this race
condition is triggered, packets being matched on that CPU will simply
miss all other filters, and will flow through the stack as if there were
no other filters installed. If the system is using OVS + TC, such
packets will get handled by vswitchd via upcall, which results in much
higher latency and reordering. For other applications it may result in
packet drops.
This is reproducible with a tc only setup, but it varies from system to
system. It could be reproduced with a shared block amongst 10 veth
tunnels, and an ingress filter mirroring packets to another veth.
That's because using the last added veth tunnel to the shared block to
do the actual traffic, it makes the race window bigger and easier to
trigger.
The fix is rather simple, to just initialize the next pointer of the new
filter instance (tp) before propagating the head change.
The fixes tag is pointing to the original code though this issue should
only be observed when using it unlocked.
Fixes: 2190d1d0944f ("net: sched: introduce helpers to work with filter chains")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/b97d5f4eaffeeb9d058155bcab63347527261abf.1649341369.git.marcelo.leitner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2105f700b53c24aa48b65c15652acc386044d26a ]
A tc flower filter matching TCA_FLOWER_KEY_VLAN_ETH_TYPE is expected to
match the L2 ethertype following the first VLAN header, as confirmed by
linked discussion with the maintainer. However, such rule also matches
packets that have additional second VLAN header, even though filter has
both eth_type and vlan_ethtype set to "ipv4". Looking at the code this
seems to be mostly an artifact of the way flower uses flow dissector.
First, even though looking at the uAPI eth_type and vlan_ethtype appear
like a distinct fields, in flower they are all mapped to the same
key->basic.n_proto. Second, flow dissector skips following VLAN header as
no keys for FLOW_DISSECTOR_KEY_CVLAN are set and eventually assigns the
value of n_proto to last parsed header. With these, such filters ignore any
headers present between first VLAN header and first "non magic"
header (ipv4 in this case) that doesn't result
FLOW_DISSECT_RET_PROTO_AGAIN.
Fix the issue by extending flow dissector VLAN key structure with new
'vlan_eth_type' field that matches first ethertype following previously
parsed VLAN header. Modify flower classifier to set the new
flow_dissector_key_vlan->vlan_eth_type with value obtained from
TCA_FLOWER_KEY_VLAN_ETH_TYPE/TCA_FLOWER_KEY_CVLAN_ETH_TYPE uAPIs.
Link: https://lore.kernel.org/all/Yjhgi48BpTGh6dig@nanopsycho/
Fixes: 9399ae9a6cb2 ("net_sched: flower: Add vlan support")
Fixes: d64efd0926ba ("net/sched: flower: Add supprt for matching on QinQ vlan headers")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
This is the 5.15.34 stable release
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
|
commit 89f42494f92f448747bd8a7ab1ae8b5d5520577d upstream.
Avoid socket state races due to repeated calls to ->connect() using the
same socket. If connect() returns 0 due to the connection having
completed, but we are in fact in a closing state, then we may leave the
XPRT_CONNECTING flag set on the transport.
Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9a69e2b385f443f244a7e8b8bcafe5ccfb0866b4 upstream.
remote_port is another case of a BPF context field documented as a 32-bit
value in network byte order for which the BPF context access converter
generates a load of a zero-padded 16-bit integer in network byte order.
First such case was dst_port in bpf_sock which got addressed in commit
4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide").
Loading 4-bytes from the remote_port offset and converting the value with
bpf_ntohl() leads to surprising results, as the expected value is shifted
by 16 bits.
Reduce the confusion by splitting the field in two - a 16-bit field holding
a big-endian integer, and a 16-bit zero-padding anonymous field that
follows it.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b70a5cc045197aad9c159042621baf3c015f6cc7 upstream.
In commit ea785a1a573b("net/smc: Send directly when
TCP_CORK is cleared"), we don't use delayed work
to implement cork.
This patch use the same algorithm, removes the
delayed work when setting TCP_NODELAY and send
directly in setsockopt(). This also makes the
TCP_NODELAY the same as TCP.
Cc: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3be232f11a3cc9b0ef0795e39fa11bdb8e422a06 upstream.
If we have already set up the socket and are waiting for it to connect,
then don't immediately close and retry.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b056fa070814897be32d83b079dbc311375588e7 ]
The allocation is done with GFP_KERNEL, but it could still fail in a low
memory situation.
Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9d82819d5b065348ce623f196bf601028e22ed00 ]
We need to handle ENFILE, ENOBUFS, and ENOMEM, because
xprt_wake_pending_tasks() can be called with any one of these due to
socket creation failures.
Fixes: b61d59fffd3e ("SUNRPC: xs_tcp_connect_worker{4,6}: merge common code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d3c15033b240767d0287f1c4a529cbbe2d5ded8a ]
Both call_transmit() and call_bc_transmit() can now return ENOMEM, so
let's make sure that we handle the errors gracefully.
Fixes: 0472e4766049 ("SUNRPC: Convert socket page send code to use iov_iter()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2e8702cc0cfa1080f29fd64003c00a3e24ac38de ]
bpf_tcp_gen_syncookie looks at the IP version in the IP header and
validates the address family of the socket. It supports IPv4 packets in
AF_INET6 dual-stack sockets.
On the other hand, bpf_tcp_check_syncookie looks only at the address
family of the socket, ignoring the real IP version in headers, and
validates only the packet size. This implementation has some drawbacks:
1. Packets are not validated properly, allowing a BPF program to trick
bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
socket.
2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
up receiving a SYNACK with the cookie, but the following ACK gets
dropped.
This patch fixes these issues by changing the checks in
bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP
version from the header is taken into account, and it is validated
properly with address family.
Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1946014ca3b19be9e485e780e862c375c6f98bad ]
Current code can lead to the following race:
CPU0 CPU1
rxrpc_exit_net()
rxrpc_peer_keepalive_worker()
if (rxnet->live)
rxnet->live = false;
del_timer_sync(&rxnet->peer_keepalive_timer);
timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_exit_net() exits while peer_keepalive_timer is still armed,
leading to use-after-free.
syzbot report was:
ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0
WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Modules linked in:
CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0
R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__debug_check_no_obj_freed lib/debugobjects.c:992 [inline]
debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023
kfree+0xd6/0x310 mm/slab.c:3809
ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176
ops_free_list net/core/net_namespace.c:174 [inline]
cleanup_net+0x591/0xb00 net/core/net_namespace.c:598
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
</TASK>
Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: linux-afs@lists.infradead.org
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|