| Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 605b52497bf89b3b154674deb135da98f916e390 ]
rlb_clear_slave intentionally keeps RLB hash-table entries on
the rx_hashtbl_used_head list with slave set to NULL when no
replacement slave is available. However, bond_debug_rlb_hash_show
visites client_info->slave without checking if it's NULL.
Other used-list iterators in bond_alb.c already handle this NULL-slave
state safely:
- rlb_update_client returns early on !client_info->slave
- rlb_req_update_slave_clients, rlb_clear_slave, and rlb_rebalance
compare slave values before visiting
- lb_req_update_subnet_clients continues if slave is NULL
The following NULL deref crash can be trigger in
bond_debug_rlb_hash_show:
[ 1.289791] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1.292058] RIP: 0010:bond_debug_rlb_hash_show (drivers/net/bonding/bond_debugfs.c:41)
[ 1.293101] RSP: 0018:ffffc900004a7d00 EFLAGS: 00010286
[ 1.293333] RAX: 0000000000000000 RBX: ffff888102b48200 RCX: ffff888102b48204
[ 1.293631] RDX: ffff888102b48200 RSI: ffffffff839daad5 RDI: ffff888102815078
[ 1.293924] RBP: ffff888102815078 R08: ffff888102b4820e R09: 0000000000000000
[ 1.294267] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888100f929c0
[ 1.294564] R13: ffff888100f92a00 R14: 0000000000000001 R15: ffffc900004a7ed8
[ 1.294864] FS: 0000000001395380(0000) GS:ffff888196e75000(0000) knlGS:0000000000000000
[ 1.295239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.295480] CR2: 0000000000000000 CR3: 0000000102adc004 CR4: 0000000000772ef0
[ 1.295897] Call Trace:
[ 1.296134] seq_read_iter (fs/seq_file.c:231)
[ 1.296341] seq_read (fs/seq_file.c:164)
[ 1.296493] full_proxy_read (fs/debugfs/file.c:378 (discriminator 1))
[ 1.296658] vfs_read (fs/read_write.c:572)
[ 1.296981] ksys_read (fs/read_write.c:717)
[ 1.297132] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 1.297325] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Add a NULL check and print "(none)" for entries with no assigned slave.
Fixes: caafa84251b88 ("bonding: add the debugfs interface to see RLB hash table")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/20260317005034.1888794-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 30021e969d48e5819d5ae56936c2f34c0f7ce997 ]
When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never
initialized because inet6_init() exits before ndisc_init() is called
which initializes it. If bonding ARP/NS validation is enabled, an IPv6
NS/NA packet received on a slave can reach bond_validate_na(), which
calls bond_has_this_ip6(). That path calls ipv6_chk_addr() and can
crash in __ipv6_chk_addr_and_flags().
BUG: kernel NULL pointer dereference, address: 00000000000005d8
Oops: Oops: 0000 [#1] SMP NOPTI
RIP: 0010:__ipv6_chk_addr_and_flags+0x69/0x170
Call Trace:
<IRQ>
ipv6_chk_addr+0x1f/0x30
bond_validate_na+0x12e/0x1d0 [bonding]
? __pfx_bond_handle_frame+0x10/0x10 [bonding]
bond_rcv_validate+0x1a0/0x450 [bonding]
bond_handle_frame+0x5e/0x290 [bonding]
? srso_alias_return_thunk+0x5/0xfbef5
__netif_receive_skb_core.constprop.0+0x3e8/0xe50
? srso_alias_return_thunk+0x5/0xfbef5
? update_cfs_rq_load_avg+0x1a/0x240
? srso_alias_return_thunk+0x5/0xfbef5
? __enqueue_entity+0x5e/0x240
__netif_receive_skb_one_core+0x39/0xa0
process_backlog+0x9c/0x150
__napi_poll+0x30/0x200
? srso_alias_return_thunk+0x5/0xfbef5
net_rx_action+0x338/0x3b0
handle_softirqs+0xc9/0x2a0
do_softirq+0x42/0x60
</IRQ>
<TASK>
__local_bh_enable_ip+0x62/0x70
__dev_queue_xmit+0x2d3/0x1000
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? packet_parse_headers+0x10a/0x1a0
packet_sendmsg+0x10da/0x1700
? kick_pool+0x5f/0x140
? srso_alias_return_thunk+0x5/0xfbef5
? __queue_work+0x12d/0x4f0
__sys_sendto+0x1f3/0x220
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x101/0xf80
? exc_page_fault+0x6e/0x170
? srso_alias_return_thunk+0x5/0xfbef5
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Fix this by checking ipv6_mod_enabled() before dispatching IPv6 packets to
bond_na_rcv(). If IPv6 is disabled, return early from bond_rcv_validate()
and avoid the path to ipv6_chk_addr().
Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de>
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-2-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3348be7978f450ede0c308a4e8416ac716cf1015 ]
Before the fixed commit, we check slave->new_link during commit
state, which values are only BOND_LINK_{NOCHANGE, UP, DOWN}. After
the commit, we start using slave->link_new_state, which state also could
be BOND_LINK_{FAIL, BACK}.
For example, when we set updelay/downdelay, after a failover,
the slave->link_new_state could be set to BOND_LINK_{FAIL, BACK} in
bond_miimon_inspect(). And later in bond_miimon_commit(), it will treat
it as invalid and print an error, which would cause confusion for users.
[ 106.440254] bond0: (slave veth2): link status down for interface, disabling it in 200 ms
[ 106.440265] bond0: (slave veth2): invalid new link 1 on slave
[ 106.648276] bond0: (slave veth2): link status definitely down, disabling slave
[ 107.480271] bond0: (slave veth2): link status up, enabling it in 200 ms
[ 107.480288] bond0: (slave veth2): invalid new link 3 on slave
[ 107.688302] bond0: (slave veth2): link status definitely up, 10000 Mbps full duplex
Let's handle BOND_LINK_{FAIL, BACK} as valid link states.
Fixes: 1899bb325149 ("bonding: fix state transition issue in link monitoring")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260304-b4-bond_updelay-v1-2-f72eb2e454d0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 479d589b40b836442bbdadc3fdb37f001bb67f26 ]
bond_option_mode_set() already rejects mode changes that would make a
loaded XDP program incompatible via bond_xdp_check(). However,
bond_option_xmit_hash_policy_set() has no such guard.
For 802.3ad and balance-xor modes, bond_xdp_check() returns false when
xmit_hash_policy is vlan+srcmac, because the 802.1q payload is usually
absent due to hardware offload. This means a user can:
1. Attach a native XDP program to a bond in 802.3ad/balance-xor mode
with a compatible xmit_hash_policy (e.g. layer2+3).
2. Change xmit_hash_policy to vlan+srcmac while XDP remains loaded.
This leaves bond->xdp_prog set but bond_xdp_check() now returning false
for the same device. When the bond is later destroyed, dev_xdp_uninstall()
calls bond_xdp_set(dev, NULL, NULL) to remove the program, which hits
the bond_xdp_check() guard and returns -EOPNOTSUPP, triggering:
WARN_ON(dev_xdp_install(dev, mode, bpf_op, NULL, 0, NULL))
Fix this by rejecting xmit_hash_policy changes to vlan+srcmac when an
XDP program is loaded on a bond in 802.3ad or balance-xor mode.
commit 39a0876d595b ("net, bonding: Disallow vlan+srcmac with XDP")
introduced bond_xdp_check() which returns false for 802.3ad/balance-xor
modes when xmit_hash_policy is vlan+srcmac. The check was wired into
bond_xdp_set() to reject XDP attachment with an incompatible policy, but
the symmetric path -- preventing xmit_hash_policy from being changed to an
incompatible value after XDP is already loaded -- was left unguarded in
bond_option_xmit_hash_policy_set().
Note:
commit 094ee6017ea0 ("bonding: check xdp prog when set bond mode")
later added a similar guard to bond_option_mode_set(), but
bond_option_xmit_hash_policy_set() remained unprotected.
Reported-by: syzbot+5a287bcdc08104bc3132@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6995aff6.050a0220.2eeac1.014e.GAE@google.com/T/
Fixes: 39a0876d595b ("net, bonding: Disallow vlan+srcmac with XDP")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260226080306.98766-2-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e6834a4c474697df23ab9948fd3577b26bf48656 ]
The ALB RX path may access rx_hashtbl concurrently with bond
teardown. During rapid bond up/down cycles, rlb_deinitialize()
frees rx_hashtbl while RX handlers are still running, leading
to a null pointer dereference detected by KASAN.
However, the root cause is that rlb_arp_recv() can still be accessed
after setting recv_probe to NULL, which is actually a use-after-free
(UAF) issue. That is the reason for using the referenced commit in the
Fixes tag.
[ 214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI
[ 214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
[ 214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary)
[ 214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022
[ 214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding]
[ 214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6
04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00
[ 214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206
[ 214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e
[ 214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8
[ 214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8
[ 214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119
[ 214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810
[ 214.286943] FS: 00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000
[ 214.295966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0
[ 214.310347] Call Trace:
[ 214.313070] <IRQ>
[ 214.315318] ? __pfx_rlb_arp_recv+0x10/0x10 [bonding]
[ 214.320975] bond_handle_frame+0x166/0xb60 [bonding]
[ 214.326537] ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
[ 214.332680] __netif_receive_skb_core.constprop.0+0x576/0x2710
[ 214.339199] ? __pfx_arp_process+0x10/0x10
[ 214.343775] ? sched_balance_find_src_group+0x98/0x630
[ 214.349513] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 214.356513] ? arp_rcv+0x307/0x690
[ 214.360311] ? __pfx_arp_rcv+0x10/0x10
[ 214.364499] ? __lock_acquire+0x58c/0xbd0
[ 214.368975] __netif_receive_skb_one_core+0xae/0x1b0
[ 214.374518] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 214.380743] ? lock_acquire+0x10b/0x140
[ 214.385026] process_backlog+0x3f1/0x13a0
[ 214.389502] ? process_backlog+0x3aa/0x13a0
[ 214.394174] __napi_poll.constprop.0+0x9f/0x370
[ 214.399233] net_rx_action+0x8c1/0xe60
[ 214.403423] ? __pfx_net_rx_action+0x10/0x10
[ 214.408193] ? lock_acquire.part.0+0xbd/0x260
[ 214.413058] ? sched_clock_cpu+0x6c/0x540
[ 214.417540] ? mark_held_locks+0x40/0x70
[ 214.421920] handle_softirqs+0x1fd/0x860
[ 214.426302] ? __pfx_handle_softirqs+0x10/0x10
[ 214.431264] ? __neigh_event_send+0x2d6/0xf50
[ 214.436131] do_softirq+0xb1/0xf0
[ 214.439830] </IRQ>
The issue is reproducible by repeatedly running
ip link set bond0 up/down while receiving ARP messages, where
rlb_arp_recv() can race with rlb_deinitialize() and dereference
a freed rx_hashtbl entry.
Fix this by setting recv_probe to NULL and then calling
synchronize_net() to wait for any concurrent RX processing to finish.
This ensures that no RX handler can access rx_hashtbl after it is freed
in bond_alb_deinitialize().
Reported-by: Liang Li <liali@redhat.com>
Fixes: 3aba891dde38 ("bonding: move processing of recv handlers into handle_frame()")
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260218060919.101574-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 48dec8d88af96039a4a17b8c2f148f2a4066e195 ]
bond_update_speed_duplex() first set speed/duplex to unknown and
then asks slave driver for current speed/duplex. Since getting
speed/duplex might take longer there is a race, where this false state
is visible by /proc/net/bonding. With commit 691b2bf14946 ("bonding:
update port speed when getting bond speed") this race gets more visible,
if user space is calling ethtool on a regular base.
Fix this by only setting speed/duplex to unknown, if link speed is
really unknown/unusable.
Fixes: 98f41f694f46 ("bonding:update speed/duplex for NETDEV_CHANGE")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260203141153.51581-1-tbogendoerfer@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f6c3665b6dc53c3ab7d31b585446a953a74340ef ]
slave->last_rx and slave->target_last_arp_rx[...] can be read and written
locklessly. Add READ_ONCE() and WRITE_ONCE() annotations.
syzbot reported:
BUG: KCSAN: data-race in bond_rcv_validate / bond_rcv_validate
write to 0xffff888149f0d428 of 8 bytes by interrupt on cpu 1:
bond_rcv_validate+0x202/0x7a0 drivers/net/bonding/bond_main.c:3335
bond_handle_frame+0xde/0x5e0 drivers/net/bonding/bond_main.c:1533
__netif_receive_skb_core+0x5b1/0x1950 net/core/dev.c:6039
__netif_receive_skb_one_core net/core/dev.c:6150 [inline]
__netif_receive_skb+0x59/0x270 net/core/dev.c:6265
netif_receive_skb_internal net/core/dev.c:6351 [inline]
netif_receive_skb+0x4b/0x2d0 net/core/dev.c:6410
...
write to 0xffff888149f0d428 of 8 bytes by interrupt on cpu 0:
bond_rcv_validate+0x202/0x7a0 drivers/net/bonding/bond_main.c:3335
bond_handle_frame+0xde/0x5e0 drivers/net/bonding/bond_main.c:1533
__netif_receive_skb_core+0x5b1/0x1950 net/core/dev.c:6039
__netif_receive_skb_one_core net/core/dev.c:6150 [inline]
__netif_receive_skb+0x59/0x270 net/core/dev.c:6265
netif_receive_skb_internal net/core/dev.c:6351 [inline]
netif_receive_skb+0x4b/0x2d0 net/core/dev.c:6410
br_netif_receive_skb net/bridge/br_input.c:30 [inline]
NF_HOOK include/linux/netfilter.h:318 [inline]
...
value changed: 0x0000000100005365 -> 0x0000000100005366
Fixes: f5b2b966f032 ("[PATCH] bonding: Validate probe replies in ARP monitor")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://patch.msgid.link/20260122162914.2299312-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5f9b329096596b7e53e07d041d7fca4cbe1be752 ]
After 3cbf4ffba5ee ("net: plumb network namespace into __skb_flow_dissect")
we have to provide a net pointer to __skb_flow_dissect(),
either via skb->dev, skb->sk, or a user provided pointer.
In the following case, syzbot was able to cook a bare skb.
WARNING: net/core/flow_dissector.c:1131 at __skb_flow_dissect+0xb57/0x68b0 net/core/flow_dissector.c:1131, CPU#1: syz.2.1418/11053
Call Trace:
<TASK>
bond_flow_dissect drivers/net/bonding/bond_main.c:4093 [inline]
__bond_xmit_hash+0x2d7/0xba0 drivers/net/bonding/bond_main.c:4157
bond_xmit_hash_xdp drivers/net/bonding/bond_main.c:4208 [inline]
bond_xdp_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5139 [inline]
bond_xdp_get_xmit_slave+0x1fd/0x710 drivers/net/bonding/bond_main.c:5515
xdp_master_redirect+0x13f/0x2c0 net/core/filter.c:4388
bpf_prog_run_xdp include/net/xdp.h:700 [inline]
bpf_test_run+0x6b2/0x7d0 net/bpf/test_run.c:421
bpf_prog_test_run_xdp+0x795/0x10e0 net/bpf/test_run.c:1390
bpf_prog_test_run+0x2c7/0x340 kernel/bpf/syscall.c:4703
__sys_bpf+0x562/0x860 kernel/bpf/syscall.c:6182
__do_sys_bpf kernel/bpf/syscall.c:6274 [inline]
__se_sys_bpf kernel/bpf/syscall.c:6272 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:6272
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
Fixes: 58deb77cc52d ("bonding: balance ICMP echoes in layer3+4 mode")
Reported-by: syzbot+c46409299c70a221415e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/696faa23.050a0220.4cb9c.001f.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matteo Croce <mcroce@redhat.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260120161744.1893263-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c84fcb79e5dbde0b8d5aeeaf04282d2149aebcf6 ]
BOND_MODE_8023AD makes sense for ARPHRD_ETHER only.
syzbot reported:
BUG: KASAN: global-out-of-bounds in __hw_addr_create net/core/dev_addr_lists.c:63 [inline]
BUG: KASAN: global-out-of-bounds in __hw_addr_add_ex+0x25d/0x760 net/core/dev_addr_lists.c:118
Read of size 16 at addr ffffffff8bf94040 by task syz.1.3580/19497
CPU: 1 UID: 0 PID: 19497 Comm: syz.1.3580 Tainted: G L syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:200
__asan_memcpy+0x29/0x70 mm/kasan/shadow.c:105
__hw_addr_create net/core/dev_addr_lists.c:63 [inline]
__hw_addr_add_ex+0x25d/0x760 net/core/dev_addr_lists.c:118
__dev_mc_add net/core/dev_addr_lists.c:868 [inline]
dev_mc_add+0xa1/0x120 net/core/dev_addr_lists.c:886
bond_enslave+0x2b8b/0x3ac0 drivers/net/bonding/bond_main.c:2180
do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2963
do_setlink+0xcf0/0x41c0 net/core/rtnetlink.c:3165
rtnl_changelink net/core/rtnetlink.c:3776 [inline]
__rtnl_newlink net/core/rtnetlink.c:3935 [inline]
rtnl_newlink+0x161c/0x1c90 net/core/rtnetlink.c:4072
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6958
netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2550
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x82f/0x9e0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:742
____sys_sendmsg+0x505/0x820 net/socket.c:2592
___sys_sendmsg+0x21f/0x2a0 net/socket.c:2646
__sys_sendmsg+0x164/0x220 net/socket.c:2678
do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
__do_fast_syscall_32+0x1dc/0x560 arch/x86/entry/syscall_32.c:307
do_fast_syscall_32+0x34/0x80 arch/x86/entry/syscall_32.c:332
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
</TASK>
The buggy address belongs to the variable:
lacpdu_mcast_addr+0x0/0x40
Fixes: 872254dd6b1f ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER")
Reported-by: syzbot+9c081b17773615f24672@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6966946b.a70a0220.245e30.0002.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20260113191201.3970737-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 094ee6017ea09c11d6af187935a949df32803ce0 ]
Following operations can trigger a warning[1]:
ip netns add ns1
ip netns exec ns1 ip link add bond0 type bond mode balance-rr
ip netns exec ns1 ip link set dev bond0 xdp obj af_xdp_kern.o sec xdp
ip netns exec ns1 ip link set bond0 type bond mode broadcast
ip netns del ns1
When delete the namespace, dev_xdp_uninstall() is called to remove xdp
program on bond dev, and bond_xdp_set() will check the bond mode. If bond
mode is changed after attaching xdp program, the warning may occur.
Some bond modes (broadcast, etc.) do not support native xdp. Set bond mode
with xdp program attached is not good. Add check for xdp program when set
bond mode.
[1]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 11 at net/core/dev.c:9912 unregister_netdevice_many_notify+0x8d9/0x930
Modules linked in:
CPU: 0 UID: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.14.0-rc4 #107
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:unregister_netdevice_many_notify+0x8d9/0x930
Code: 00 00 48 c7 c6 6f e3 a2 82 48 c7 c7 d0 b3 96 82 e8 9c 10 3e ...
RSP: 0018:ffffc90000063d80 EFLAGS: 00000282
RAX: 00000000ffffffa1 RBX: ffff888004959000 RCX: 00000000ffffdfff
RDX: 0000000000000000 RSI: 00000000ffffffea RDI: ffffc90000063b48
RBP: ffffc90000063e28 R08: ffffffff82d39b28 R09: 0000000000009ffb
R10: 0000000000000175 R11: ffffffff82d09b40 R12: ffff8880049598e8
R13: 0000000000000001 R14: dead000000000100 R15: ffffc90000045000
FS: 0000000000000000(0000) GS:ffff888007a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000d406b60 CR3: 000000000483e000 CR4: 00000000000006f0
Call Trace:
<TASK>
? __warn+0x83/0x130
? unregister_netdevice_many_notify+0x8d9/0x930
? report_bug+0x18e/0x1a0
? handle_bug+0x54/0x90
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? unregister_netdevice_many_notify+0x8d9/0x930
? bond_net_exit_batch_rtnl+0x5c/0x90
cleanup_net+0x237/0x3d0
process_one_work+0x163/0x390
worker_thread+0x293/0x3b0
? __pfx_worker_thread+0x10/0x10
kthread+0xec/0x1e0
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---
Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Acked-by: Jussi Maki <joamaki@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250321044852.1086551-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Rajani Kantha <681739313@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 22ccb684c1cae37411450e6e86a379cd3c29cb8f ]
Bonding only supports native XDP for specific modes, which can lead to
confusion for users regarding why XDP loads successfully at times and
fails at others. This patch enhances error handling by returning detailed
error messages, providing users with clearer insights into the specific
reasons for the failure when loading native XDP.
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241021031211.814-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Rajani Kantha <681739313@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 10843e1492e474c02b91314963161731fa92af91 upstream.
If the send_peer_notif counter and the peer event notify are not synchronized.
It may cause problems such as the loss or dup of peer notify event.
Before this patch:
- If should_notify_peers is true and the lock for send_peer_notif-- fails, peer
event may be sent again in next mii_monitor loop, because should_notify_peers
is still true.
- If should_notify_peers is true and the lock for send_peer_notif-- succeeded,
but the lock for peer event fails, the peer event will be lost.
This patch locks the RTNL for send_peer_notif, events, and commit simultaneously.
Fixes: 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications")
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Vincent Bernat <vincent@bernat.ch>
Cc: <stable@vger.kernel.org>
Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20251021050933.46412-1-tonghao@bamaicloud.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a8ba87f04ca9cdec06776ce92dce1395026dc3bb ]
Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.
Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.
Reported-by: David Wilder <wilder@us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv@jvosburgh.net>
Tested-by: David Wilder <wilder@us.ibm.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250916080127.430626-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 35ae4e86292ef7dfe4edbb9942955c884e984352 ]
After commit 5c3bf6cba791 ("bonding: assign random address if device
address is same as bond"), bonding will erroneously randomize the MAC
address of the first interface added to the bond if fail_over_mac =
follow.
Correct this by additionally testing for the bond being empty before
randomizing the MAC.
Fixes: 5c3bf6cba791 ("bonding: assign random address if device address is same as bond")
Reported-by: Qiuling Ren <qren@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250910024336.400253-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
LACPDU
[ Upstream commit 0599640a21e98f0d6a3e9ff85c0a687c90a8103b ]
When `lacp_active` is set to `off`, the bond operates in passive mode, meaning
it only "speaks when spoken to." However, the current kernel implementation
only sends an LACPDU in response when the partner's state changes.
As a result, once LACP negotiation succeeds, the actor stops sending LACPDUs
until the partner times out and sends an "expired" LACPDU. This causes
continuous LACP state flapping.
According to IEEE 802.1AX-2014, 6.4.13 Periodic Transmission machine. The
values of Partner_Oper_Port_State.LACP_Activity and
Actor_Oper_Port_State.LACP_Activity determine whether periodic transmissions
take place. If either or both parameters are set to Active LACP, then periodic
transmissions occur; if both are set to Passive LACP, then periodic
transmissions do not occur.
To comply with this, we remove the `!bond->params.lacp_active` check in
`ad_periodic_machine()`. Instead, we initialize the actor's port's
`LACP_STATE_LACP_ACTIVITY` state based on `lacp_active` setting.
Additionally, we avoid setting the partner's state to
`LACP_STATE_LACP_ACTIVITY` in the EXPIRED state, since we should not assume
the partner is active by default.
This ensures that in passive mode, the bond starts sending periodic LACPDUs
after receiving one from the partner, and avoids flapping due to inactivity.
Fixes: 3a755cd8b7c6 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250815062000.22220-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 240fd405528bbf7fafa0559202ca7aa524c9cd96 ]
Add support for the independent control state machine per IEEE
802.1AX-2008 5.4.15 in addition to the existing implementation of the
coupled control state machine.
Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in
the LACP MUX state machine for separated handling of an initial
Collecting state before the Collecting and Distributing state. This
enables a port to be in a state where it can receive incoming packets
while not still distributing. This is useful for reducing packet loss when
a port begins distributing before its partner is able to collect.
Added new functions such as bond_set_slave_tx_disabled_flags and
bond_set_slave_rx_enabled_flags to precisely manage the port's collecting
and distributing states. Previously, there was no dedicated method to
disable TX while keeping RX enabled, which this patch addresses.
Note that the regular flow process in the kernel's bonding driver remains
unaffected by this patch. The extension requires explicit opt-in by the
user (in order to ensure no disruptions for existing setups) via netlink
support using the new bonding parameter coupled_control. The default value
for coupled_control is set to 1 so as to preserve existing behaviour.
Signed-off-by: Aahil Awatramani <aahila@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 0599640a21e9 ("bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b64d035f77b1f02ab449393342264b44950a75ae ]
The port's actor_oper_port_state activity flag should be updated immediately
after changing the lacp_active option to reflect the current mode correctly.
Fixes: 3a755cd8b7c6 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250815062000.22220-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5c3bf6cba7911f470afd748606be5c03a9512fcc ]
This change addresses a MAC address conflict issue in failover scenarios,
similar to the problem described in commit a951bc1e6ba5 ("bonding: correct
the MAC address for 'follow' fail_over_mac policy").
In fail_over_mac=follow mode, the bonding driver expects the formerly active
slave to swap MAC addresses with the newly active slave during failover.
However, under certain conditions, two slaves may end up with the same MAC
address, which breaks this policy:
1) ip link set eth0 master bond0
-> bond0 adopts eth0's MAC address (MAC0).
2) ip link set eth1 master bond0
-> eth1 is added as a backup with its own MAC (MAC1).
3) ip link set eth0 nomaster
-> eth0 is released and restores its MAC (MAC0).
-> eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.
4) ip link set eth0 master bond0
-> eth0 is re-added to bond0, now both eth0 and eth1 have MAC0.
This results in a MAC address conflict and violates the expected behavior
of the failover policy.
To fix this, we assign a random MAC address to any newly added slave if
its current MAC address matches that of the bond. The original (permanent)
MAC address is saved and will be restored when the device is released
from the bond.
This ensures that each slave has a unique MAC address during failover
transitions, preserving the integrity of the fail_over_mac=follow policy.
Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 28d68d396a1cd21591e8c6d74afbde33a7ea107e ]
Normally, a bond uses the MAC address of the first added slave as the bond’s
MAC address. And the bond will set active slave’s MAC address to bond’s
address if fail_over_mac is set to none (0) or follow (2).
When the first slave is removed, the bond will still use the removed slave’s
MAC address, which can lead to a duplicate MAC address and potentially cause
issues with the switch. To avoid confusion, let's warn the user in all
situations, including when fail_over_mac is set to 2 or not in active-backup
mode.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250225033914.18617-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0c5e145a350de3b38cd5ae77a401b12c46fb7c1d ]
When validation on the backup slave is enabled, we need to validate the
Neighbor Solicitation (NS) messages received on the backup slave. To
receive these messages, the correct destination MAC address must be added
to the slave. However, the target in bonding is a unicast address, which
we cannot use directly. Instead, we should first convert it to a
Solicited-Node Multicast Address and then derive the corresponding MAC
address.
Fix the incorrect MAC address setting on both slave_set_ns_maddr() and
slave_set_ns_maddrs(). Since the two function names are similar. Add
some description for the functions. Also only use one mac_addr variable
in slave_set_ns_maddr() to save some code and logic.
Fixes: 8eb36164d1a6 ("bonding: add ns target multicast address to slave device")
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250306023923.38777-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 77b11c8bf3a228d1c63464534c2dcc8d9c8bf7ff ]
Drivers like mlx5 expose NIC's vlan_features such as
NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM which are
later not propagated when the underlying devices are bonded and
a vlan device created on top of the bond.
Right now, the more cumbersome workaround for this is to create
the vlan on top of the mlx5 and then enslave the vlan devices
to a bond.
To fix this, add NETIF_F_GSO_ENCAP_ALL to BOND_VLAN_FEATURES
such that bond_compute_features() can probe and propagate the
vlan_features from the slave devices up to the vlan device.
Given the following bond:
# ethtool -i enp2s0f{0,1}np{0,1}
driver: mlx5_core
[...]
# ethtool -k enp2s0f0np0 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: on
rx-udp-gro-forwarding: off
# ethtool -k enp2s0f1np1 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: on
rx-udp-gro-forwarding: off
# ethtool -k bond0 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
Before:
# ethtool -k bond0.100 | grep udp
tx-udp_tnl-segmentation: off [requested on]
tx-udp_tnl-csum-segmentation: off [requested on]
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
After:
# ethtool -k bond0.100 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
Various users have run into this reporting performance issues when
configuring Cilium in vxlan tunneling mode and having the combination
of bond & vlan for the core devices connecting the Kubernetes cluster
to the outside world.
Fixes: a9b3ace44c7d ("bonding: fix vlan_features computing")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241210141245.327886-3-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8eb36164d1a6769a20ed43033510067ff3dab9ee ]
Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves")
tried to resolve the issue where backup slaves couldn't be brought up when
receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only
worked for drivers that receive all multicast messages, such as the veth
interface.
For standard drivers, the NS multicast message is silently dropped because
the slave device is not a member of the NS target multicast group.
To address this, we need to make the slave device join the NS target
multicast group, ensuring it can receive these IPv6 NS messages to validate
the slave’s status properly.
There are three policies before joining the multicast group:
1. All settings must be under active-backup mode (alb and tlb do not support
arp_validate), with backup slaves and slaves supporting multicast.
2. We can add or remove multicast groups when arp_validate changes.
3. Other operations, such as enslaving, releasing, or setting NS targets,
need to be guarded by arp_validate.
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0cbfd45fbcf0cb26d85c981b91c62fe73cdee01c ]
syzbot reported a WARNING in bond_xdp_get_xmit_slave. To reproduce
this[1], one bond device (bond1) has xdpdrv, which increases
bpf_master_redirect_enabled_key. Another bond device (bond0) which is
unsupported by XDP but its slave (veth3) has xdpgeneric that returns
XDP_TX. This triggers WARN_ON_ONCE() from the xdp_master_redirect().
To reduce unnecessary warnings and improve log management, we need to
delete the WARN_ON_ONCE() and add ratelimit to the netdev_err().
[1] Steps to reproduce:
# Needs tx_xdp with return XDP_TX;
ip l add veth0 type veth peer veth1
ip l add veth3 type veth peer veth4
ip l add bond0 type bond mode 6 # BOND_MODE_ALB, unsupported by XDP
ip l add bond1 type bond # BOND_MODE_ROUNDROBIN by default
ip l set veth0 master bond1
ip l set bond1 up
# Increases bpf_master_redirect_enabled_key
ip l set dev bond1 xdpdrv object tx_xdp.o section xdp_tx
ip l set veth3 master bond0
ip l set bond0 up
ip l set veth4 up
# Triggers WARN_ON_ONCE() from the xdp_master_redirect()
ip l set veth3 xdpgeneric object tx_xdp.o section xdp_tx
Reported-by: syzbot+c187823a52ed505b2257@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c187823a52ed505b2257
Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver")
Signed-off-by: Jiwon Kim <jiwonaid0@gmail.com>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20240918140602.18644-1-jiwonaid0@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2aeeef906d5a526dc60cf4af92eda69836c39b1f ]
In the cited commit, bond->ipsec_lock is added to protect ipsec_list,
hence xdo_dev_state_add and xdo_dev_state_delete are called inside
this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep,
"scheduling while atomic" will be triggered when changing bond's
active slave.
[ 101.055189] BUG: scheduling while atomic: bash/902/0x00000200
[ 101.055726] Modules linked in:
[ 101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
[ 101.058760] Hardware name:
[ 101.059434] Call Trace:
[ 101.059436] <TASK>
[ 101.060873] dump_stack_lvl+0x51/0x60
[ 101.061275] __schedule_bug+0x4e/0x60
[ 101.061682] __schedule+0x612/0x7c0
[ 101.062078] ? __mod_timer+0x25c/0x370
[ 101.062486] schedule+0x25/0xd0
[ 101.062845] schedule_timeout+0x77/0xf0
[ 101.063265] ? asm_common_interrupt+0x22/0x40
[ 101.063724] ? __bpf_trace_itimer_state+0x10/0x10
[ 101.064215] __wait_for_common+0x87/0x190
[ 101.064648] ? usleep_range_state+0x90/0x90
[ 101.065091] cmd_exec+0x437/0xb20 [mlx5_core]
[ 101.065569] mlx5_cmd_do+0x1e/0x40 [mlx5_core]
[ 101.066051] mlx5_cmd_exec+0x18/0x30 [mlx5_core]
[ 101.066552] mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
[ 101.067163] ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
[ 101.067738] ? kmalloc_trace+0x4d/0x350
[ 101.068156] mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
[ 101.068747] mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
[ 101.069312] bond_change_active_slave+0x392/0x900 [bonding]
[ 101.069868] bond_option_active_slave_set+0x1c2/0x240 [bonding]
[ 101.070454] __bond_opt_set+0xa6/0x430 [bonding]
[ 101.070935] __bond_opt_set_notify+0x2f/0x90 [bonding]
[ 101.071453] bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
[ 101.071965] bonding_sysfs_store_option+0x4d/0x80 [bonding]
[ 101.072567] kernfs_fop_write_iter+0x10c/0x1a0
[ 101.073033] vfs_write+0x2d8/0x400
[ 101.073416] ? alloc_fd+0x48/0x180
[ 101.073798] ksys_write+0x5f/0xe0
[ 101.074175] do_syscall_64+0x52/0x110
[ 101.074576] entry_SYSCALL_64_after_hwframe+0x4b/0x53
As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
from bond_change_active_slave, which requires holding the RTNL lock.
And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
context. So ipsec_lock doesn't have to be spin lock, change it to
mutex, and thus the above issue can be resolved.
Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20240823031056.110999-4-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 907ed83a7583e8ffede88c5ac088392701a7d458 ]
Add a local variable for slave->dev, to prepare for the lock change in
the next patch. There is no functionality change.
Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20240823031056.110999-3-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ec13009472f4a756288eb4e18e20a7845da98d10 ]
Add this implementation for bonding, so hardware resources can be
freed from the active slave after xfrm state is deleted. The netdev
used to invoke xdo_dev_state_free callback, is saved in the xfrm state
(xs->xso.real_dev), which is also the bond's active slave. To prevent
it from being freed, acquire netdev reference before leaving RCU
read-side critical section, and release it after callback is done.
And call it when deleting all SAs from old active real interface while
switching current active slave.
Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20240823031056.110999-2-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c4c5c5d2ef40a9f67a9241dc5422eac9ffe19547 ]
If the active slave is cleared manually the xfrm state is not flushed.
This leads to xfrm add/del imbalance and adding the same state multiple
times. For example when the device cannot handle anymore states we get:
[ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
because it's filled with the same state after multiple active slave
clearings. This change also has a few nice side effects: user-space
gets a notification for the change, the old device gets its mac address
and promisc/mcast adjusted properly.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f8cde9805981c50d0c029063dc7d82821806fc44 ]
We shouldn't set real_dev to NULL because packets can be in transit and
xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
real_dev is set.
Example trace:
kernel: BUG: unable to handle page fault for address: 0000000000001030
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: #PF: supervisor write access in kernel mode
kernel: #PF: error_code(0x0002) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0002 [#1] PREEMPT SMP
kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel:
kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: Call Trace:
kernel: <TASK>
kernel: ? __die+0x1f/0x60
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ? page_fault_oops+0x142/0x4c0
kernel: ? do_user_addr_fault+0x65/0x670
kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: ? exc_page_fault+0x7b/0x180
kernel: ? asm_exc_page_fault+0x22/0x30
kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding]
kernel: xfrm_output+0x61/0x3b0
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ip_push_pending_frames+0x56/0x80
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 95c90e4ad89d493a7a14fa200082e466e2548f9d ]
We must check if there is an active slave before dereferencing the pointer.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fc59b9a5f7201b9f7272944596113a82cc7773d5 ]
Fix the return type which should be bool.
Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3ba359c0cd6eb5ea772125a7aededb4a2d516684 ]
RCU use in bond_should_notify_peers() looks wrong, since it does
rcu_dereference(), leaves the critical section, and uses the
pointer after that.
Luckily, it's called either inside a nested RCU critical section
or with the RTNL held.
Annotate it with rcu_dereference_rtnl() instead, and remove the
inner RCU critical section.
Fixes: 4cb4f97b7e36 ("bonding: rebuild the lock use for bond_mii_monitor()")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20240719094119.35c62455087d.I68eb9c0f02545b364b79a59f2110f2cf5682a8e2@changeid
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e271ff53807e8f2c628758290f0e499dbe51cb3d ]
In function bond_option_arp_ip_targets_set(), if newval->string is an
empty string, newval->string+1 will point to the byte after the
string, causing an out-of-bound read.
BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418
Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107
CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0xc1/0x5e0 mm/kasan/report.c:475
kasan_report+0xbe/0xf0 mm/kasan/report.c:588
strlen+0x7d/0xa0 lib/string.c:418
__fortify_strlen include/linux/fortify-string.h:210 [inline]
in4_pton+0xa3/0x3f0 net/core/utils.c:130
bond_option_arp_ip_targets_set+0xc2/0x910
drivers/net/bonding/bond_options.c:1201
__bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767
__bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792
bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817
bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156
dev_attr_store+0x54/0x80 drivers/base/core.c:2366
sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x96a/0xd80 fs/read_write.c:584
ksys_write+0x122/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
---[ end trace ]---
Fix it by adding a check of string length before using it.
Fixes: f9de11a16594 ("bonding: add ip checks when store ip target")
Signed-off-by: Yue Sun <samsun1006219@gmail.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20240702-bond-oob-v6-1-2dfdba195c19@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit a45835a0bb6ef7d5ddbc0714dd760de979cb6ece upstream.
"rmmod bonding" causes an oops ever since commit cc317ea3d927 ("bonding:
remove redundant NULL check in debugfs function"). Here are the relevant
functions being called:
bonding_exit()
bond_destroy_debugfs()
debugfs_remove_recursive(bonding_debug_root);
bonding_debug_root = NULL; <--------- SET TO NULL HERE
bond_netlink_fini()
rtnl_link_unregister()
__rtnl_link_unregister()
unregister_netdevice_many_notify()
bond_uninit()
bond_debug_unregister()
(commit removed check for bonding_debug_root == NULL)
debugfs_remove()
simple_recursive_removal()
down_write() -> OOPS
However, reverting the bad commit does not solve the problem completely
because the original code contains a race that could cause the same
oops, although it was much less likely to be triggered unintentionally:
CPU1
rmmod bonding
bonding_exit()
bond_destroy_debugfs()
debugfs_remove_recursive(bonding_debug_root);
CPU2
echo -bond0 > /sys/class/net/bonding_masters
bond_uninit()
bond_debug_unregister()
if (!bonding_debug_root)
CPU1
bonding_debug_root = NULL;
So do NOT revert the bad commit (since the removed checks were racy
anyway), and instead change the order of actions taken during module
removal. The same oops can also happen if there is an error during
module init, so apply the same fix there.
Fixes: cc317ea3d927 ("bonding: remove redundant NULL check in debugfs function")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/641f914f-3216-4eeb-87dd-91b78aa97773@cybernetics.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f267f262815033452195f46c43b572159262f533 ]
Commit 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
changed the driver from reporting everything as supported before a device
was bonded into having the driver report that no XDP feature is supported
until a real device is bonded as it seems to be more truthful given
eventually real underlying devices decide what XDP features are supported.
The change however did not take into account when all slave devices get
removed from the bond device. In this case after 9b0ed890ac2a, the driver
keeps reporting a feature mask of 0x77, that is, NETDEV_XDP_ACT_MASK &
~NETDEV_XDP_ACT_XSK_ZEROCOPY whereas it should have reported a feature
mask of 0.
Fix it by resetting XDP feature flags in the same way as if no XDP program
is attached to the bond device. This was uncovered by the XDP bond selftest
which let BPF CI fail. After adjusting the starting masks on the latter
to 0 instead of NETDEV_XDP_ACT_MASK the test passes again together with
this fix.
Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Prashant Batra <prbatra.mail@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Message-ID: <20240305090829.17131-1-daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9b0ed890ac2ae233efd8b27d11aee28a19437bb8 ]
Do not report the XDP capability NETDEV_XDP_ACT_XSK_ZEROCOPY as the
bonding driver does not support XDP and AF_XDP in zero-copy mode even
if the real NIC drivers do.
Note that the driver used to report everything as supported before a
device was bonded. Instead of just masking out the zero-copy support
from this, have the driver report that no XDP feature is supported
until a real device is bonded. This seems to be more truthful as it is
the real drivers that decide what XDP features are supported.
Fixes: cb9e6e584d58 ("bonding: add xdp_features support")
Reported-by: Prashant Batra <prbatra.mail@gmail.com>
Link: https://lore.kernel.org/all/CAJ8uoz2ieZCopgqTvQ9ZY6xQgTbujmC6XkMTamhp68O-h_-rLg@mail.gmail.com/T/
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20240207084737.20890-1-magnus.karlsson@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 486058f42a4728053ae69ebbf78e9731d8ce6f8b upstream.
As suggested by Paolo in link[1], if the memory allocation fails, the mm
layer will emit a lot warning comprising the backtrace, so remove the
print.
[1] https://lore.kernel.org/all/20231118081653.1481260-1-shaozhengchao@huawei.com/
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d6b83f1e3707c4d60acfa58afd3515e17e5d5384 ]
If failed to allocate "tags" or could not find the final upper device from
start_dev's upper list in bond_verify_device_path(), only the loopback
detection of the current upper device should be affected, and the system is
no need to be panic.
So return -ENOMEM in alb_upper_dev_walk to stop walking, print some warn
information when failed to allocate memory for vlan tags in
bond_verify_device_path.
I also think that the following function calls
netdev_walk_all_upper_dev_rcu
---->>>alb_upper_dev_walk
---------->>>bond_verify_device_path
From this way, "end device" can eventually be obtained from "start device"
in bond_verify_device_path, IS_ERR(tags) could be instead of
IS_ERR_OR_NULL(tags) in alb_upper_dev_walk.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20231118081653.1481260-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3cffa2ddc4d3fcf70cde361236f5a614f81a09b2 ]
Commit 9eed321cde22 ("net: lapbether: only support ethernet devices")
has been able to keep syzbot away from net/lapb, until today.
In the following splat [1], the issue is that a lapbether device has
been created on a bonding device without members. Then adding a non
ARPHRD_ETHER member forced the bonding master to change its type.
The fix is to make sure we call dev_close() in bond_setup_by_slave()
so that the potential linked lapbether devices (or any other devices
having assumptions on the physical device) are removed.
A similar bug has been addressed in commit 40baec225765
("bonding: fix panic on non-ARPHRD_ETHER enslave failure")
[1]
skbuff: skb_under_panic: text:ffff800089508810 len:44 put:40 head:ffff0000c78e7c00 data:ffff0000c78e7bea tail:0x16 end:0x140 dev:bond0
kernel BUG at net/core/skbuff.c:192 !
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 6007 Comm: syz-executor383 Not tainted 6.6.0-rc3-syzkaller-gbf6547d8715b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : skb_panic net/core/skbuff.c:188 [inline]
pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:202
lr : skb_panic net/core/skbuff.c:188 [inline]
lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:202
sp : ffff800096a06aa0
x29: ffff800096a06ab0 x28: ffff800096a06ba0 x27: dfff800000000000
x26: ffff0000ce9b9b50 x25: 0000000000000016 x24: ffff0000c78e7bea
x23: ffff0000c78e7c00 x22: 000000000000002c x21: 0000000000000140
x20: 0000000000000028 x19: ffff800089508810 x18: ffff800096a06100
x17: 0000000000000000 x16: ffff80008a629a3c x15: 0000000000000001
x14: 1fffe00036837a32 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000201 x10: 0000000000000000 x9 : cb50b496c519aa00
x8 : cb50b496c519aa00 x7 : 0000000000000001 x6 : 0000000000000001
x5 : ffff800096a063b8 x4 : ffff80008e280f80 x3 : ffff8000805ad11c
x2 : 0000000000000001 x1 : 0000000100000201 x0 : 0000000000000086
Call trace:
skb_panic net/core/skbuff.c:188 [inline]
skb_under_panic+0x13c/0x140 net/core/skbuff.c:202
skb_push+0xf0/0x108 net/core/skbuff.c:2446
ip6gre_header+0xbc/0x738 net/ipv6/ip6_gre.c:1384
dev_hard_header include/linux/netdevice.h:3136 [inline]
lapbeth_data_transmit+0x1c4/0x298 drivers/net/wan/lapbether.c:257
lapb_data_transmit+0x8c/0xb0 net/lapb/lapb_iface.c:447
lapb_transmit_buffer+0x178/0x204 net/lapb/lapb_out.c:149
lapb_send_control+0x220/0x320 net/lapb/lapb_subr.c:251
__lapb_disconnect_request+0x9c/0x17c net/lapb/lapb_iface.c:326
lapb_device_event+0x288/0x4e0 net/lapb/lapb_iface.c:492
notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93
raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461
call_netdevice_notifiers_info net/core/dev.c:1970 [inline]
call_netdevice_notifiers_extack net/core/dev.c:2008 [inline]
call_netdevice_notifiers net/core/dev.c:2022 [inline]
__dev_close_many+0x1b8/0x3c4 net/core/dev.c:1508
dev_close_many+0x1e0/0x470 net/core/dev.c:1559
dev_close+0x174/0x250 net/core/dev.c:1585
lapbeth_device_event+0x2e4/0x958 drivers/net/wan/lapbether.c:466
notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93
raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461
call_netdevice_notifiers_info net/core/dev.c:1970 [inline]
call_netdevice_notifiers_extack net/core/dev.c:2008 [inline]
call_netdevice_notifiers net/core/dev.c:2022 [inline]
__dev_close_many+0x1b8/0x3c4 net/core/dev.c:1508
dev_close_many+0x1e0/0x470 net/core/dev.c:1559
dev_close+0x174/0x250 net/core/dev.c:1585
bond_enslave+0x2298/0x30cc drivers/net/bonding/bond_main.c:2332
bond_do_ioctl+0x268/0xc64 drivers/net/bonding/bond_main.c:4539
dev_ifsioc+0x754/0x9ac
dev_ioctl+0x4d8/0xd34 net/core/dev_ioctl.c:786
sock_do_ioctl+0x1d4/0x2d0 net/socket.c:1217
sock_ioctl+0x4e8/0x834 net/socket.c:1322
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl fs/ioctl.c:857 [inline]
__arm64_sys_ioctl+0x14c/0x1c8 fs/ioctl.c:857
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
el0_svc+0x58/0x16c arch/arm64/kernel/entry-common.c:678
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Code: aa1803e6 aa1903e7 a90023f5 94785b8b (d4210000)
Fixes: 872254dd6b1f ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20231109180102.4085183-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Since 429e3d123d9a ("bonding: Fix extraction of ports from the packet
headers"), header offsets used to compute a hash in bond_xmit_hash() are
relative to skb->data and not skb->head. If the tail of the header buffer
of an skb really needs to be advanced and the operation is successful, the
pointer to the data must be returned (and not a pointer to the head of the
buffer).
Fixes: 429e3d123d9a ("bonding: Fix extraction of ports from the packet headers")
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
include/net/inet_sock.h
f866fbc842de ("ipv4: fix data-races around inet->inet_id")
c274af224269 ("inet: introduce inet->inet_flags")
https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/
Adjacent changes:
drivers/net/bonding/bond_alb.c
e74216b8def3 ("bonding: fix macvlan over alb bond support")
f11e5bd159b0 ("bonding: support balance-alb with openvswitch")
drivers/net/ethernet/broadcom/bgmac.c
d6499f0b7c7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()")
23a14488ea58 ("net: bgmac: Fix return value check for fixed_phy_register()")
drivers/net/ethernet/broadcom/genet/bcmmii.c
32bbe64a1386 ("net: bcmgenet: Fix return value check for fixed_phy_register()")
acf50d1adbf4 ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()")
net/sctp/socket.c
f866fbc842de ("ipv4: fix data-races around inet->inet_id")
b09bde5c3554 ("inet: move inet->mc_loop to inet->inet_frags")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The commit 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode
bonds") aims to enable the use of macvlans on top of rlb bond mode. However,
the current rlb bond mode only handles ARP packets to update remote neighbor
entries. This causes an issue when a macvlan is on top of the bond, and
remote devices send packets to the macvlan using the bond's MAC address
as the destination. After delivering the packets to the macvlan, the macvlan
will rejects them as the MAC address is incorrect. Consequently, this commit
makes macvlan over bond non-functional.
To address this problem, one potential solution is to check for the presence
of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev)
and return NULL in the rlb_arp_xmit() function. However, this approach
doesn't fully resolve the situation when a VLAN exists between the bond and
macvlan.
So let's just do a partial revert for commit 14af9963ba1e in rlb_arp_xmit().
As the comment said, Don't modify or load balance ARPs that do not originate
locally.
Fixes: 14af9963ba1e ("bonding: Support macvlans on top of tlb/rlb mode bonds")
Reported-by: susan.zheng@veritas.com
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Andrew reported a bonding issue that if we put an active-back bond on top
of a 802.3ad bond interface. When the 802.3ad bond's speed/duplex changed
dynamically. The upper bonding interface's speed/duplex can't be changed at
the same time, which will show incorrect speed.
Fix it by updating the port speed when calling ethtool.
Reported-by: Andrew Schorr <ajschorr@alumni.princeton.edu>
Closes: https://lore.kernel.org/netdev/ZEt3hvyREPVdbesO@Laptop-X1/
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20230821101008.797482-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The free_percpu function also could check whether "rr_tx_counter"
parameter is NULL. Therefore, remove NULL check in bond_destructor.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In bond_reset_slave_arr(), values are assigned and memory is released only
when the variables "usable" and "all" are not NULL. But even if the
"usable" and "all" variables are NULL, they can still work, because value
will be checked in kfree_rcu. Therefore, use bond_set_slave_arr() and set
the input parameters "usable_slaves" and "all_slaves" to NULL to simplify
the code in bond_reset_slave_arr(). And the same to bond_uninit().
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because debugfs_create_dir returns ERR_PTR, so bonding_debug_root will
never be NULL. Remove redundant NULL check for bonding_debug_root in
debugfs function. The later debugfs_create_dir/debugfs_remove_recursive
/debugfs_remove_recursive functions will check the dentry with IS_ERR().
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because debugfs_create_dir returns ERR_PTR, so IS_ERR should be used to
check whether the directory is successfully created.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some functions are only used in initialization and exit functions, so add
the __init/__net_init and __net_exit modifiers to these functions.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/intel/igc/igc_main.c
06b412589eef ("igc: Add lock to safeguard global Qbv variables")
d3750076d464 ("igc: Add TransmissionOverrun counter")
drivers/net/ethernet/microsoft/mana/mana_en.c
a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive")
a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h")
92272ec4107e ("eth: add missing xdp.h includes in drivers")
net/mptcp/protocol.h
511b90e39250 ("mptcp: fix disconnect vs accept race")
b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning")
tools/testing/selftests/net/mptcp/mptcp_join.sh
c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test")
03668c65d153 ("selftests: mptcp: join: rework detailed report")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with
following testcase:
# ip netns add ns1
# ip netns exec ns1 ip link add bond0 type bond mode 0
# ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
# ip netns exec ns1 ip link set bond_slave_1 master bond0
# ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link set bond_slave_1 nomaster
# ip netns del ns1
The logical analysis of the problem is as follows:
1. create ETH_P_8021AD protocol vlan10 for bond_slave_1:
register_vlan_dev()
vlan_vid_add()
vlan_info_alloc()
__vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1
2. create ETH_P_8021AD protocol bond0_vlan10 for bond0:
register_vlan_dev()
vlan_vid_add()
__vlan_vid_add()
vlan_add_rx_filter_info()
if (!vlan_hw_filter_capable(dev, proto)) // condition established because bond0 without NETIF_F_HW_VLAN_STAG_FILTER
return 0;
if (netif_device_present(dev))
return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will be never called
// The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid.
3. detach bond_slave_1 from bond0:
__bond_release_one()
vlan_vids_del_by_dev()
list_for_each_entry(vid_info, &vlan_info->vid_list, list)
vlan_vid_del(dev, vid_info->proto, vid_info->vid);
// bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted.
// bond_slave_1->vlan_info will be assigned NULL.
4. delete vlan10 during delete ns1:
default_device_exit_batch()
dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10
vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1
BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!!
Add S-VLAN tag related features support to bond driver. So the bond driver
will always propagate the VLAN info to its slaves.
Fixes: 8ad227ff89a7 ("net: vlan: add 802.1ad support")
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2023-08-03
We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 84 files changed, 4026 insertions(+), 562 deletions(-).
The main changes are:
1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer,
Daniel Borkmann
2) Support new insns from cpu v4 from Yonghong Song
3) Non-atomically allocate freelist during prefill from YiFei Zhu
4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu
5) Add tracepoint to xdp attaching failure from Leon Hwang
6) struct netdev_rx_queue and xdp.h reshuffling to reduce
rebuild time from Jakub Kicinski
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
net: invert the netdevice.h vs xdp.h dependency
net: move struct netdev_rx_queue out of netdevice.h
eth: add missing xdp.h includes in drivers
selftests/bpf: Add testcase for xdp attaching failure tracepoint
bpf, xdp: Add tracepoint to xdp attaching failure
selftests/bpf: fix static assert compilation issue for test_cls_*.c
bpf: fix bpf_probe_read_kernel prototype mismatch
riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework
libbpf: fix typos in Makefile
tracing: bpf: use struct trace_entry in struct syscall_tp_t
bpf, devmap: Remove unused dtab field from bpf_dtab_netdev
bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry
netfilter: bpf: Only define get_proto_defrag_hook() if necessary
bpf: Fix an array-index-out-of-bounds issue in disasm.c
net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn
docs/bpf: Fix malformed documentation
bpf: selftests: Add defrag selftests
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Support not connecting client socket
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
...
====================
Link: https://lore.kernel.org/r/20230803174845.825419-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|