Age | Commit message (Collapse) | Author | Files | Lines |
|
We already annotated most accesses to sk->sk_napi_id
We missed sk_mark_napi_id() and sk_mark_napi_id_once()
which might be called without socket lock held in UDP stack.
KCSAN reported :
BUG: KCSAN: data-race in udpv6_queue_rcv_one_skb / udpv6_queue_rcv_one_skb
write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 0:
sk_mark_napi_id include/net/busy_poll.h:125 [inline]
__udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
__udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
dst_input include/net/dst.h:442 [inline]
ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
process_backlog+0x1d3/0x420 net/core/dev.c:5955
napi_poll net/core/dev.c:6392 [inline]
net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 1:
sk_mark_napi_id include/net/busy_poll.h:125 [inline]
__udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
__udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
dst_input include/net/dst.h:442 [inline]
ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
process_backlog+0x1d3/0x420 net/core/dev.c:5955
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 10890 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: e68b6e50fa35 ("udp: enable busy polling for all sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This socket field can be read and written by concurrent cpus.
Use READ_ONCE() and WRITE_ONCE() annotations to document this,
and avoid some compiler 'optimizations'.
KCSAN reported :
BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv
write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0:
sk_incoming_cpu_update include/net/sock.h:953 [inline]
tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934
ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
process_backlog+0x1d3/0x420 net/core/dev.c:5955
napi_poll net/core/dev.c:6392 [inline]
net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
__do_softirq+0x115/0x33f kernel/softirq.c:292
do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
do_softirq kernel/softirq.c:329 [inline]
__local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1:
sk_incoming_cpu_update include/net/sock.h:952 [inline]
tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934
ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
__netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
process_backlog+0x1d3/0x420 net/core/dev.c:5955
napi_poll net/core/dev.c:6392 [inline]
net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
__do_softirq+0x115/0x33f kernel/softirq.c:292
run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The union should contain the extended dest and counter list.
Remove the resevered 0x40 bits which is redundant.
This change doesn't break any functionally.
Everything works today because the code in fs_cmd.c is using
the correct structs if extended dest or the basic dest.
Fixes: 1b115498598f ("net/mlx5: Introduce extended destination fields")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
sk_page_frag() optimizes skb_frag allocations by using per-task
skb_frag cache when it knows it's the only user. The condition is
determined by seeing whether the socket allocation mask allows
blocking - if the allocation may block, it obviously owns the task's
context and ergo exclusively owns current->task_frag.
Unfortunately, this misses recursion through memory reclaim path.
Please take a look at the following backtrace.
[2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10
...
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x30/0x40
sock_xmit.isra.24+0xa1/0x170 [nbd]
nbd_send_cmd+0x1d2/0x690 [nbd]
nbd_queue_rq+0x1b5/0x3b0 [nbd]
__blk_mq_try_issue_directly+0x108/0x1b0
blk_mq_request_issue_directly+0xbd/0xe0
blk_mq_try_issue_list_directly+0x41/0xb0
blk_mq_sched_insert_requests+0xa2/0xe0
blk_mq_flush_plug_list+0x205/0x2a0
blk_flush_plug_list+0xc3/0xf0
[1] blk_finish_plug+0x21/0x2e
_xfs_buf_ioapply+0x313/0x460
__xfs_buf_submit+0x67/0x220
xfs_buf_read_map+0x113/0x1a0
xfs_trans_read_buf_map+0xbf/0x330
xfs_btree_read_buf_block.constprop.42+0x95/0xd0
xfs_btree_lookup_get_block+0x95/0x170
xfs_btree_lookup+0xcc/0x470
xfs_bmap_del_extent_real+0x254/0x9a0
__xfs_bunmapi+0x45c/0xab0
xfs_bunmapi+0x15/0x30
xfs_itruncate_extents_flags+0xca/0x250
xfs_free_eofblocks+0x181/0x1e0
xfs_fs_destroy_inode+0xa8/0x1b0
destroy_inode+0x38/0x70
dispose_list+0x35/0x50
prune_icache_sb+0x52/0x70
super_cache_scan+0x120/0x1a0
do_shrink_slab+0x120/0x290
shrink_slab+0x216/0x2b0
shrink_node+0x1b6/0x4a0
do_try_to_free_pages+0xc6/0x370
try_to_free_mem_cgroup_pages+0xe3/0x1e0
try_charge+0x29e/0x790
mem_cgroup_charge_skmem+0x6a/0x100
__sk_mem_raise_allocated+0x18e/0x390
__sk_mem_schedule+0x2a/0x40
[0] tcp_sendmsg_locked+0x8eb/0xe10
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x26d/0x2b0
__sys_sendmsg+0x57/0xa0
do_syscall_64+0x42/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
In [0], tcp_send_msg_locked() was using current->page_frag when it
called sk_wmem_schedule(). It already calculated how many bytes can
be fit into current->page_frag. Due to memory pressure,
sk_wmem_schedule() called into memory reclaim path which called into
xfs and then IO issue path. Because the filesystem in question is
backed by nbd, the control goes back into the tcp layer - back into
tcp_sendmsg_locked().
nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
sense - it's in the process of freeing memory and wants to be able to,
e.g., drop clean pages to make forward progress. However, this
confused sk_page_frag() called from [2]. Because it only tests
whether the allocation allows blocking which it does, it now thinks
current->page_frag can be used again although it already was being
used in [0].
After [2] used current->page_frag, the offset would be increased by
the used amount. When the control returns to [0],
current->page_frag's offset is increased and the previously calculated
number of bytes now may overrun the end of allocated memory leading to
silent memory corruptions.
Fix it by adding gfpflags_normal_context() which tests sleepable &&
!reclaim and use it to determine whether to use current->task_frag.
v2: Eric didn't like gfp flags being tested twice. Introduce a new
helper gfpflags_normal_context() and combine the two tests.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some paths call skb_queue_empty() without holding
the queue lock. We must use a barrier in order
to not let the compiler do strange things, and avoid
KCSAN splats.
Adding a barrier in skb_queue_empty() might be overkill,
I prefer adding a new helper to clearly identify
points where the callers might be lockless. This might
help us finding real bugs.
The corresponding WRITE_ONCE() should add zero cost
for current compilers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Fix crash on flowtable due to race between garbage collection
and insertion.
2) Restore callback unbinding in netfilter offloads.
3) Fix races on IPVS module removal, from Davide Caratti.
4) Make old_secure_tcp per-netns to fix sysbot report,
from Eric Dumazet.
5) Validate matching length in netfilter offloads, from wenxu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2019-10-27
The following pull-request contains BPF updates for your *net* tree.
We've added 7 non-merge commits during the last 11 day(s) which contain
a total of 7 files changed, 66 insertions(+), 16 deletions(-).
The main changes are:
1) Fix two use-after-free bugs in relation to RCU in jited symbol exposure to
kallsyms, from Daniel Borkmann.
2) Fix NULL pointer dereference in AF_XDP rx-only sockets, from Magnus Karlsson.
3) Fix hang in netdev unregister for hash based devmap as well as another overflow
bug on 32 bit archs in memlock cost calculation, from Toke Høiland-Jørgensen.
4) Fix wrong memory access in LWT BPF programs on reroute due to invalid dst.
Also fix BPF selftests to use more compatible nc options, from Jiri Benc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
but there are a few paths calling rtnl_net_notifyid() from atomic
context or from RCU critical sections. The later also precludes the use
of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
call is wrong too, as it uses GFP_KERNEL unconditionally.
Therefore, we need to pass the GFP flags as parameter and propagate it
through function calls until the proper flags can be determined.
In most cases, GFP_KERNEL is fine. The exceptions are:
* openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
indirectly call rtnl_net_notifyid() from RCU critical section,
* rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
parameter.
Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
by nlmsg_new(). The function is allowed to sleep, so better make the
flags consistent with the ones used in the following
ovs_vport_cmd_fill_info() call.
Found by code inspection.
Fixes: 9a9634545c70 ("netns: notify netns id events")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If CONFIG_NET_HWBM is not set, then these stub functions in
<net/hwbm.h> should be declared static to avoid trying to
export them from any driver that includes this.
Fixes the following sparse warnings:
./include/net/hwbm.h:24:6: warning: symbol 'hwbm_buf_free' was not declared. Should it be static?
./include/net/hwbm.h:25:5: warning: symbol 'hwbm_pool_refill' was not declared. Should it be static?
./include/net/hwbm.h:26:5: warning: symbol 'hwbm_pool_add' was not declared. Should it be static?
Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes variables and callback these are related to the nested
device structure.
devices that can be nested have their own nest_level variable that
represents the depth of nested devices.
In the previous patch, new {lower/upper}_level variables are added and
they replace old private nest_level variable.
So, this patch removes all 'nest_level' variables.
In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
to get lockdep subclass value, which is actually lower nested depth value.
But now, they use the dynamic lockdep key to avoid lockdep warning instead
of the subclass.
So, this patch removes ->ndo_get_lock_subclass() callback.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current vxlan code doesn't limit the number of nested devices.
Nested devices would be handled recursively and this routine needs
huge stack memory. So, unlimited nested devices could make
stack overflow.
In order to fix this issue, this patch adds adjacent links.
The adjacent link APIs internally check the depth level.
Test commands:
ip link add dummy0 type dummy
ip link add vxlan0 type vxlan id 0 group 239.1.1.1 dev dummy0 \
dstport 4789
for i in {1..100}
do
let A=$i-1
ip link add vxlan$i type vxlan id $i group 239.1.1.1 \
dev vxlan$A dstport 4789
done
ip link del dummy0
The top upper link is vxlan100 and the lowest link is vxlan0.
When vxlan0 is deleting, the upper devices will be deleted recursively.
It needs huge stack memory so it makes stack overflow.
Splat looks like:
[ 229.628477] =============================================================================
[ 229.629785] BUG page->ptl (Not tainted): Padding overwritten. 0x0000000026abf214-0x0000000091f6abb2
[ 229.629785] -----------------------------------------------------------------------------
[ 229.629785]
[ 229.655439] ==================================================================
[ 229.629785] INFO: Slab 0x00000000ff7cfda8 objects=19 used=19 fp=0x00000000fe33776c flags=0x200000000010200
[ 229.655688] BUG: KASAN: stack-out-of-bounds in unmap_single_vma+0x25a/0x2e0
[ 229.655688] Read of size 8 at addr ffff888113076928 by task vlan-network-in/2334
[ 229.655688]
[ 229.629785] Padding 0000000026abf214: 00 80 14 0d 81 88 ff ff 68 91 81 14 81 88 ff ff ........h.......
[ 229.629785] Padding 0000000001e24790: 38 91 81 14 81 88 ff ff 68 91 81 14 81 88 ff ff 8.......h.......
[ 229.629785] Padding 00000000b39397c8: 33 30 62 a7 ff ff ff ff ff eb 60 22 10 f1 ff 1f 30b.......`"....
[ 229.629785] Padding 00000000bc98f53a: 80 60 07 13 81 88 ff ff 00 80 14 0d 81 88 ff ff .`..............
[ 229.629785] Padding 000000002aa8123d: 68 91 81 14 81 88 ff ff f7 21 17 a7 ff ff ff ff h........!......
[ 229.629785] Padding 000000001c8c2369: 08 81 14 0d 81 88 ff ff 03 02 00 00 00 00 00 00 ................
[ 229.629785] Padding 000000004e290c5d: 21 90 a2 21 10 ed ff ff 00 00 00 00 00 fc ff df !..!............
[ 229.629785] Padding 000000000e25d731: 18 60 07 13 81 88 ff ff c0 8b 13 05 81 88 ff ff .`..............
[ 229.629785] Padding 000000007adc7ab3: b3 8a b5 41 00 00 00 00 ...A....
[ 229.629785] FIX page->ptl: Restoring 0x0000000026abf214-0x0000000091f6abb2=0x5a
[ ... ]
Fixes: acaf4e70997f ("net: vxlan: when lower dev unregisters remove vxlan dev as well")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to link an adjacent node, netdev_upper_dev_link() is used
and in order to unlink an adjacent node, netdev_upper_dev_unlink() is used.
unlink operation does not fail, but link operation can fail.
In order to exchange adjacent nodes, we should unlink an old adjacent
node first. then, link a new adjacent node.
If link operation is failed, we should link an old adjacent node again.
But this link operation can fail too.
It eventually breaks the adjacent link relationship.
This patch adds an ignore flag into the netdev_adjacent structure.
If this flag is set, netdev_upper_dev_link() ignores an old adjacent
node for a moment.
This patch also adds new functions for other modules.
netdev_adjacent_change_prepare()
netdev_adjacent_change_commit()
netdev_adjacent_change_abort()
netdev_adjacent_change_prepare() inserts new device into adjacent list
but new device is not allowed to use immediately.
If netdev_adjacent_change_prepare() fails, it internally rollbacks
adjacent list so that we don't need any other action.
netdev_adjacent_change_commit() deletes old device in the adjacent list
and allows new device to use.
netdev_adjacent_change_abort() rollbacks adjacent list.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
team interface could be nested and it's lock variable could be nested too.
But this lock uses static lockdep key and there is no nested locking
handling code such as mutex_lock_nested() and so on.
so the Lockdep would warn about the circular locking scenario that
couldn't happen.
In order to fix, this patch makes the team module to use dynamic lock key
instead of static key.
Test commands:
ip link add team0 type team
ip link add team1 type team
ip link set team0 master team1
ip link set team0 nomaster
ip link set team1 master team0
ip link set team1 nomaster
Splat that looks like:
[ 40.364352] WARNING: possible recursive locking detected
[ 40.364964] 5.4.0-rc3+ #96 Not tainted
[ 40.365405] --------------------------------------------
[ 40.365973] ip/750 is trying to acquire lock:
[ 40.366542] ffff888060b34c40 (&team->lock){+.+.}, at: team_set_mac_address+0x151/0x290 [team]
[ 40.367689]
but task is already holding lock:
[ 40.368729] ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.370280]
other info that might help us debug this:
[ 40.371159] Possible unsafe locking scenario:
[ 40.371942] CPU0
[ 40.372338] ----
[ 40.372673] lock(&team->lock);
[ 40.373115] lock(&team->lock);
[ 40.373549]
*** DEADLOCK ***
[ 40.374432] May be due to missing lock nesting notation
[ 40.375338] 2 locks held by ip/750:
[ 40.375851] #0: ffffffffabcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 40.376927] #1: ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.377989]
stack backtrace:
[ 40.378650] CPU: 0 PID: 750 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 40.379368] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 40.380574] Call Trace:
[ 40.381208] dump_stack+0x7c/0xbb
[ 40.381959] __lock_acquire+0x269d/0x3de0
[ 40.382817] ? register_lock_class+0x14d0/0x14d0
[ 40.383784] ? check_chain_key+0x236/0x5d0
[ 40.384518] lock_acquire+0x164/0x3b0
[ 40.385074] ? team_set_mac_address+0x151/0x290 [team]
[ 40.385805] __mutex_lock+0x14d/0x14c0
[ 40.386371] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387038] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387632] ? mutex_lock_io_nested+0x1380/0x1380
[ 40.388245] ? team_del_slave+0x60/0x60 [team]
[ 40.388752] ? rcu_read_lock_sched_held+0x90/0xc0
[ 40.389304] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 40.389819] ? lock_acquire+0x164/0x3b0
[ 40.390285] ? lockdep_rtnl_is_held+0x16/0x20
[ 40.390797] ? team_port_get_rtnl+0x90/0xe0 [team]
[ 40.391353] ? __module_text_address+0x13/0x140
[ 40.391886] ? team_set_mac_address+0x151/0x290 [team]
[ 40.392547] team_set_mac_address+0x151/0x290 [team]
[ 40.393111] dev_set_mac_address+0x1f0/0x3f0
[ ... ]
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All bonding device has same lockdep key and subclass is initialized with
nest_level.
But actual nest_level value can be changed when a lower device is attached.
And at this moment, the subclass should be updated but it seems to be
unsafe.
So this patch makes bonding use dynamic lockdep key instead of the
subclass.
Test commands:
ip link add bond0 type bond
for i in {1..5}
do
let A=$i-1
ip link add bond$i type bond
ip link set bond$i master bond$A
done
ip link set bond5 master bond0
Splat looks like:
[ 307.992912] WARNING: possible recursive locking detected
[ 307.993656] 5.4.0-rc3+ #96 Tainted: G W
[ 307.994367] --------------------------------------------
[ 307.995092] ip/761 is trying to acquire lock:
[ 307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.997045]
but task is already holding lock:
[ 307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.999215]
other info that might help us debug this:
[ 308.000251] Possible unsafe locking scenario:
[ 308.001137] CPU0
[ 308.001533] ----
[ 308.001915] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.002609] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.003302]
*** DEADLOCK ***
[ 308.004310] May be due to missing lock nesting notation
[ 308.005319] 3 locks held by ip/761:
[ 308.005830] #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 308.006894] #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 308.008243] #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding]
[ 308.009422]
stack backtrace:
[ 308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G W 5.4.0-rc3+ #96
[ 308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 308.012179] Call Trace:
[ 308.012601] dump_stack+0x7c/0xbb
[ 308.013089] __lock_acquire+0x269d/0x3de0
[ 308.013669] ? register_lock_class+0x14d0/0x14d0
[ 308.014318] lock_acquire+0x164/0x3b0
[ 308.014858] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.015520] _raw_spin_lock_nested+0x2e/0x60
[ 308.016129] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.017215] bond_get_stats+0xb8/0x500 [bonding]
[ 308.018454] ? bond_arp_rcv+0xf10/0xf10 [bonding]
[ 308.019710] ? rcu_read_lock_held+0x90/0xa0
[ 308.020605] ? rcu_read_lock_sched_held+0xc0/0xc0
[ 308.021286] ? bond_get_stats+0x9f/0x500 [bonding]
[ 308.021953] dev_get_stats+0x1ec/0x270
[ 308.022508] bond_get_stats+0x1d1/0x500 [bonding]
Fixes: d3fff6c443fe ("net: add netdev_lockdep_set_classes() helper")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some interface types could be nested.
(VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
These interface types should set lockdep class because, without lockdep
class key, lockdep always warn about unexisting circular locking.
In the current code, these interfaces have their own lockdep class keys and
these manage itself. So that there are so many duplicate code around the
/driver/net and /net/.
This patch adds new generic lockdep keys and some helper functions for it.
This patch does below changes.
a) Add lockdep class keys in struct net_device
- qdisc_running, xmit, addr_list, qdisc_busylock
- these keys are used as dynamic lockdep key.
b) When net_device is being allocated, lockdep keys are registered.
- alloc_netdev_mqs()
c) When net_device is being free'd llockdep keys are unregistered.
- free_netdev()
d) Add generic lockdep key helper function
- netdev_register_lockdep_key()
- netdev_unregister_lockdep_key()
- netdev_update_lockdep_key()
e) Remove unnecessary generic lockdep macro and functions
f) Remove unnecessary lockdep code of each interfaces.
After this patch, each interface modules don't need to maintain
their lockdep keys.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current code doesn't limit the number of nested devices.
Nested devices would be handled recursively and this needs huge stack
memory. So, unlimited nested devices could make stack overflow.
This patch adds upper_level and lower_level, they are common variables
and represent maximum lower/upper depth.
When upper/lower device is attached or dettached,
{lower/upper}_level are updated. and if maximum depth is bigger than 8,
attach routine fails and returns -EMLINK.
In addition, this patch converts recursive routine of
netdev_walk_all_{lower/upper} to iterator routine.
Test commands:
ip link add dummy0 type dummy
ip link add link dummy0 name vlan1 type vlan id 1
ip link set vlan1 up
for i in {2..55}
do
let A=$i-1
ip link add vlan$i link vlan$A type vlan id $i
done
ip link del dummy0
Splat looks like:
[ 155.513226][ T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850
[ 155.514162][ T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908
[ 155.515048][ T908]
[ 155.515333][ T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 155.516147][ T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 155.517233][ T908] Call Trace:
[ 155.517627][ T908]
[ 155.517918][ T908] Allocated by task 0:
[ 155.518412][ T908] (stack is not available)
[ 155.518955][ T908]
[ 155.519228][ T908] Freed by task 0:
[ 155.519885][ T908] (stack is not available)
[ 155.520452][ T908]
[ 155.520729][ T908] The buggy address belongs to the object at ffff8880608a6ac0
[ 155.520729][ T908] which belongs to the cache names_cache of size 4096
[ 155.522387][ T908] The buggy address is located 512 bytes inside of
[ 155.522387][ T908] 4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0)
[ 155.523920][ T908] The buggy address belongs to the page:
[ 155.524552][ T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0
[ 155.525836][ T908] flags: 0x100000000010200(slab|head)
[ 155.526445][ T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0
[ 155.527424][ T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 155.528429][ T908] page dumped because: kasan: bad access detected
[ 155.529158][ T908]
[ 155.529410][ T908] Memory state around the buggy address:
[ 155.530060][ T908] ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.530971][ T908] ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3
[ 155.531889][ T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.532806][ T908] ^
[ 155.533509][ T908] ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00
[ 155.534436][ T908] ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00
[ ... ]
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot reported the following issue :
BUG: KCSAN: data-race in update_defense_level / update_defense_level
read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1:
update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0:
update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events defense_work_handler
Indeed, old_secure_tcp is currently a static variable, while it
needs to be a per netns variable.
Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
UDP IPv6 packets auto flowlabels are using a 32bit secret
(static u32 hashrnd in net/core/flow_dissector.c) and
apply jhash() over fields known by the receivers.
Attackers can easily infer the 32bit secret and use this information
to identify a device and/or user, since this 32bit secret is only
set at boot time.
Really, using jhash() to generate cookies sent on the wire
is a serious security concern.
Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
a dead end. Trying to periodically change the secret (like in sch_sfq.c)
could change paths taken in the network for long lived flows.
Let's switch to siphash, as we did in commit df453700e8d8
("inet: switch IP ID generator to siphash")
Using a cryptographically strong pseudo random function will solve this
privacy issue and more generally remove other weak points in the stack.
Packet schedulers using skb_get_hash_perturb() benefit from this change.
Fixes: b56774163f99 ("ipv6: Enable auto flow labels by default")
Fixes: 42240901f7c4 ("ipv6: Implement different admin modes for automatic flow labels")
Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
Fixes: cb1ce2ef387b ("ipv6: Implement automatic flow label generation on transmit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Berger <jonathann1@walla.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ionic driver started using dymamic_hex_dump(), but
that is not always defined:
drivers/net/ethernet/pensando/ionic/ionic_main.c:229:2: error: implicit declaration of function 'dynamic_hex_dump' [-Werror,-Wimplicit-function-declaration]
Add a dummy implementation to use when CONFIG_DYNAMIC_DEBUG
is disabled, printing nothing.
Fixes: 938962d55229 ("ionic: Add adminq action")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
syzkaller managed to trigger the following crash:
[...]
BUG: unable to handle page fault for address: ffffc90001923030
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
[...]
Call Trace:
kernel_text_address kernel/extable.c:147 [inline]
__kernel_text_address+0x9a/0x110 kernel/extable.c:102
unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
save_stack mm/kasan/common.c:69 [inline]
set_track mm/kasan/common.c:77 [inline]
__kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
slab_post_alloc_hook mm/slab.h:584 [inline]
slab_alloc mm/slab.c:3319 [inline]
kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
getname_flags+0xba/0x640 fs/namei.c:138
getname+0x19/0x20 fs/namei.c:209
do_sys_open+0x261/0x560 fs/open.c:1091
__do_sys_open fs/open.c:1115 [inline]
__se_sys_open fs/open.c:1110 [inline]
__x64_sys_open+0x87/0x90 fs/open.c:1110
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[...]
After further debugging it turns out that we walk kallsyms while in parallel
we tear down a BPF program which contains subprograms that have been JITed
though the program itself has not been fully exposed and is eventually bailing
out with error.
The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes
the symbols, however, bpf_prog_free() tears down the JIT memory too early via
scheduled work. Instead, it needs to properly respect RCU grace period as the
kallsyms walk for BPF is under RCU.
Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error
path where we defer final destruction when we have subprogs in the program.
Fixes: 7d1982b4e335 ("bpf: fix panic in prog load calls cleanup")
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
|
|
This patch removes the iph field from the state structure, which is not
properly initialized. Instead, add a new field to make the "do we want
to set DF" be the state bit and move the code to set the DF flag from
ip_frag_next().
Joint work with Pablo and Linus.
Fixes: 19c3401a917b ("net: ipv4: place control buffer handling away from fragmentation iterators")
Reported-by: Patrick Schönthaler <patrick@notvads.ovh>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
"I was battling a cold after some recent trips, so quite a bit piled up
meanwhile, sorry about that.
Highlights:
1) Fix fd leak in various bpf selftests, from Brian Vazquez.
2) Fix crash in xsk when device doesn't support some methods, from
Magnus Karlsson.
3) Fix various leaks and use-after-free in rxrpc, from David Howells.
4) Fix several SKB leaks due to confusion of who owns an SKB and who
should release it in the llc code. From Eric Biggers.
5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet.
6) Jumbo packets don't work after resume on r8169, as the BIOS resets
the chip into non-jumbo mode during suspend. From Heiner Kallweit.
7) Corrupt L2 header during MPLS push, from Davide Caratti.
8) Prevent possible infinite loop in tc_ctl_action, from Eric
Dumazet.
9) Get register bits right in bcmgenet driver, based upon chip
version. From Florian Fainelli.
10) Fix mutex problems in microchip DSA driver, from Marek Vasut.
11) Cure race between route lookup and invalidation in ipv4, from Wei
Wang.
12) Fix performance regression due to false sharing in 'net'
structure, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
net: reorder 'struct net' fields to avoid false sharing
net: dsa: fix switch tree list
net: ethernet: dwmac-sun8i: show message only when switching to promisc
net: aquantia: add an error handling in aq_nic_set_multicast_list
net: netem: correct the parent's backlog when corrupted packet was dropped
net: netem: fix error path for corrupted GSO frames
macb: propagate errors when getting optional clocks
xen/netback: fix error path of xenvif_connect_data()
net: hns3: fix mis-counting IRQ vector numbers issue
net: usb: lan78xx: Connect PHY before registering MAC
vsock/virtio: discard packets if credit is not respected
vsock/virtio: send a credit update when buffer size is changed
mlxsw: spectrum_trap: Push Ethernet header before reporting trap
net: ensure correct skb->tstamp in various fragmenters
net: bcmgenet: reset 40nm EPHY on energy detect
net: bcmgenet: soft reset 40nm EPHYs before MAC init
net: phy: bcm7xxx: define soft_reset for 40nm EPHY
net: bcmgenet: don't set phydev->link from MAC
net: Update address for MediaTek ethernet driver in MAINTAINERS
ipv4: fix race condition between route lookup and invalidation
...
|
|
Intel test robot reported a ~7% regression on TCP_CRR tests
that they bisected to the cited commit.
Indeed, every time a new TCP socket is created or deleted,
the atomic counter net->count is touched (via get_net(net)
and put_net(net) calls)
So cpus might have to reload a contended cache line in
net_hash_mix(net) calls.
We need to reorder 'struct net' fields to move @hash_mix
in a read mostly cache line.
We move in the first cache line fields that can be
dirtied often.
We probably will have to address in a followup patch
the __randomize_layout that was added in linux-4.13,
since this might break our placement choices.
Fixes: 355b98553789 ("netns: provide pure entropy for net_hash_mix()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few small fixes for the usual suspect, HD- and USB-audio:
enablement of runtime PM for Nvidia due to the recent PCI changes, a
fix for potential hangs with recent HD-audio platforms, and the rest
device-specific quirks"
* tag 'sound-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Force runtime PM on Nvidia HDMI codecs
ALSA: hda/realtek - Enable headset mic on Asus MJ401TA
ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers
ALSA: hdac: clear link output stream mapping
ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main thing here is a long-awaited workaround for a CPU erratum on
ThunderX2 which we have developed in conjunction with engineers from
Cavium/Marvell.
At the moment, the workaround is unconditionally enabled for affected
CPUs at runtime but we may add a command-line option to disable it in
future if performance numbers show up indicating a significant cost
for real workloads.
Summary:
- Work around Cavium/Marvell ThunderX2 erratum #219
- Fix regression in mlock() ABI caused by sign-extension of TTBR1 addresses
- More fixes to the spurious kernel fault detection logic
- Fix pathological preemption race when enabling some CPU features at boot
- Drop broken kcore macros in favour of generic implementations
- Fix userspace view of ID_AA64ZFR0_EL1 when SVE is disabled
- Avoid NULL dereference on allocation failure during hibernation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: tags: Preserve tags for addresses translated via TTBR1
arm64: mm: fix inverted PAR_EL1.F check
arm64: sysreg: fix incorrect definition of SYS_PAR_EL1_F
arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
arm64: hibernate: check pgd table allocation
arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled
arm64: Fix kcore macros after 52-bit virtual addressing fallout
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
|
|
The KSZ8795 PHY ID is in fact used by KSZ8794/KSZ8795/KSZ8765 switches.
Update the PHY ID and name to reflect that, as this family of switches
is commonly refered to as KSZ87xx
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Workaround for Cavium/Marvell ThunderX2 erratum #219.
* errata/tx2-219:
arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"The fixes pertain to a problem with initializing the Intel GPIO
irqchips when adding gpiochips.
Andy fixed it up elegantly by adding a hardware initialization
callback to the struct gpio_irq_chip so let's use this. Tested and
verified on the target hardware"
* tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: lynxpoint: set default handler to be handle_bad_irq()
gpio: merrifield: Move hardware initialization to callback
gpio: lynxpoint: Move hardware initialization to callback
gpio: intel-mid: Move hardware initialization to callback
gpiolib: Initialize the hardware with a callback
gpio: merrifield: Restore use of irq_base
|
|
Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.
However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.
* Task 1:
PAN == 0 ---| |---------------
| |<- return from IRQ, PSTATE.PAN = 0
| <- IRQ |
+--------+ <- preempt() +--
^
|
reschedule Task 1, PSTATE.PAN == 1
* Init:
--------------------+------------------------
^
|
enable_cpu_features
set PSTATE.PAN on all CPUs
Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).
Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.
This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early, the
exception is BRK for WARN_ON(), but as this uses do_debug_exception()
which keeps IRQs masked, it shouldn't call schedule().
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[james: Replaced a really cool hack, with an even simpler static key in C.
expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
the following script:
# tc qdisc add dev eth0 clsact
# tc filter add dev eth0 egress protocol ip matchall \
> action mpls push protocol mpls_uc label 0x355aa bos 1
causes corruption of all IP packets transmitted by eth0. On TC egress, we
can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push'
operation will result in an overwrite of the first 4 octets in the packet
L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
error pattern is present also in the MPLS 'pop' operation. Fix this error
in act_mpls data plane, computing 'mac_len' as the difference between the
network header and the mac header (when not at TC ingress), and use it in
MPLS 'push'/'pop' core functions.
v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
in skb_mpls_pop(), reported by kbuild test robot
CC: Lorenzo Bianconi <lorenzo@kernel.org>
Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five changes, two in drivers (qla2xxx, zfcp), one to MAINTAINERS
(qla2xxx) and two in the core.
The last two are mostly about removing incorrect messages from the
kernel log: the resid message is definitely wrong and the sync cache
on protected drive problem is arguably wrong"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: MAINTAINERS: Update qla2xxx driver
scsi: zfcp: fix reaction on bit error threshold notification
scsi: core: save/restore command resid for error handling
scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry()
scsi: sd: Ignore a failure to sync cache due to lack of authorization
|
|
After changing the drivers to use GPIO core to add an IRQ chip
it appears that some of them requires a hardware initialization
before adding the IRQ chip.
Add an optional callback ->init_hw() to allow that drivers
to initialize hardware if needed.
This change is a part of the fix NULL pointer dereference
brought to the several drivers recently.
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:
include/linux/xarray.h:232: WARNING: Unexpected indentation.
Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org
Fixes: a3e4d3f97ec8 ("XArray: Redesign xa_alloc API")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix kernel-doc warning in <linux/bitmap.h>:
include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'
Also fix small typo (bitnaps).
Link: http://lkml.kernel.org/r/0729ea7a-2c0d-b2c5-7dd3-3629ee0803e2@infradead.org
Fixes: b9fa6442f704 ("cpumask: Implement cpumask_or_equal()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 37389167a281 ("mm, page_owner: keep owner info when freeing the
page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page
is tracked as being allocated. Kirril suggested naming it
PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat
loaded term for a page".
Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "followups to debug_pagealloc improvements through
page_owner", v3.
These are followups to [1] which made it to Linus meanwhile. Patches 1
and 3 are based on Kirill's review, patch 2 on KASAN request [2]. It
would be nice if all of this made it to 5.4 with [1] already there (or
at least Patch 1).
This patch (of 3):
As noted by Kirill, commit 7e2f2a0cd17c ("mm, page_owner: record page
owner for each subpage") has introduced an off-by-one error in
__set_page_owner_handle() when looking up page_ext for subpages. As a
result, the head page page_owner info is set twice, while for the last
tail page, it's not set at all.
Fix this and also make the code more efficient by advancing the page_ext
pointer we already have, instead of calling lookup_page_ext() for each
subpage. Since the full size of struct page_ext is not known at compile
time, we can't use a simple page_ext++ statement, so introduce a
page_ext_next() inline function for that.
Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz
Fixes: 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Reported-by: Miles Chen <miles.chen@mediatek.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_wmem_queued while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
sk_wmem_queued_add() helper is added so that we can in
the future convert to ADD_ONCE() or equivalent if/when
available.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_sndbuf while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
Note that other transports probably need similar fixes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For the sake of tcp_poll(), there are few places where we fetch
sk->sk_rcvbuf while this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.
Note that other transports probably need similar fixes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are few places where we fetch tp->snd_nxt while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are few places where we fetch tp->write_seq while
this field can change from IRQ or other cpu.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to avoid
store-tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Both tcp_v4_err() and tcp_v6_err() do the following operations
while they do not own the socket lock :
fastopen = tp->fastopen_rsk;
snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
The problem is that without appropriate barrier, the compiler
might reload tp->fastopen_rsk and trigger a NULL deref.
request sockets are protected by RCU, we can simply add
the missing annotations and barriers to solve the issue.
Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Update/fix inspur-ipsps1 and k10temp Documentation
- Fix nct7904 driver
- Fix HWMON_P_MIN_ALARM mask in hwmon core
* tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: docs: Extend inspur-ipsps1 title underline
hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct.
docs: hwmon: Include 'inspur-ipsps1.rst' into docs
hwmon: Fix HWMON_P_MIN_ALARM mask
hwmon: (k10temp) Update documentation and add temp2_input info
hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.4-rc3 that
resolve a number of reported issues and regressions.
None of these are huge, full details are in the shortlog. There's also
a MAINTAINERS update that I think you might have already taken in your
tree already, but git should handle that merge easily.
All have been in linux-next with no reported issues"
* tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
MAINTAINERS: kgdb: Add myself as a reviewer for kgdb/kdb
tty: serial: imx: Use platform_get_irq_optional() for optional IRQs
serial: fix kernel-doc warning in comments
serial: 8250_omap: Fix gpio check for auto RTS/CTS
serial: mctrl_gpio: Check for NULL pointer
tty: serial: fsl_lpuart: Fix lpuart_flush_buffer()
tty: serial: Fix PORT_LINFLEXUART definition
tty: n_hdlc: fix build on SPARC
serial: uartps: Fix uartps_major handling
serial: uartlite: fix exit path null pointer
tty: serial: linflexuart: Fix magic SysRq handling
serial: sh-sci: Use platform_get_irq_optional() for optional interrupts
dt-bindings: serial: sh-sci: Document r8a774b1 bindings
serial/sifive: select SERIAL_EARLYCON
tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a lot of small USB driver fixes for 5.4-rc3.
syzbot has stepped up its testing of the USB driver stack, now able to
trigger fun race conditions between disconnect and probe functions.
Because of that we have a lot of fixes in here from Johan and others
fixing these reported issues that have been around since almost all
time.
We also are just deleting the rio500 driver, making all of the syzbot
bugs found in it moot as it turns out no one has been using it for
years as there is a userspace version that is being used instead.
There are also a number of other small fixes in here, all resolving
reported issues or regressions.
All have been in linux-next without any reported issues"
* tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (65 commits)
USB: yurex: fix NULL-derefs on disconnect
USB: iowarrior: use pr_err()
USB: iowarrior: drop redundant iowarrior mutex
USB: iowarrior: drop redundant disconnect mutex
USB: iowarrior: fix use-after-free after driver unbind
USB: iowarrior: fix use-after-free on release
USB: iowarrior: fix use-after-free on disconnect
USB: chaoskey: fix use-after-free on release
USB: adutux: fix use-after-free on release
USB: ldusb: fix NULL-derefs on driver unbind
USB: legousbtower: fix use-after-free on release
usb: cdns3: Fix for incorrect DMA mask.
usb: cdns3: fix cdns3_core_init_role()
usb: cdns3: gadget: Fix full-speed mode
USB: usb-skeleton: drop redundant in-urb check
USB: usb-skeleton: fix use-after-free after driver unbind
USB: usb-skeleton: fix NULL-deref on disconnect
usb:cdns3: Fix for CV CH9 running with g_zero driver.
usb: dwc3: Remove dev_err() on platform_get_irq() failure
usb: dwc3: Switch to platform_get_irq_byname_optional()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Misc EFI fixes all across the map: CPER error report fixes, fixes to
TPM event log parsing, fix for a kexec hang, a Sparse fix and other
fixes"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/tpm: Fix sanity check of unsigned tbl_size being less than zero
efi/x86: Do not clean dummy variable in kexec path
efi: Make unexported efi_rci2_sysfs_init() static
efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing
efi/tpm: Don't traverse an event log with no events
efi/tpm: Don't access event->count when it isn't mapped
efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
efi/cper: Fix endianness of PCIe class code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware
guest support fix when built under Clang, and new CPU model number
definitions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Comet Lake to the Intel CPU models header
lib/string: Make memzero_explicit() inline instead of external
x86/cpu/vmware: Use the full form of INL in VMWARE_PORT
x86/asm: Fix MWAITX C-state hint value
|
|
Pull NFS client bugfixes from Anna Schumaker:
"Stable bugfixes:
- Fix O_DIRECT accounting of number of bytes read/written # v4.1+
Other fixes:
- Fix nfsi->nrequests count error on nfs_inode_remove_request()
- Remove redundant mirror tracking in O_DIRECT
- Fix leak of clp->cl_acceptor string
- Fix race to sk_err after xs_error_report"
* tag 'nfs-for-5.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: fix race to sk_err after xs_error_report
NFSv4: Fix leak of clp->cl_acceptor string
NFS: Remove redundant mirror tracking in O_DIRECT
NFS: Fix O_DIRECT accounting of number of bytes read/written
nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module fixes from Jessica Yu:
"Code cleanups and kbuild/namespace related fixups from Masahiro.
Most importantly, it fixes a namespace-related modpost issue for
external module builds
- Fix broken external module builds due to a modpost bug in
read_dump(), where the namespace was not being strdup'd and
sym->namespace would be set to bogus data.
- Various namespace-related kbuild fixes and cleanups thanks to
Masahiro Yamada"
* tag 'modules-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
doc: move namespaces.rst from kbuild/ to core-api/
nsdeps: make generated patches independent of locale
nsdeps: fix hashbang of scripts/nsdeps
kbuild: fix build error of 'make nsdeps' in clean tree
module: rename __kstrtab_ns_* to __kstrtabns_* to avoid symbol conflict
modpost: fix broken sym->namespace for external module builds
module: swap the order of symbol.namespace
scripts: add_namespace: Fix coccicheck failed
|
|
Reserve the pseudo keyword 'fallthrough' for the ability to convert the
various case block /* fallthrough */ style comments to appear to be an
actual reserved word with the same gcc case block missing fallthrough
warning capability.
All switch/case blocks now should end in one of:
break;
fallthrough;
goto <label>;
return [expression];
continue;
In C mode, GCC supports the __fallthrough__ attribute since 7.1,
the same time the warning and the comment parsing were introduced.
fallthrough devolves to an empty "do {} while (0)" if the compiler
version (any version less than gcc 7) does not support the attribute.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|