Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit b1c17a9a353878602fd5bfe9103e4afe5e9a3f96 ]
Various things in eBPF really require us to disable preemption
before running an eBPF program.
syzbot reported :
BUG: assuming atomic context at net/core/flow_dissector.c:737
in_atomic(): 0, irqs_disabled(): 0, pid: 24710, name: syz-executor.3
2 locks held by syz-executor.3/24710:
#0: 00000000e81a4bf1 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x168e/0x3ff0 drivers/net/tun.c:1850
#1: 00000000254afebd (rcu_read_lock){....}, at: __skb_flow_dissect+0x1e1/0x4bb0 net/core/flow_dissector.c:822
CPU: 1 PID: 24710 Comm: syz-executor.3 Not tainted 5.1.0+ #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
__cant_sleep kernel/sched/core.c:6165 [inline]
__cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6142
bpf_flow_dissect+0xfe/0x390 net/core/flow_dissector.c:737
__skb_flow_dissect+0x362/0x4bb0 net/core/flow_dissector.c:853
skb_flow_dissect_flow_keys_basic include/linux/skbuff.h:1322 [inline]
skb_probe_transport_header include/linux/skbuff.h:2500 [inline]
skb_probe_transport_header include/linux/skbuff.h:2493 [inline]
tun_get_user+0x2cfe/0x3ff0 drivers/net/tun.c:1940
tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
call_write_iter include/linux/fs.h:1872 [inline]
do_iter_readv_writev+0x5fd/0x900 fs/read_write.c:693
do_iter_write fs/read_write.c:970 [inline]
do_iter_write+0x184/0x610 fs/read_write.c:951
vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
do_writev+0x15b/0x330 fs/read_write.c:1058
__do_sys_writev fs/read_write.c:1131 [inline]
__se_sys_writev fs/read_write.c:1128 [inline]
__x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Petar Penkov <ppenkov@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
not supplied
[ Upstream commit e9919a24d3022f72bcadc407e73a6ef17093a849 ]
With commit 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to
fib_nl_newrule") we now able to check if a rule already exists. But this
only works with iproute2. For other tools like libnl, NetworkManager,
it still could add duplicate rules with only NLM_F_CREATE flag, like
[localhost ~ ]# ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
100000: from 192.168.7.5 lookup 5
100000: from 192.168.7.5 lookup 5
As it doesn't make sense to create two duplicate rules, let's just return
0 if the rule exists.
Fixes: 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4c3024debf62de4c6ac6d3cb4c0063be21d4f652 ]
BPF can adjust gso only for tcp bytestreams. Fail on other gso types.
But only on gso packets. It does not touch this field if !gso_size.
Fixes: b90efd225874 ("bpf: only adjust gso_size on bytestream protocols")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
|
|
[ Upstream commit d85e8be2a5a02869f815dd0ac2d743deb4cd7957 ]
skb_reorder_vlan_header() should move XDP meta data with ethernet header
if XDP meta data exists.
Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access")
Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com>
Signed-off-by: Takeru Hayasaka <taketarou2@gmail.com>
Co-developed-by: Takeru Hayasaka <taketarou2@gmail.com>
Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8065a779f17e94536a1c4dcee4f9d88011672f97 ]
When a netdev appears through hot plug then gets enslaved by a failover
master that is already up and running, the slave will be opened
right away after getting enslaved. Today there's a race that userspace
(udev) may fail to rename the slave if the kernel (net_failover)
opens the slave earlier than when the userspace rename happens.
Unlike bond or team, the primary slave of failover can't be renamed by
userspace ahead of time, since the kernel initiated auto-enslavement is
unable to, or rather, is never meant to be synchronized with the rename
request from userspace.
As the failover slave interfaces are not designed to be operated
directly by userspace apps: IP configuration, filter rules with
regard to network traffic passing and etc., should all be done on master
interface. In general, userspace apps only care about the
name of master interface, while slave names are less important as long
as admin users can see reliable names that may carry
other information describing the netdev. For e.g., they can infer that
"ens3nsby" is a standby slave of "ens3", while for a
name like "eth0" they can't tell which master it belongs to.
Historically the name of IFF_UP interface can't be changed because
there might be admin script or management software that is already
relying on such behavior and assumes that the slave name can't be
changed once UP. But failover is special: with the in-kernel
auto-enslavement mechanism, the userspace expectation for device
enumeration and bring-up order is already broken. Previously initramfs
and various userspace config tools were modified to bypass failover
slaves because of auto-enslavement and duplicate MAC address. Similarly,
in case that users care about seeing reliable slave name, the new type
of failover slaves needs to be taken care of specifically in userspace
anyway.
It's less risky to lift up the rename restriction on failover slave
which is already UP. Although it's possible this change may potentially
break userspace component (most likely configuration scripts or
management software) that assumes slave name can't be changed while
UP, it's relatively a limited and controllable set among all userspace
components, which can be fixed specifically to listen for the rename
events on failover slaves. Userspace component interacting with slaves
is expected to be changed to operate on failover master interface
instead, as the failover slave is dynamic in nature which may come and
go at any point. The goal is to make the role of failover slaves less
relevant, and userspace components should only deal with failover master
in the long run.
Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9a5a90d167b0e5fe3d47af16b68fd09ce64085cd ]
__netif_receive_skb_list_ptype() leaves skb->next poisoned before passing
it to pt_prev->func handler, what may produce (in certain cases, e.g. DSA
setup) crashes like:
[ 88.606777] CPU 0 Unable to handle kernel paging request at virtual address 0000000e, epc == 80687078, ra == 8052cc7c
[ 88.618666] Oops[#1]:
[ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-dlink-00206-g4192a172-dirty #1473
[ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850
[ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000
[ 88.642526] $ 8 : 00000000 49600000 00000001 00000001
[ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41
[ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd
[ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800
[ 88.665793] $24 : 00000000 8011e658
[ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c
[ 88.677427] Hi : 00000003
[ 88.680622] Lo : 7b5b4220
[ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0
[ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188
[ 88.696734] Status: 10000404 IEp
[ 88.700422] Cause : 50000008 (ExcCode 02)
[ 88.704874] BadVA : 0000000e
[ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi))
[ 88.713005] Modules linked in:
[ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
[ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0 85cff080 80790000 8052cc7c
[ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850 87cabf00 807d0000
[ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a 00000000 8052d96c
[ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558 87cabf00 85f58c50
[ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001 85f58c00 864d7800
[ 88.771770] ...
[ 88.774483] Call Trace:
[ 88.777199] [<80687078>] vlan_dev_hard_start_xmit+0x1c/0x1a0
[ 88.783504] [<8052cc7c>] dev_hard_start_xmit+0xac/0x188
[ 88.789326] [<8052d96c>] __dev_queue_xmit+0x6e8/0x7d4
[ 88.794955] [<805a8640>] ip_finish_output2+0x238/0x4d0
[ 88.800677] [<805ab6a0>] ip_output+0xc8/0x140
[ 88.805526] [<805a68f4>] ip_forward+0x364/0x560
[ 88.810567] [<805a4ff8>] ip_rcv+0x48/0xe4
[ 88.815030] [<80528d44>] __netif_receive_skb_one_core+0x44/0x58
[ 88.821635] [<8067f220>] dsa_switch_rcv+0x108/0x1ac
[ 88.827067] [<80528f80>] __netif_receive_skb_list_core+0x228/0x26c
[ 88.833951] [<8052ed84>] netif_receive_skb_list+0x1d4/0x394
[ 88.840160] [<80355a88>] lunar_rx_poll+0x38c/0x828
[ 88.845496] [<8052fa78>] net_rx_action+0x14c/0x3cc
[ 88.850835] [<806ad300>] __do_softirq+0x178/0x338
[ 88.856077] [<8012a2d4>] irq_exit+0xbc/0x100
[ 88.860846] [<802f8b70>] plat_irq_dispatch+0xc0/0x144
[ 88.866477] [<80105974>] handle_int+0x14c/0x158
[ 88.871516] [<806acfb0>] r4k_wait+0x30/0x40
[ 88.876462] Code: afb10014 8c8200a0 00803025 <9443000c> 94a20468 00000000 10620042 00a08025 9605046a
[ 88.887332]
[ 88.888982] ---[ end trace eb863d007da11cf1 ]---
[ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt
[ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fix this by pulling skb off the sublist and zeroing skb->next pointer
before calling ptype callback.
Fixes: 88eb1944e18c ("net: core: propagate SKB lists through packet_type lookup")
Reviewed-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]
net_hash_mix() currently uses kernel address of a struct net,
and is used in many places that could be used to reveal this
address to a patient attacker, thus defeating KASLR, for
the typical case (initial net namespace, &init_net is
not dynamically allocated)
I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.
Also provide entropy regardless of CONFIG_NET_NS.
Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0ab03f353d3613ea49d1f924faf98559003670a8 ]
Currently we may merge incorrectly a received GSO packet
or a packet with frag_list into a packet sitting in the
gro_hash list. skb_segment() may crash case because
the assumptions on the skb layout are not met.
The correct behaviour would be to flush the packet in the
gro_hash list and send the received GSO packet directly
afterwards. Commit d61d072e87c8e ("net-gro: avoid reorders")
sets NAPI_GRO_CB(skb)->flush in this case, but this is not
checked before merging. This patch makes sure to check this
flag and to not merge in that case.
Fixes: d61d072e87c8e ("net-gro: avoid reorders")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3d8830266ffc28c16032b859e38a0252e014b631 ]
NULL or ZERO_SIZE_PTR will be returned for zero sized memory
request, and derefencing them will lead to a segfault
so it is unnecessory to call vzalloc for zero sized memory
request and not call functions which maybe derefence the
NULL allocated memory
this also fixes a possible memory leak if phy_ethtool_get_stats
returns error, memory should be freed before exit
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Wang Li <wangli39@baidu.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a3e23f719f5c4a38ffb3d30c8d7632a4ed8ccd9e ]
In netdev_queue_add_kobject and rx_queue_add_kobject,
if sysfs_create_group failed, kobject_put will call
netdev_queue_release to decrease dev refcont, however
dev_hold has not be called. So we will see this while
unregistering dev:
unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: d0d668371679 ("net: don't decrement kobj reference count on init failure")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0b91bce1ebfc797ff3de60c8f4a1e6219a8a3187 ]
Christoph reported a stall while peeking datagram with an offset when
busy polling is enabled. __skb_try_recv_datagram() uses as the loop
termination condition 'queue empty'. When peeking, the socket
queue can be not empty, even when no additional packets are received.
Address the issue explicitly checking for receive queue changes,
as currently done by __skb_wait_for_more_packets().
Fixes: 2b5cd0dfa384 ("net: Change return type of sk_busy_loop from bool to void")
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2a5ff07a0eb945f291e361aa6f6becca8340ba46 ]
We keep receiving syzbot reports [1] that show that tunnels do not play
the rcu/IFF_UP rules properly.
At device dismantle phase, gro_cells_destroy() will be called
only after a full rcu grace period is observed after IFF_UP
has been cleared.
This means that IFF_UP needs to be tested before queueing packets
into netif_rx() or gro_cells.
This patch implements the test in gro_cells_receive() because
too many callers do not seem to bother enough.
[1]
BUG: unable to handle kernel paging request at fffff4ca0b9ffffe
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.0.0+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline]
RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline]
RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline]
RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline]
RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78
Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 <42> 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03
RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02
RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe
RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0
RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645
R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000
R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75
kobject: 'loop2' (000000004bd7d84a): kobject_uevent_env
FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0
Call Trace:
kobject: 'loop2' (000000004bd7d84a): fill_kobj_path: path = '/devices/virtual/block/loop2'
ip_tunnel_dev_free+0x19/0x60 net/ipv4/ip_tunnel.c:1010
netdev_run_todo+0x51c/0x7d0 net/core/dev.c:8970
rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:116
ip_tunnel_delete_nets+0x423/0x5f0 net/ipv4/ip_tunnel.c:1124
vti_exit_batch_net+0x23/0x30 net/ipv4/ip_vti.c:495
ops_exit_list.isra.0+0x105/0x160 net/core/net_namespace.c:156
cleanup_net+0x3fb/0x960 net/core/net_namespace.c:551
process_one_work+0x98e/0x1790 kernel/workqueue.c:2173
worker_thread+0x98/0xe40 kernel/workqueue.c:2319
kthread+0x357/0x430 kernel/kthread.c:246
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Modules linked in:
CR2: fffff4ca0b9ffffe
[ end trace 513fc9c1338d1cb3 ]
RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline]
RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline]
RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline]
RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline]
RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78
Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 <42> 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03
RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02
RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe
RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0
RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645
R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000
kobject: 'loop3' (00000000e4ee57a6): kobject_uevent_env
R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75
FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0
Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e8e3437762ad938880dd48a3c52d702e7cf3c124 upstream.
We might have never enabled (started) the psock's parser, in which case it
will not get stopped when destroying the psock. This leads to a warning
when trying to cancel parser's work from psock's deferred destructor:
[ 405.325769] WARNING: CPU: 1 PID: 3216 at net/strparser/strparser.c:526 strp_done+0x3c/0x40
[ 405.326712] Modules linked in: [last unloaded: test_bpf]
[ 405.327359] CPU: 1 PID: 3216 Comm: kworker/1:164 Tainted: G W 5.0.0 #42
[ 405.328294] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[ 405.329712] Workqueue: events sk_psock_destroy_deferred
[ 405.330254] RIP: 0010:strp_done+0x3c/0x40
[ 405.330706] Code: 28 e8 b8 d5 6b ff 48 8d bb 80 00 00 00 e8 9c d5 6b ff 48 8b 7b 18 48 85 ff 74 0d e8 1e a5 e8 ff 48 c7 43 18 00 00 00 00 5b c3 <0f> 0b eb cf 66 66 66 66 90 55 89 f5 53 48 89 fb 48 83 c7 28 e8 0b
[ 405.332862] RSP: 0018:ffffc900026bbe50 EFLAGS: 00010246
[ 405.333482] RAX: ffffffff819323e0 RBX: ffff88812cb83640 RCX: ffff88812cb829e8
[ 405.334228] RDX: 0000000000000001 RSI: ffff88812cb837e8 RDI: ffff88812cb83640
[ 405.335366] RBP: ffff88813fd22680 R08: 0000000000000000 R09: 000073746e657665
[ 405.336472] R10: 8080808080808080 R11: 0000000000000001 R12: ffff88812cb83600
[ 405.337760] R13: 0000000000000000 R14: ffff88811f401780 R15: ffff88812cb837e8
[ 405.338777] FS: 0000000000000000(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
[ 405.339903] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 405.340821] CR2: 00007fb11489a6b8 CR3: 000000012d4d6000 CR4: 00000000000406e0
[ 405.341981] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 405.343131] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 405.344415] Call Trace:
[ 405.344821] sk_psock_destroy_deferred+0x23/0x1b0
[ 405.345585] process_one_work+0x1ae/0x3e0
[ 405.346110] worker_thread+0x3c/0x3b0
[ 405.346576] ? pwq_unbound_release_workfn+0xd0/0xd0
[ 405.347187] kthread+0x11d/0x140
[ 405.347601] ? __kthread_parkme+0x80/0x80
[ 405.348108] ret_from_fork+0x35/0x40
[ 405.348566] ---[ end trace a4a3af4026a327d4 ]---
Stop psock's parser just before canceling its work.
Fixes: 1d79895aef18 ("sk_msg: Always cancel strp work before freeing the psock")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 895a5e96dbd6386c8e78e5b78e067dcc67b7f0ab ]
syzkaller report this:
BUG: memory leak
unreferenced object 0xffff88837a71a500 (size 256):
comm "syz-executor.2", pid 9770, jiffies 4297825125 (age 17.843s)
hex dump (first 32 bytes):
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
ff ff ff ff ff ff ff ff 20 c0 ef 86 ff ff ff ff ........ .......
backtrace:
[<00000000db12624b>] netdev_register_kobject+0x124/0x2e0 net/core/net-sysfs.c:1751
[<00000000dc49a994>] register_netdevice+0xcc1/0x1270 net/core/dev.c:8516
[<00000000e5f3fea0>] tun_set_iff drivers/net/tun.c:2649 [inline]
[<00000000e5f3fea0>] __tun_chr_ioctl+0x2218/0x3d20 drivers/net/tun.c:2883
[<000000001b8ac127>] vfs_ioctl fs/ioctl.c:46 [inline]
[<000000001b8ac127>] do_vfs_ioctl+0x1a5/0x10e0 fs/ioctl.c:690
[<0000000079b269f8>] ksys_ioctl+0x89/0xa0 fs/ioctl.c:705
[<00000000de649beb>] __do_sys_ioctl fs/ioctl.c:712 [inline]
[<00000000de649beb>] __se_sys_ioctl fs/ioctl.c:710 [inline]
[<00000000de649beb>] __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:710
[<000000007ebded1e>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
[<00000000db315d36>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<00000000115be9bb>] 0xffffffffffffffff
It should call kset_unregister to free 'dev->queues_kset'
in error path of register_queue_kobjects, otherwise will cause a mem leak.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 1d24eb4815d1 ("xps: Transmit Packet Steering")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 46b1c18f9deb326a7e18348e668e4c7ab7c7458b ]
In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
John made the assumption that the data path had no need to read
the qdisc qlen (number of packets in the qdisc).
It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
children.
But pfifo_fast can be used as leaf in class full qdiscs, and existing
logic needs to access the child qlen in an efficient way.
HTB breaks badly, since it uses cl->leaf.q->q.qlen in :
htb_activate() -> WARN_ON()
htb_dequeue_tree() to decide if a class can be htb_deactivated
when it has no more packets.
HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
qdisc_tree_reduce_backlog() also read q.qlen directly.
Using qdisc_qlen_sum() (which iterates over all possible cpus)
in the data path is a non starter.
It seems we have to put back qlen in a central location,
at least for stable kernels.
For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
so the existing q.qlen{++|--} are correct.
For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
because the spinlock might be not held (for example from
pfifo_fast_enqueue() and pfifo_fast_dequeue())
This patch adds atomic_qlen (in the same location than qlen)
and renames the following helpers, since we want to express
they can be used without qdisc lock, and that qlen is no longer percpu.
- qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
- qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()
Later (net-next) we might revert this patch by tracking all these
qlen uses and replace them by a more efficient method (not having
to access a precise qlen, but an empty/non_empty status that might
be less expensive to maintain/track).
Another possibility is to have a legacy pfifo_fast version that would
be used when used a a child qdisc, since the parent qdisc needs
a spinlock anyway. But then, future lockless qdiscs would also
have the same problem.
Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch addresses the fact that there are drivers, specifically tun,
that will call into the network page fragment allocators with buffer sizes
that are not cache aligned. Doing this could result in data alignment
and DMA performance issues as these fragment pools are also shared with the
skb allocator and any other devices that will use napi_alloc_frags or
netdev_alloc_frags.
Fixes: ffde7328a36d ("net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2019-02-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix lockdep false positive in bpf_get_stackid(), from Alexei.
2) several AF_XDP fixes, from Bjorn, Magnus, Davidlohr.
3) fix narrow load from struct bpf_sock, from Martin.
4) mips JIT fixes, from Paul.
5) gso handling fix in bpf helpers, from Willem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The features attribute is of type u64 and stored in the native endianes on
the system. The for_each_set_bit() macro takes a pointer to a 32 bit array
and goes over the bits in this area. On little Endian systems this also
works with an u64 as the most significant bit is on the highest address,
but on big endian the words are swapped. When we expect bit 15 here we get
bit 47 (15 + 32).
This patch converts it more or less to its own for_each_set_bit()
implementation which works on 64 bit integers directly. This is then
completely in host endianness and should work like expected.
Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Hauke Mehrtens <hauke.mehrtens@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.
They would increase their share of the memory, instead
of decreasing it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
For GSO packets they adjust gso_size to maintain the same MTU.
The gso size can only be safely adjusted on bytestream protocols.
Commit d02f51cbcf12 ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
gso_size unit per datagram. Also exclude these.
Move from a blacklist to a whitelist check to future proof against
additional such new GSO types, e.g., for fraglist based GRO.
Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2019-01-31
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) disable preemption in sender side of socket filters, from Alexei.
2) fix two potential deadlocks in syscall bpf lookup and prog_register,
from Martin and Alexei.
3) fix BTF to allow typedef on func_proto, from Yonghong.
4) two bpftool fixes, from Jiri and Paolo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Assign a default net namespace to netdevs created by init_dummy_netdev().
Fixes a NULL pointer dereference caused by busy-polling a socket bound to
an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
if napi_poll() received packets:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
IP: napi_busy_loop+0xd6/0x200
Call Trace:
sock_poll+0x5e/0x80
do_sys_poll+0x324/0x5a0
SyS_poll+0x6c/0xf0
do_syscall_64+0x6b/0x1f0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id instead of socket")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Despite having stopped the parser, we still need to deinitialize it
by calling strp_done so that it cancels its work. Otherwise the worker
thread can run after we have freed the parser, and attempt to access
its workqueue resulting in a use-after-free:
==================================================================
BUG: KASAN: use-after-free in pwq_activate_delayed_work+0x1b/0x1d0
Read of size 8 at addr ffff888069975240 by task kworker/u2:2/93
CPU: 0 PID: 93 Comm: kworker/u2:2 Not tainted 5.0.0-rc2-00335-g28f9d1a3d4fe-dirty #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
Workqueue: (null) (kstrp)
Call Trace:
print_address_description+0x6e/0x2b0
? pwq_activate_delayed_work+0x1b/0x1d0
kasan_report+0xfd/0x177
? pwq_activate_delayed_work+0x1b/0x1d0
? pwq_activate_delayed_work+0x1b/0x1d0
pwq_activate_delayed_work+0x1b/0x1d0
? process_one_work+0x4aa/0x660
pwq_dec_nr_in_flight+0x9b/0x100
worker_thread+0x82/0x680
? process_one_work+0x660/0x660
kthread+0x1b9/0x1e0
? __kthread_create_on_node+0x250/0x250
ret_from_fork+0x1f/0x30
Allocated by task 111:
sk_psock_init+0x3c/0x1b0
sock_map_link.isra.2+0x103/0x4b0
sock_map_update_common+0x94/0x270
sock_map_update_elem+0x145/0x160
__se_sys_bpf+0x152e/0x1e10
do_syscall_64+0xb2/0x3e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 112:
kfree+0x7f/0x140
process_one_work+0x40b/0x660
worker_thread+0x82/0x680
kthread+0x1b9/0x1e0
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888069975180
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 192 bytes inside of
512-byte region [ffff888069975180, ffff888069975380)
The buggy address belongs to the page:
page:ffffea0001a65d00 count:1 mapcount:0 mapping:ffff88806d401280 index:0x0 compound_mapcount: 0
flags: 0x4000000000010200(slab|head)
raw: 4000000000010200 dead000000000100 dead000000000200 ffff88806d401280
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888069975100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888069975180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888069975200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888069975280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888069975300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Reported-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/netdev/CAJPywTLwgXNEZ2dZVoa=udiZmtrWJ0q5SuBW64aYs0Y1khXX3A@mail.gmail.com
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When sock recvbuff is set by bpf_setsockopt(), the value must by
limited by rmem_max. It is the same with sendbuff.
Fixes: 8c4b4c7e9ff0 ("bpf: Add setsockopt helper function to bpf")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2019-01-20
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a out-of-bounds access in __bpf_redirect_no_mac, from Willem.
2) Fix bpf_setsockopt to reset sock dst on SO_MARK changes, from Peter.
3) Fix map in map masking to prevent out-of-bounds access under
speculative execution, from Daniel.
4) Fix bpf_setsockopt's SO_MAX_PACING_RATE to support TCP internal
pacing, from Yuchung.
5) Fix json writer license in bpftool, from Thomas.
6) Fix AF_XDP to check if an actually queue exists during umem
setup, from Krzysztof.
7) Several fixes to BPF stackmap's build id handling. Another fix
for bpftool build to account for libbfd variations wrt linking
requirements, from Stanislav.
8) Fix BPF samples build with clang by working around missing asm
goto, from Yonghong.
9) Fix libbpf to retry program load on signal interrupt, from Lorenz.
10) Various minor compile warning fixes in BPF code, from Mathieu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Syzkaller was able to construct a packet of negative length by
redirecting from bpf_prog_test_run_skb with BPF_PROG_TYPE_LWT_XMIT:
BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:345 [inline]
BUG: KASAN: slab-out-of-bounds in skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
BUG: KASAN: slab-out-of-bounds in __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
Read of size 4294967282 at addr ffff8801d798009c by task syz-executor2/12942
kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
memcpy+0x23/0x50 mm/kasan/kasan.c:302
memcpy include/linux/string.h:345 [inline]
skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline]
__pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395
__pskb_copy include/linux/skbuff.h:1053 [inline]
pskb_copy include/linux/skbuff.h:2904 [inline]
skb_realloc_headroom+0xe7/0x120 net/core/skbuff.c:1539
ipip6_tunnel_xmit net/ipv6/sit.c:965 [inline]
sit_tunnel_xmit+0xe1b/0x30d0 net/ipv6/sit.c:1029
__netdev_start_xmit include/linux/netdevice.h:4325 [inline]
netdev_start_xmit include/linux/netdevice.h:4334 [inline]
xmit_one net/core/dev.c:3219 [inline]
dev_hard_start_xmit+0x295/0xc90 net/core/dev.c:3235
__dev_queue_xmit+0x2f0d/0x3950 net/core/dev.c:3805
dev_queue_xmit+0x17/0x20 net/core/dev.c:3838
__bpf_tx_skb net/core/filter.c:2016 [inline]
__bpf_redirect_common net/core/filter.c:2054 [inline]
__bpf_redirect+0x5cf/0xb20 net/core/filter.c:2061
____bpf_clone_redirect net/core/filter.c:2094 [inline]
bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2066
bpf_prog_41f2bcae09cd4ac3+0xb25/0x1000
The generated test constructs a packet with mac header, network
header, skb->data pointing to network header and skb->len 0.
Redirecting to a sit0 through __bpf_redirect_no_mac pulls the
mac length, even though skb->data already is at skb->network_header.
bpf_prog_test_run_skb has already pulled it as LWT_XMIT !is_l2.
Update the offset calculation to pull only if skb->data differs
from skb->network_header, which is not true in this case.
The test itself can be run only from commit 1cf1cae963c2 ("bpf:
introduce BPF_PROG_TEST_RUN command"), but the same type of packets
with skb at network header could already be built from lwt xmit hooks,
so this fix is more relevant to that commit.
Also set the mac header on redirect from LWT_XMIT, as even after this
change to __bpf_redirect_no_mac that field is expected to be set, but
is not yet in ip_finish_output2.
Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Replace the kfree_skb() by consume_skb() to be drop monitor(dropwatch,
perf) friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If sch_fq packet scheduler is not used, TCP can fallback to
internal pacing, but this requires sk_pacing_status to
be properly set.
Fixes: 8c4b4c7e9ff0 ("bpf: Add setsockopt helper function to bpf")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
In sock_setsockopt() (net/core/sock.h), when SO_MARK option is used
to change sk_mark, sk_dst_reset(sk) is called. The same should be
done in bpf_setsockopt().
Fixes: 8c4b4c7e9ff0 ("bpf: Add setsockopt helper function to bpf")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warnings (W=1).
To preserve as much of the existing comment only change a ‘:’ into a ‘,’.
This is enough change, to match the regular expression expected by GCC.
This commit removes the following warning:
net/core/filter.c:5310:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This should be 1 for normal allocations, 0 disables leak reporting.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Fixes: 85704cb8dcfd ("net/core/neighbour: tell kmemleak about hash tables")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Fix regression in multi-SKB responses to RTM_GETADDR, from Arthur
Gautier.
2) Fix ipv6 frag parsing in openvswitch, from Yi-Hung Wei.
3) Unbounded recursion in ipv4 and ipv6 GUE tunnels, from Stefano
Brivio.
4) Use after free in hns driver, from Yonglong Liu.
5) icmp6_send() needs to handle the case of NULL skb, from Eric
Dumazet.
6) Missing rcu read lock in __inet6_bind() when operating on mapped
addresses, from David Ahern.
7) Memory leak in tipc-nl_compat_publ_dump(), from Gustavo A. R. Silva.
8) Fix PHY vs r8169 module loading ordering issues, from Heiner
Kallweit.
9) Fix bridge vlan memory leak, from Ido Schimmel.
10) Dev refcount leak in AF_PACKET, from Jason Gunthorpe.
11) Infoleak in ipv6_local_error(), flow label isn't completely
initialized. From Eric Dumazet.
12) Handle mv88e6390 errata, from Andrew Lunn.
13) Making vhost/vsock CID hashing consistent, from Zha Bin.
14) Fix lack of UMH cleanup when it unexpectedly exits, from Taehee Yoo.
15) Bridge forwarding must clear skb->tstamp, from Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
bnxt_en: Fix context memory allocation.
bnxt_en: Fix ring checking logic on 57500 chips.
mISDN: hfcsusb: Use struct_size() in kzalloc()
net: clear skb->tstamp in bridge forwarding path
net: bpfilter: disallow to remove bpfilter module while being used
net: bpfilter: restart bpfilter_umh when error occurred
net: bpfilter: use cleanup callback to release umh_info
umh: add exit routine for UMH process
isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
vhost/vsock: fix vhost vsock cid hashing inconsistent
net: stmmac: Prevent RX starvation in stmmac_napi_poll()
net: stmmac: Fix the logic of checking if RX Watchdog must be enabled
net: stmmac: Check if CBS is supported before configuring
net: stmmac: dwxgmac2: Only clear interrupts that are active
net: stmmac: Fix PCI module removal leak
tools/bpf: fix bpftool map dump with bitfields
tools/bpf: test btf bitfield with >=256 struct member offset
bpf: fix bpffs bitfield pretty print
net: ethernet: mediatek: fix warning in phy_start_aneg
tcp: change txhash on SYN-data timeout
...
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2019-01-11
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix TCP-BPF support for correctly setting the initial window
via TCP_BPF_IW on an active TFO sender, from Yuchung.
2) Fix a panic in BPF's stack_map_get_build_id()'s ELF parsing on
32 bit archs caused by page_address() returning NULL, from Song.
3) Fix BTF pretty print in kernel and bpftool when bitfield member
offset is greater than 256. Also add test cases, from Yonghong.
4) Fix improper argument handling in xdp1 sample, from Ioana.
5) Install missing tcp_server.py and tcp_client.py files from
BPF selftests, from Anders.
6) Add test_libbpf to gitignore in libbpf and BPF selftests,
from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes false-positive kmemleak reports about leaked neighbour entries:
unreferenced object 0xffff8885c6e4d0a8 (size 1024):
comm "softirq", pid 0, jiffies 4294922664 (age 167640.804s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 20 2c f3 83 ff ff ff ff ........ ,......
08 c0 ef 5f 84 88 ff ff 01 8c 7d 02 01 00 00 00 ..._......}.....
backtrace:
[<00000000748509fe>] ip6_finish_output2+0x887/0x1e40
[<0000000036d7a0d8>] ip6_output+0x1ba/0x600
[<0000000027ea7dba>] ip6_send_skb+0x92/0x2f0
[<00000000d6e2111d>] udp_v6_send_skb.isra.24+0x680/0x15e0
[<000000000668a8be>] udpv6_sendmsg+0x18c9/0x27a0
[<000000004bd5fa90>] sock_sendmsg+0xb3/0xf0
[<000000008227b29f>] ___sys_sendmsg+0x745/0x8f0
[<000000008698009d>] __sys_sendmsg+0xde/0x170
[<00000000889dacf1>] do_syscall_64+0x9b/0x400
[<0000000081cdb353>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<000000005767ed39>] 0xffffffffffffffff
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The existing BPF TCP initial congestion window (TCP_BPF_IW) does not
to work on (active) Fast Open sender. This is because it changes the
(initial) window only if data_segs_out is zero -- but data_segs_out
is also incremented on SYN-data. This patch fixes the issue by
proerly accounting for SYN-data additionally.
Fixes: fc7478103c84 ("bpf: Adds support for setting initial cwnd")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif
We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
|
|
Commit dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by
__GFP_RETRY_MAYFAIL with more useful semantic") replaced __GFP_REPEAT in
alloc_skb_with_frags() with __GFP_RETRY_MAYFAIL when the allocation may
directly reclaim.
The previous behavior would require reclaim up to 1 << order pages for
skb aligned header_len of order > PAGE_ALLOC_COSTLY_ORDER before failing,
otherwise the allocations in alloc_skb() would loop in the page allocator
looking for memory. __GFP_RETRY_MAYFAIL makes both allocations failable
under memory pressure, including for the HEAD allocation.
This can cause, among many other things, write() to fail with ENOTCONN
during RPC when under memory pressure.
These allocations should succeed as they did previous to dcda9b04713c
even if it requires calling the oom killer and additional looping in the
page allocator to find memory. There is no way to specify the previous
behavior of __GFP_REPEAT, but it's unlikely to be necessary since the
previous behavior only guaranteed that 1 << order pages would be reclaimed
before failing for order > PAGE_ALLOC_COSTLY_ORDER. That reclaim is not
guaranteed to be contiguous memory, so repeating for such large orders is
usually not beneficial.
Removing the setting of __GFP_RETRY_MAYFAIL to restore the previous
behavior, specifically not allowing alloc_skb() to fail for small orders
and oom kill if necessary rather than allowing RPCs to fail.
Fixes: dcda9b04713c ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic")
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
"Several fixes here. Basically split down the line between newly
introduced regressions and long existing problems:
1) Double free in tipc_enable_bearer(), from Cong Wang.
2) Many fixes to nf_conncount, from Florian Westphal.
3) op->get_regs_len() can throw an error, check it, from Yunsheng
Lin.
4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman
driver, from Scott Wood.
5) Inifnite loop in fib_empty_table(), from Yue Haibing.
6) Use after free in ax25_fillin_cb(), from Cong Wang.
7) Fix socket locking in nr_find_socket(), also from Cong Wang.
8) Fix WoL wakeup enable in r8169, from Heiner Kallweit.
9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani.
10) Fix ptr_ring wrap during queue swap, from Cong Wang.
11) Missing shutdown callback in hinic driver, from Xue Chaojing.
12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano
Brivio.
13) BPF out of bounds speculation fixes from Daniel Borkmann"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
ipv6: Consider sk_bound_dev_if when binding a socket to an address
ipv6: Fix dump of specific table with strict checking
bpf: add various test cases to selftests
bpf: prevent out of bounds speculation on pointer arithmetic
bpf: fix check_map_access smin_value test when pointer contains offset
bpf: restrict unknown scalars of mixed signed bounds for unprivileged
bpf: restrict stack pointer arithmetic for unprivileged
bpf: restrict map value pointer arithmetic for unprivileged
bpf: enable access to ax register also from verifier rewrite
bpf: move tmp variable into ax register in interpreter
bpf: move {prev_,}insn_idx into verifier env
isdn: fix kernel-infoleak in capi_unlocked_ioctl
ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
net/hamradio/6pack: use mod_timer() to rearm timers
net-next/hinic:add shutdown callback
net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT
ip: validate header length on virtual device xmit
tap: call skb_probe_transport_header after setting skb->dev
ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
net: rds: remove unnecessary NULL check
...
|
|
Al Viro mentioned (Message-ID
<20170626041334.GZ10672@ZenIV.linux.org.uk>)
that there is probably a race condition
lurking in accesses of sk_stamp on 32-bit machines.
sock->sk_stamp is of type ktime_t which is always an s64.
On a 32 bit architecture, we might run into situations of
unsafe access as the access to the field becomes non atomic.
Use seqlocks for synchronization.
This allows us to avoid using spinlocks for readers as
readers do not need mutual exclusion.
Another approach to solve this is to require sk_lock for all
modifications of the timestamps. The current approach allows
for timestamps to have their own lock: sk_stamp_lock.
This allows for the patch to not compete with already
existing critical sections, and side effects are limited
to the paths in the patch.
The addition of the new field maintains the data locality
optimizations from
commit 9115e8cd2a0c ("net: reorganize struct sock for better data
locality")
Note that all the instances of the sk_stamp accesses
are either through the ioctl or the syscall recvmsg.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We must have an address to lookup otherwise we'll derefence a null
pointer in the ndo_fdb_get callbacks.
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: David Ahern <dsa@cumulusnetworks.com>
Reported-by: syzbot+017b1f61c82a1c3e7efd@syzkaller.appspotmail.com
Fixes: 5b2f94b27622 ("net: rtnetlink: support for fdb get")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The return type for get_regs_len in struct ethtool_ops is int,
the hns3 driver may return error when failing to get the regs
len by sending cmd to firmware.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull block updates from Jens Axboe:
"This is the main pull request for block/storage for 4.21.
Larger than usual, it was a busy round with lots of goodies queued up.
Most notable is the removal of the old IO stack, which has been a long
time coming. No new features for a while, everything coming in this
week has all been fixes for things that were previously merged.
This contains:
- Use atomic counters instead of semaphores for mtip32xx (Arnd)
- Cleanup of the mtip32xx request setup (Christoph)
- Fix for circular locking dependency in loop (Jan, Tetsuo)
- bcache (Coly, Guoju, Shenghui)
* Optimizations for writeback caching
* Various fixes and improvements
- nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
* host and target support for NVMe over TCP
* Error log page support
* Support for separate read/write/poll queues
* Much improved polling
* discard OOM fallback
* Tracepoint improvements
- lightnvm (Hans, Hua, Igor, Matias, Javier)
* Igor added packed metadata to pblk. Now drives without metadata
per LBA can be used as well.
* Fix from Geert on uninitialized value on chunk metadata reads.
* Fixes from Hans and Javier to pblk recovery and write path.
* Fix from Hua Su to fix a race condition in the pblk recovery
code.
* Scan optimization added to pblk recovery from Zhoujie.
* Small geometry cleanup from me.
- Conversion of the last few drivers that used the legacy path to
blk-mq (me)
- Removal of legacy IO path in SCSI (me, Christoph)
- Removal of legacy IO stack and schedulers (me)
- Support for much better polling, now without interrupts at all.
blk-mq adds support for multiple queue maps, which enables us to
have a map per type. This in turn enables nvme to have separate
completion queues for polling, which can then be interrupt-less.
Also means we're ready for async polled IO, which is hopefully
coming in the next release.
- Killing of (now) unused block exports (Christoph)
- Unification of the blk-rq-qos and blk-wbt wait handling (Josef)
- Support for zoned testing with null_blk (Masato)
- sx8 conversion to per-host tag sets (Christoph)
- IO priority improvements (Damien)
- mq-deadline zoned fix (Damien)
- Ref count blkcg series (Dennis)
- Lots of blk-mq improvements and speedups (me)
- sbitmap scalability improvements (me)
- Make core inflight IO accounting per-cpu (Mikulas)
- Export timeout setting in sysfs (Weiping)
- Cleanup the direct issue path (Jianchao)
- Export blk-wbt internals in block debugfs for easier debugging
(Ming)
- Lots of other fixes and improvements"
* tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
kyber: use sbitmap add_wait_queue/list_del wait helpers
sbitmap: add helpers for add/del wait queue handling
block: save irq state in blkg_lookup_create()
dm: don't reuse bio for flushes
nvme-pci: trace SQ status on completions
nvme-rdma: implement polling queue map
nvme-fabrics: allow user to pass in nr_poll_queues
nvme-fabrics: allow nvmf_connect_io_queue to poll
nvme-core: optionally poll sync commands
block: make request_to_qc_t public
nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
nvme-tcp: fix endianess annotations
nvmet-tcp: fix endianess annotations
nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
nvme-pci: only set nr_maps to 2 if poll queues are supported
nvmet: use a macro for default error location
nvmet: fix comparison of a u16 with -1
blk-mq: enable IO poll if .nr_queues of type poll > 0
blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
blk-mq: skip zero-queue maps in blk_mq_map_swqueue
...
|
|
Pull networking updates from David Miller:
1) New ipset extensions for matching on destination MAC addresses, from
Stefano Brivio.
2) Add ipv4 ttl and tos, plus ipv6 flow label and hop limit offloads to
nfp driver. From Stefano Brivio.
3) Implement GRO for plain UDP sockets, from Paolo Abeni.
4) Lots of work from Michał Mirosław to eliminate the VLAN_TAG_PRESENT
bit so that we could support the entire vlan_tci value.
5) Rework the IPSEC policy lookups to better optimize more usecases,
from Florian Westphal.
6) Infrastructure changes eliminating direct manipulation of SKB lists
wherever possible, and to always use the appropriate SKB list
helpers. This work is still ongoing...
7) Lots of PHY driver and state machine improvements and
simplifications, from Heiner Kallweit.
8) Various TSO deferral refinements, from Eric Dumazet.
9) Add ntuple filter support to aquantia driver, from Dmitry Bogdanov.
10) Batch dropping of XDP packets in tuntap, from Jason Wang.
11) Lots of cleanups and improvements to the r8169 driver from Heiner
Kallweit, including support for ->xmit_more. This driver has been
getting some much needed love since he started working on it.
12) Lots of new forwarding selftests from Petr Machata.
13) Enable VXLAN learning in mlxsw driver, from Ido Schimmel.
14) Packed ring support for virtio, from Tiwei Bie.
15) Add new Aquantia AQtion USB driver, from Dmitry Bezrukov.
16) Add XDP support to dpaa2-eth driver, from Ioana Ciocoi Radulescu.
17) Implement coalescing on TCP backlog queue, from Eric Dumazet.
18) Implement carrier change in tun driver, from Nicolas Dichtel.
19) Support msg_zerocopy in UDP, from Willem de Bruijn.
20) Significantly improve garbage collection of neighbor objects when
the table has many PERMANENT entries, from David Ahern.
21) Remove egdev usage from nfp and mlx5, and remove the facility
completely from the tree as it no longer has any users. From Oz
Shlomo and others.
22) Add a NETDEV_PRE_CHANGEADDR so that drivers can veto the change and
therefore abort the operation before the commit phase (which is the
NETDEV_CHANGEADDR event). From Petr Machata.
23) Add indirect call wrappers to avoid retpoline overhead, and use them
in the GRO code paths. From Paolo Abeni.
24) Add support for netlink FDB get operations, from Roopa Prabhu.
25) Support bloom filter in mlxsw driver, from Nir Dotan.
26) Add SKB extension infrastructure. This consolidates the handling of
the auxiliary SKB data used by IPSEC and bridge netfilter, and is
designed to support the needs to MPTCP which could be integrated in
the future.
27) Lots of XDP TX optimizations in mlx5 from Tariq Toukan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1845 commits)
net: dccp: fix kernel crash on module load
drivers/net: appletalk/cops: remove redundant if statement and mask
bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw
net/net_namespace: Check the return value of register_pernet_subsys()
net/netlink_compat: Fix a missing check of nla_parse_nested
ieee802154: lowpan_header_create check must check daddr
net/mlx4_core: drop useless LIST_HEAD
mlxsw: spectrum: drop useless LIST_HEAD
net/mlx5e: drop useless LIST_HEAD
iptunnel: Set tun_flags in the iptunnel_metadata_reply from src
net/mlx5e: fix semicolon.cocci warnings
staging: octeon: fix build failure with XFRM enabled
net: Revert recent Spectre-v1 patches.
can: af_can: Fix Spectre v1 vulnerability
packet: validate address length if non-zero
nfc: af_nfc: Fix Spectre v1 vulnerability
phonet: af_phonet: Fix Spectre v1 vulnerability
net: core: Fix Spectre v1 vulnerability
net: minor cleanup in skb_ext_add()
net: drop the unused helper skb_ext_get()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The biggest RCU changes in this cycle were:
- Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.
- Replace calls of RCU-bh and RCU-sched update-side functions to
their vanilla RCU counterparts. This series is a step towards
complete removal of the RCU-bh and RCU-sched update-side functions.
( Note that some of these conversions are going upstream via their
respective maintainers. )
- Documentation updates, including a number of flavor-consolidation
updates from Joel Fernandes.
- Miscellaneous fixes.
- Automate generation of the initrd filesystem used for rcutorture
testing.
- Convert spin_is_locked() assertions to instead use lockdep.
( Note that some of these conversions are going upstream via their
respective maintainers. )
- SRCU updates, especially including a fix from Dennis Krein for a
bag-on-head-class bug.
- RCU torture-test updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
rcutorture: Don't do busted forward-progress testing
rcutorture: Use 100ms buckets for forward-progress callback histograms
rcutorture: Recover from OOM during forward-progress tests
rcutorture: Print forward-progress test age upon failure
rcutorture: Print time since GP end upon forward-progress failure
rcutorture: Print histogram of CB invocation at OOM time
rcutorture: Print GP age upon forward-progress failure
rcu: Print per-CPU callback counts for forward-progress failures
rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
rcutorture: Dump grace-period diagnostics upon forward-progress OOM
rcutorture: Prepare for asynchronous access to rcu_fwd_startat
torture: Remove unnecessary "ret" variables
rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
rcutorture: Break up too-long rcu_torture_fwd_prog() function
rcutorture: Remove cbflood facility
torture: Bring any extra CPUs online during kernel startup
rcutorture: Add call_rcu() flooding forward-progress tests
rcutorture/formal: Replace synchronize_sched() with synchronize_rcu()
tools/kernel.h: Replace synchronize_sched() with synchronize_rcu()
net/decnet: Replace rcu_barrier_bh() with rcu_barrier()
...
|
|
Pull in bug fixes before respinning my net-next pull
request.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In net_ns_init(), register_pernet_subsys() could fail while registering
network namespace subsystems. The fix checks the return value and
sends a panic() on failure.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts:
50d5258634ae ("net: core: Fix Spectre v1 vulnerability")
d686026b1e6e ("phonet: af_phonet: Fix Spectre v1 vulnerability")
a95386f0390a ("nfc: af_nfc: Fix Spectre v1 vulnerability")
a3ac5817ffe8 ("can: af_can: Fix Spectre v1 vulnerability")
After some discussion with Alexei Starovoitov these all seem to
be completely unnecessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
flen is indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
net/core/filter.c:1101 bpf_check_classic() warn: potential spectre issue 'filter' [w]
Fix this by sanitizing flen before using it to index filter at line 1101:
switch (filter[flen - 1].code) {
and through pc at line 1040:
const struct sock_filter *ftest = &filter[pc];
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
When the extension to be added is already present, the only
skb field we may need to update is 'extensions': we can reorder
the code and avoid a branch.
v1 -> v2:
- be sure to flag the newly added extension as active
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|