| Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 082f2300cfa1a3d9d5221c38c5eba85d4ab98bd8 upstream.
Local random address needs to be updated before creating connection if
RPA from LE Direct Advertising Report was resolved in host. Otherwise
remote device might ignore connection request due to address mismatch.
This was affecting following qualification test cases:
GAP/CONN/SCEP/BV-03-C, GAP/CONN/GCEP/BV-05-C, GAP/CONN/DCEP/BV-05-C
Before patch:
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6 #11350 [hci0] 84680.231216
Address: 56:BC:E8:24:11:68 (Resolvable)
Identity type: Random (0x01)
Identity: F2:F1:06:3D:9C:42 (Static)
> HCI Event: Command Complete (0x0e) plen 4 #11351 [hci0] 84680.246022
LE Set Random Address (0x08|0x0005) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 #11352 [hci0] 84680.246417
Type: Passive (0x00)
Interval: 60.000 msec (0x0060)
Window: 30.000 msec (0x0030)
Own address type: Random (0x01)
Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
> HCI Event: Command Complete (0x0e) plen 4 #11353 [hci0] 84680.248854
LE Set Scan Parameters (0x08|0x000b) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #11354 [hci0] 84680.249466
Scanning: Enabled (0x01)
Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4 #11355 [hci0] 84680.253222
LE Set Scan Enable (0x08|0x000c) ncmd 1
Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 18 #11356 [hci0] 84680.458387
LE Direct Advertising Report (0x0b)
Num reports: 1
Event type: Connectable directed - ADV_DIRECT_IND (0x01)
Address type: Random (0x01)
Address: 53:38:DA:46:8C:45 (Resolvable)
Identity type: Public (0x00)
Identity: 11:22:33:44:55:66 (OUI 11-22-33)
Direct address type: Random (0x01)
Direct address: 7C:D6:76:8C:DF:82 (Resolvable)
Identity type: Random (0x01)
Identity: F2:F1:06:3D:9C:42 (Static)
RSSI: -74 dBm (0xb6)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #11357 [hci0] 84680.458737
Scanning: Disabled (0x00)
Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4 #11358 [hci0] 84680.469982
LE Set Scan Enable (0x08|0x000c) ncmd 1
Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25 #11359 [hci0] 84680.470444
Scan interval: 60.000 msec (0x0060)
Scan window: 60.000 msec (0x0060)
Filter policy: White list is not used (0x00)
Peer address type: Random (0x01)
Peer address: 53:38:DA:46:8C:45 (Resolvable)
Identity type: Public (0x00)
Identity: 11:22:33:44:55:66 (OUI 11-22-33)
Own address type: Random (0x01)
Min connection interval: 30.00 msec (0x0018)
Max connection interval: 50.00 msec (0x0028)
Connection latency: 0 (0x0000)
Supervision timeout: 420 msec (0x002a)
Min connection length: 0.000 msec (0x0000)
Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4 #11360 [hci0] 84680.474971
LE Create Connection (0x08|0x000d) ncmd 1
Status: Success (0x00)
< HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0 #11361 [hci0] 84682.545385
> HCI Event: Command Complete (0x0e) plen 4 #11362 [hci0] 84682.551014
LE Create Connection Cancel (0x08|0x000e) ncmd 1
Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19 #11363 [hci0] 84682.551074
LE Connection Complete (0x01)
Status: Unknown Connection Identifier (0x02)
Handle: 0
Role: Master (0x00)
Peer address type: Public (0x00)
Peer address: 00:00:00:00:00:00 (OUI 00-00-00)
Connection interval: 0.00 msec (0x0000)
Connection latency: 0 (0x0000)
Supervision timeout: 0 msec (0x0000)
Master clock accuracy: 0x00
After patch:
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 #210 [hci0] 667.152459
Type: Passive (0x00)
Interval: 60.000 msec (0x0060)
Window: 30.000 msec (0x0030)
Own address type: Random (0x01)
Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
> HCI Event: Command Complete (0x0e) plen 4 #211 [hci0] 667.153613
LE Set Scan Parameters (0x08|0x000b) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #212 [hci0] 667.153704
Scanning: Enabled (0x01)
Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4 #213 [hci0] 667.154584
LE Set Scan Enable (0x08|0x000c) ncmd 1
Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 18 #214 [hci0] 667.182619
LE Direct Advertising Report (0x0b)
Num reports: 1
Event type: Connectable directed - ADV_DIRECT_IND (0x01)
Address type: Random (0x01)
Address: 50:52:D9:A6:48:A0 (Resolvable)
Identity type: Public (0x00)
Identity: 11:22:33:44:55:66 (OUI 11-22-33)
Direct address type: Random (0x01)
Direct address: 7C:C1:57:A5:B7:A8 (Resolvable)
Identity type: Random (0x01)
Identity: F4:28:73:5D:38:B0 (Static)
RSSI: -70 dBm (0xba)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #215 [hci0] 667.182704
Scanning: Disabled (0x00)
Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4 #216 [hci0] 667.183599
LE Set Scan Enable (0x08|0x000c) ncmd 1
Status: Success (0x00)
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6 #217 [hci0] 667.183645
Address: 7C:C1:57:A5:B7:A8 (Resolvable)
Identity type: Random (0x01)
Identity: F4:28:73:5D:38:B0 (Static)
> HCI Event: Command Complete (0x0e) plen 4 #218 [hci0] 667.184590
LE Set Random Address (0x08|0x0005) ncmd 1
Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25 #219 [hci0] 667.184613
Scan interval: 60.000 msec (0x0060)
Scan window: 60.000 msec (0x0060)
Filter policy: White list is not used (0x00)
Peer address type: Random (0x01)
Peer address: 50:52:D9:A6:48:A0 (Resolvable)
Identity type: Public (0x00)
Identity: 11:22:33:44:55:66 (OUI 11-22-33)
Own address type: Random (0x01)
Min connection interval: 30.00 msec (0x0018)
Max connection interval: 50.00 msec (0x0028)
Connection latency: 0 (0x0000)
Supervision timeout: 420 msec (0x002a)
Min connection length: 0.000 msec (0x0000)
Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4 #220 [hci0] 667.186558
LE Create Connection (0x08|0x000d) ncmd 1
Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19 #221 [hci0] 667.485824
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Master (0x00)
Peer address type: Random (0x01)
Peer address: 50:52:D9:A6:48:A0 (Resolvable)
Identity type: Public (0x00)
Identity: 11:22:33:44:55:66 (OUI 11-22-33)
Connection interval: 50.00 msec (0x0028)
Connection latency: 0 (0x0000)
Supervision timeout: 420 msec (0x002a)
Master clock accuracy: 0x07
@ MGMT Event: Device Connected (0x000b) plen 13 {0x0002} [hci0] 667.485996
LE Address: 11:22:33:44:55:66 (OUI 11-22-33)
Flags: 0x00000000
Data length: 0
Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3f01ddb962dc506916c243f9524e8bef97119b77 ]
On receiving a packet the state index points to the rstate which must be
used to fill up IP and TCP headers. But if the state index points to a
rstate which is unitialized, i.e. filled with zeros, it gets stuck in an
infinite loop inside ip_fast_csum trying to compute the ip checsum of a
header with zero length.
89.666953: <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468
89.666965: <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c
89.666978: <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0
89.666991: <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198
89.667005: <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370
89.667027: <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250
89.667040: <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60
89.667053: <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c
89.667065: <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38
89.667085: <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154
89.667098: <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c
89.667117: <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c
89.667131: <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c
./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output:
ip_fast_csum at arch/arm64/include/asm/checksum.h:40
(inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615
Adding a variable to indicate if the current rstate is initialized. If
such a packet arrives, move to toss state.
Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 35d889d10b649fda66121891ec05eca88150059d ]
When we exceed current packets limit and we have more than one
segment in the list returned by skb_gso_segment(), netem drops
only the first one, skipping the rest, hence kmemleak reports:
unreferenced object 0xffff880b5d23b600 (size 1024):
comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
hex dump (first 32 bytes):
00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00 ..#]............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
[<000000001709b32f>] skb_segment+0x8c8/0x3710
[<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
[<00000000c921cba1>] inet_gso_segment+0x476/0x1370
[<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
[<000000002182660a>] __skb_gso_segment+0x1dd/0x620
[<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
[<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
[<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
[<00000000d309e9d3>] ip_output+0x1aa/0x2c0
[<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
[<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
[<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
[<0000000013d06d02>] tasklet_action+0x1ca/0x250
[<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
[<00000000e7ed027c>] irq_exit+0x1e2/0x210
Fix it by adding the rest of the segments, if any, to skb 'to_free'
list. Add new __qdisc_drop_all() and qdisc_drop_all() functions
because they can be useful in the future if we need to drop segmented
GSO packets in other places.
Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
tcp_write_queue_purge clears all the SKBs in the write queue
but does not reset the sk_send_head. As a result, we can have
a NULL pointer dereference anywhere that we use tcp_send_head
instead of the tcp_write_queue_tail.
For example, after a27fd7a8ed38 (tcp: purge write queue upon RST),
we can purge the write queue on RST. Prior to
75c119afe14f (tcp: implement rb-tree based retransmit queue),
tcp_push will only check tcp_send_head and then accesses
tcp_write_queue_tail to send the actual SKB. As a result, it will
dereference a NULL pointer.
This has been reported twice for 4.14 where we don't have
75c119afe14f:
By Timofey Titovets:
[ 422.081094] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000038
[ 422.081254] IP: tcp_push+0x42/0x110
[ 422.081314] PGD 0 P4D 0
[ 422.081364] Oops: 0002 [#1] SMP PTI
By Yongjian Xu:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: tcp_push+0x48/0x120
PGD 80000007ff77b067 P4D 80000007ff77b067 PUD 7fd989067 PMD 0
Oops: 0002 [#18] SMP PTI
Modules linked in: tcp_diag inet_diag tcp_bbr sch_fq iTCO_wdt
iTCO_vendor_support pcspkr ixgbe mdio i2c_i801 lpc_ich joydev input_leds shpchp
e1000e igb dca ptp pps_core hwmon mei_me mei ipmi_si ipmi_msghandler sg ses
scsi_transport_sas enclosure ext4 jbd2 mbcache sd_mod ahci libahci megaraid_sas
wmi ast ttm dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 6 PID: 14156 Comm: [ET_NET 6] Tainted: G D 4.14.26-1.el6.x86_64 #1
Hardware name: LENOVO ThinkServer RD440 /ThinkServer RD440, BIOS A0TS80A
09/22/2014
task: ffff8807d78d8140 task.stack: ffffc9000e944000
RIP: 0010:tcp_push+0x48/0x120
RSP: 0018:ffffc9000e947a88 EFLAGS: 00010246
RAX: 00000000000005b4 RBX: ffff880f7cce9c00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff8807d00f5000
RBP: ffffc9000e947aa8 R08: 0000000000001c84 R09: 0000000000000000
R10: ffff8807d00f5158 R11: 0000000000000000 R12: ffff8807d00f5000
R13: 0000000000000020 R14: 00000000000256d4 R15: 0000000000000000
FS: 00007f5916de9700(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 00000007f8226004 CR4: 00000000001606e0
Call Trace:
tcp_sendmsg_locked+0x33d/0xe50
tcp_sendmsg+0x37/0x60
inet_sendmsg+0x39/0xc0
sock_sendmsg+0x49/0x60
sock_write_iter+0xb6/0x100
do_iter_readv_writev+0xec/0x130
? rw_verify_area+0x49/0xb0
do_iter_write+0x97/0xd0
vfs_writev+0x7e/0xe0
? __wake_up_common_lock+0x80/0xa0
? __fget_light+0x2c/0x70
? __do_page_fault+0x1e7/0x530
do_writev+0x60/0xf0
? inet_shutdown+0xac/0x110
SyS_writev+0x10/0x20
do_syscall_64+0x6f/0x140
? prepare_exit_to_usermode+0x8b/0xa0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x3135ce0c57
RSP: 002b:00007f5916de4b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000003135ce0c57
RDX: 0000000000000002 RSI: 00007f5916de4b90 RDI: 000000000000606f
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f5916de8c38
R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000464cc
R13: 00007f5916de8c30 R14: 00007f58d8bef080 R15: 0000000000000002
Code: 48 8b 97 60 01 00 00 4c 8d 97 58 01 00 00 41 b9 00 00 00 00 41 89 f3 4c 39
d2 49 0f 44 d1 41 81 e3 00 80 00 00 0f 85 b0 00 00 00 <80> 4a 38 08 44 8b 8f 74
06 00 00 44 89 8f 7c 06 00 00 83 e6 01
RIP: tcp_push+0x48/0x120 RSP: ffffc9000e947a88
CR2: 0000000000000038
---[ end trace 8d545c2e93515549 ]---
Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST)
Reported-by: Timofey Titovets <nefelim4ag@gmail.com>
Reported-by: Yongjian Xu <yongjianchn@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Yongjian Xu <yongjianchn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 15f35d49c93f4fa9875235e7bf3e3783d2dd7a1b ]
Since UDP-Lite is always using checksum, the following path is
triggered when calculating pseudo header for it:
udp4_csum_init() or udp6_csum_init()
skb_checksum_init_zero_check()
__skb_checksum_validate_complete()
The problem can appear if skb->len is less than CHECKSUM_BREAK. In
this particular case __skb_checksum_validate_complete() also invokes
__skb_checksum_complete(skb). If UDP-Lite is using partial checksum
that covers only part of a packet, the function will return bad
checksum and the packet will be dropped.
It can be fixed if we skip skb_checksum_init_zero_check() and only
set the required pseudo header checksum for UDP-Lite with partial
checksum before udp4_csum_init()/udp6_csum_init() functions return.
Fixes: ed70fcfcee95 ("net: Call skb_checksum_init in IPv4")
Fixes: e4f45b7f40bd ("net: Call skb_checksum_init in IPv6")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit efbf78973978b0d25af59bc26c8013a942af6e64 upstream.
Both Eric and Paolo noticed the rcu_barrier() we use in
tcf_block_put_ext() could be a performance bottleneck when
we have a lot of tc classes.
Paolo provided the following to demonstrate the issue:
tc qdisc add dev lo root htb
for I in `seq 1 1000`; do
tc class add dev lo parent 1: classid 1:$I htb rate 100kbit
tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb
for J in `seq 1 10`; do
tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J
done
done
time tc qdisc del dev root
real 0m54.764s
user 0m0.023s
sys 0m0.000s
The rcu_barrier() there is to ensure we free the block after all chains
are gone, that is, to queue tcf_block_put_final() at the tail of workqueue.
We can achieve this ordering requirement by refcnt'ing tcf block instead,
that is, the tcf block is freed only when the last chain in this block is
gone. This also simplifies the code.
Paolo reported after this patch we get:
real 0m0.017s
user 0m0.000s
sys 0m0.017s
Tested-by: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit acf568ee859f098279eadf551612f103afdacb4e ]
This is an old bugbear of mine:
https://www.mail-archive.com/netdev@vger.kernel.org/msg03894.html
By crafting special packets, it is possible to cause recursion
in our kernel when processing transport-mode packets at levels
that are only limited by packet size.
The easiest one is with DNAT, but an even worse one is where
UDP encapsulation is used in which case you just have to insert
an UDP encapsulation header in between each level of recursion.
This patch avoids this problem by reinjecting tranport-mode packets
through a tasklet.
Fixes: b05e106698d9 ("[IPV4/6]: Netfilter IPsec input hooks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8afa10cbe281b10371fee5a87ab266e48d71a7f9 ]
Check the qmin & qmax values doesn't overflow for the given Wlog value.
Check that qmin <= qmax.
Fixes: a783474591f2 ("[PKT_SCHED]: Generic RED layer")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5c472203421ab4f928aa1ae9e1dbcfdd80324148 ]
Do not allow delta value to be zero since it is used as a divisor.
Fixes: 8af2a218de38 ("sch_red: Adaptative RED AQM")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ecca8f88da5c4260cc2bccfefd2a24976704c366 upstream.
Now in sctp_setsockopt_maxseg user_frag or frag_point can be set with
val >= 8 and val <= SCTP_MAX_CHUNK_LEN. But both checks are incorrect.
val >= 8 means frag_point can even be less than SCTP_DEFAULT_MINSEGMENT.
Then in sctp_datamsg_from_user(), when it's value is greater than cookie
echo len and trying to bundle with cookie echo chunk, the first_len will
overflow.
The worse case is when it's value is equal as cookie echo len, first_len
becomes 0, it will go into a dead loop for fragment later on. In Hangbin
syzkaller testing env, oom was even triggered due to consecutive memory
allocation in that loop.
Besides, SCTP_MAX_CHUNK_LEN is the max size of the whole chunk, it should
deduct the data header for frag_point or user_frag check.
This patch does a proper check with SCTP_DEFAULT_MINSEGMENT subtracting
the sctphdr and datahdr, SCTP_MAX_CHUNK_LEN subtracting datahdr when
setting frag_point via sockopt. It also improves sctp_setsockopt_maxseg
codes.
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 51a1aaa631c90223888d8beac4d649dc11d2ca55 upstream.
When creating a new radio on the fly, hwsim allows this
to be done with an arbitrary number of channels, but
cfg80211 only supports a limited number of simultaneous
channels, leading to a warning.
Fix this by validating the number - this requires moving
the define for the maximum out to a visible header file.
Reported-by: syzbot+8dd9051ff19940290931@syzkaller.appspotmail.com
Fixes: b59ec8dd4394 ("mac80211_hwsim: fix number of channels in interface combinations")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4950276672fce5c241857540f8561c440663673d upstream.
Patch series "kmemcheck: kill kmemcheck", v2.
As discussed at LSF/MM, kill kmemcheck.
KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow). KASan is already upstream.
We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).
The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.
Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.
This patch (of 4):
Remove kmemcheck annotations, and calls to kmemcheck from the kernel.
[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7b6ddeaf27eca72795ceeae2f0f347db1b5f9a30 ]
When connected to a QoS/WMM AP, mac80211 should use a QoS NDP
for probing it, instead of a regular non-QoS one, fix this.
Change all the drivers to *not* allow QoS NDP for now, even
though it looks like most of them should be OK with that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 30be8f8dba1bd2aff73e8447d59228471233a3d4 ]
sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().
Signed-off-by: Robert Hering <r.hering@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]
When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.
For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:
unregister_netdevice: waiting for lo to become free. Usage count = 1
These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.
After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]
Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.
The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled. Fix it to return the effective value, whether
that has been set at the socket or net namespace level.
Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1 ]
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.
This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.
Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b6c5734db07079c9410147b32407f2366d584e6c ]
syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512
That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).
As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.
The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.
Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.
Changes from v1:
- do not disable PMTU discovery, in the light of commit
06ad391919b2 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
resulted in a change or not
Changes from v2:
none
See-also: https://lkml.org/lkml/2017/12/22/811
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d4761754b4fb2ef8d9a1e9d121c4bec84e1fe292 ]
Mark tcp_sock during a SACK reneging event and invalidate rate samples
while marked. Such rate samples may overestimate bw by including packets
that were SACKed before reneging.
< ack 6001 win 10000 sack 7001:38001
< ack 7001 win 0 sack 8001:38001 // Reneg detected
> seq 7001:8001 // RTO, SACK cleared.
< ack 38001 win 10000
In above example the rate sample taken after the last ack will count
7001-38001 as delivered while the actual delivery rate likely could
be much lower i.e. 7001-8001.
This patch adds a new field tcp_sock.sack_reneg and marks it when we
declare SACK reneging and entering TCP_CA_Loss, and unmarks it after
the last rate sample was taken before moving back to TCP_CA_Open. This
patch also invalidates rate samples taken while tcp_sock.is_sack_reneg
is set.
Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b5476022bbada3764609368f03329ca287528dc8 ]
IPv4 stack reacts to changes to small MTU, by disabling itself under
RTNL.
But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in igmp code where it is
assumed the mtu is suitable.
Fix this by reading device mtu once and checking IPv4 minimal MTU.
This patch adds missing IPV4_MIN_MTU define, to not abuse
ETH_MIN_MTU anymore.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0c19f846d582af919db66a5914a0189f9f92c936 ]
Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.
Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.
It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").
(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.
Tested
Booted a v4.13 guest kernel with QEMU. On a host kernel before this
patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
enabled, same as on a v4.13 host kernel.
A UFO packet sent from the guest appears on the tap device:
host:
nc -l -p -u 8000 &
tcpdump -n -i tap0
guest:
dd if=/dev/zero of=payload.txt bs=1 count=2000
nc -u 192.16.1.1 8000 < payload.txt
Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
packets arriving fragmented:
./with_tap_pair.sh ./tap_send_ufo tap0 tap1
(from https://github.com/wdebruij/kerneltools/tree/master/tests)
Changes
v1 -> v2
- simplified set_offload change (review comment)
- documented test procedure
Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ed66dfaf236c04d414de1d218441296e57fb2bd2 ]
Fix the TLP scheduling logic so that when scheduling a TLP probe, we
ensure that the estimated time at which an RTO would fire accounts for
the fact that ACKs indicating forward progress should push back RTO
times.
After the following fix:
df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
we had an unintentional behavior change in the following kind of
scenario: suppose the RTT variance has been very low recently. Then
suppose we send out a flight of N packets and our RTT is 100ms:
t=0: send a flight of N packets
t=100ms: receive an ACK for N-1 packets
The response before df92c8394e6e that was:
-> schedule a TLP for now + RTO_interval
The response after df92c8394e6e is:
-> schedule a TLP for t=0 + RTO_interval
Since RTO_interval = srtt + RTT_variance, this means that we have
scheduled a TLP timer at a point in the future that only accounts for
RTT_variance. If the RTT_variance term is small, this means that the
timer fires soon.
Before df92c8394e6e this would not happen, because in that code, when
we receive an ACK for a prefix of flight, we did:
1) Near the top of tcp_ack(), switch from TLP timer to RTO
at write_queue_head->paket_tx_time + RTO_interval:
if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
tcp_rearm_rto(sk);
2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval:
if (flag & FLAG_ACKED) {
tcp_rearm_rto(sk);
3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO
to TLP at now + RTO_interval:
if (icsk->icsk_pending == ICSK_TIME_RETRANS)
tcp_schedule_loss_probe(sk);
In df92c8394e6e we removed that 3-phase dance, and instead directly
set the TLP timer once: we set the TLP timer in cases like this to
write_queue_head->packet_tx_time + RTO_interval. So if the RTT
variance is small, then this means that this is setting the TLP timer
to fire quite soon. This means if the ACK for the tail of the flight
takes longer than an RTT to arrive (often due to delayed ACKs), then
the TLP timer fires too quickly.
Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Usptream commit b4d1605a8ea608fd7dc45b926a05d75d340bde4b ]
After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
socket lookups happen while skb->cb[] has not been mangled yet by TCP.
Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d7efc6c11b277d9d80b99b1334a78bfe7d7edf10 ]
Alexander Potapenko reported use of uninitialized memory [1]
This happens when inserting a request socket into TCP ehash,
in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.
Bug was added by commit d894ba18d4e4 ("soreuseport: fix ordering for
mixed v4/v6 sockets")
Note that d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6
ordering fix") missed the opportunity to get rid of
hlist_nulls_add_tail_rcu() :
Both UDP sockets and TCP/DCCP listeners no longer use
__sk_nulls_add_node_rcu() for their hash insertion.
Since all other sockets have unique 4-tuple, the reuseport status
has no special meaning, so we can always use hlist_nulls_add_head_rcu()
for them and save few cycles/instructions.
[1]
==================================================================
BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x185/0x1d0 lib/dump_stack.c:52
kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
__msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
__sk_nulls_add_node_rcu ./include/net/sock.h:684
inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
NF_HOOK ./include/linux/netfilter.h:248
ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
dst_input ./include/net/dst.h:477
ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
NF_HOOK ./include/linux/netfilter.h:248
ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
__netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
__netif_receive_skb net/core/dev.c:4336
netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
napi_skb_finish net/core/dev.c:4858
napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
e1000_clean_rx_irq+0x1492/0x1d30
drivers/net/ethernet/intel/e1000/e1000_main.c:4474
e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
napi_poll net/core/dev.c:5500
net_rx_action+0x73c/0x1820 net/core/dev.c:5566
__do_softirq+0x4b4/0x8dd kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364
irq_exit+0x203/0x240 kernel/softirq.c:405
exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
common_interrupt+0x86/0x86
Fixes: d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets")
Fixes: d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexander Potapenko <glider@google.com>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b9f3eb499d84f8d4adcb2f9212ec655700b28228 upstream.
Move inclusion of a private kernel header <net/tcp.h>
from uapi/linux/tls.h to its only user - net/tls.h,
to fix the following linux/tls.h userspace compilation error:
/usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory
As to this point uapi/linux/tls.h was totaly unusuable for userspace,
cleanup this header file further by moving other redundant includes
to net/tls.h.
Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of holding netns refcnt in tc actions, we can minimize
the holding time by saving it in struct tcf_exts instead. This
means we can just hold netns refcnt right before call_rcu() and
release it after tcf_exts_destroy() is done.
However, because on netns cleanup path we call tcf_proto_destroy()
too, obviously we can not hold netns for a zero refcnt, in this
case we have to do cleanup synchronously. It is fine for RCU too,
the caller cleanup_net() already waits for a grace period.
For other cases, refcnt is non-zero and we can safely grab it as
normal and release it after we are done.
This patch provides two new API for each filter to use:
tcf_exts_get_net() and tcf_exts_put_net(). And all filters now can
use the following pattern:
void __destroy_filter() {
tcf_exts_destroy();
tcf_exts_put_net(); // <== release netns refcnt
kfree();
}
void some_work() {
rtnl_lock();
__destroy_filter();
rtnl_unlock();
}
void some_rcu_callback() {
tcf_queue_work(some_work);
}
if (tcf_exts_get_net()) // <== hold netns refcnt
call_rcu(some_rcu_callback);
else
__destroy_filter();
Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
If we hold that refcnt, the netns can never be destroyed until
all actions are destroyed by user, this breaks our netns design
which we expect all actions are destroyed when we destroy the
whole netns.
Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
"Hopefully this is the last batch of networking fixes for 4.14
Fingers crossed...
1) Fix stmmac to use the proper sized OF property read, from Bhadram
Varka.
2) Fix use after free in net scheduler tc action code, from Cong
Wang.
3) Fix SKB control block mangling in tcp_make_synack().
4) Use proper locking in fib_dump_info(), from Florian Westphal.
5) Fix IPG encodings in systemport driver, from Florian Fainelli.
6) Fix division by zero in NV TCP congestion control module, from
Konstantin Khlebnikov.
7) Fix use after free in nf_reject_ipv4, from Tejaswi Tanikella"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: systemport: Correct IPG length settings
tcp: do not mangle skb->cb[] in tcp_make_synack()
fib: fib_dump_info can no longer use __in_dev_get_rtnl
stmmac: use of_property_read_u32 instead of read_u8
net_sched: hold netns refcnt for each action
net_sched: acquire RTNL in tc_action_net_exit()
net: vrf: correct FRA_L3MDEV encode type
tcp_nv: fix division by zero in tcpnv_acked()
netfilter: nf_reject_ipv4: Fix use-after-free in send_reset
netfilter: nft_set_hash: disable fast_ops for 2-len keys
|
|
TC actions have been destroyed asynchronously for a long time,
previously in a RCU callback and now in a workqueue. If we
don't hold a refcnt for its netns, we could use the per netns
data structure, struct tcf_idrinfo, after it has been freed by
netns workqueue.
Hold refcnt to ensure netns destroy happens after all actions
are gone.
Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates <lucasb@mojatatu.com>
Tested-by: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I forgot to acquire RTNL in tc_action_net_exit()
which leads that action ops->cleanup() is not always
called with RTNL. This usually is not a big deal because
this function is called after all netns refcnt are gone,
but given RTNL protects more than just actions, add it
for safety and consistency.
Also add an assertion to catch other potential bugs.
Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates <lucasb@mojatatu.com>
Tested-by: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull initial SPDX identifiers from Greg KH:
"License cleanup: add SPDX license identifiers to some files
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the
'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
binding shorthand, which can be used instead of the full boiler plate
text.
This patch is based on work done by Thomas Gleixner and Kate Stewart
and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset
of the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to
license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied
to a file was done in a spreadsheet of side by side results from of
the output of two independent scanners (ScanCode & Windriver)
producing SPDX tag:value files created by Philippe Ombredanne.
Philippe prepared the base worksheet, and did an initial spot review
of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537
files assessed. Kate Stewart did a file by file comparison of the
scanner results in the spreadsheet to determine which SPDX license
identifier(s) to be applied to the file. She confirmed any
determination that was not immediately clear with lawyers working with
the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained
>5 lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that
became the concluded license(s).
- when there was disagreement between the two scanners (one detected
a license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply
(and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases,
confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.
The Windriver scanner is based on an older version of FOSSology in
part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot
checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect
the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial
patch version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch
license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
License cleanup: add SPDX license identifier to uapi header files with a license
License cleanup: add SPDX license identifier to uapi header files with no license
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
|
|
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Based on SNMP values provided by Roman, Yuchung made the observation
that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
Looking at tcp_mtu_probe(), I found that when a new skb was placed
in front of the write queue, we were not updating tcp highest sack.
If one skb is freed because all its content was copied to the new skb
(for MTU probing), then tp->highest_sack could point to a now freed skb.
Bad things would then happen, including infinite loops.
This patch renames tcp_highest_sack_combine() and uses it
from tcp_mtu_probe() to fix the bug.
Note that I also removed one test against tp->sacked_out,
since we want to replace tp->highest_sack regardless of whatever
condition, since keeping a stale pointer to freed skb is a recipe
for disaster.
Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Roman Gushchin <guro@fb.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch introduces a dedicated workqueue for tc filters
so that each tc filter's RCU callback could defer their
action destroy work to this workqueue. The helper
tcf_queue_work() is introduced for them to use.
Because we hold RTNL lock when calling tcf_block_put(), we
can not simply flush works inside it, therefore we have to
defer it again to this workqueue and make sure all flying RCU
callbacks have already queued their work before this one, in
other words, to ensure this is the last one to execute to
prevent any use-after-free.
On the other hand, this makes tcf_block_put() ugly and
harder to understand. Since David and Eric strongly dislike
adding synchronize_rcu(), this is probably the only
solution that could make everyone happy.
Please also see the code comments below.
Reported-by: Chris Mi <chrism@mellanox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These warnings were found by running 'make C=2 M=net/sctp/'.
They are introduced by not aware of Endian when coding stream
reconf patches.
Since commit c0d8bab6ae51 ("sctp: add get and set sockopt for
reconf_enable") enabled stream reconf feature for users, the
Fixes tag below would use it.
Fixes: c0d8bab6ae51 ("sctp: add get and set sockopt for reconf_enable")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SK_SKB program types use bpf_compute_data to store the end of the
packet data. However, bpf_compute_data assumes the cb is stored in the
qdisc layer format. But, for SK_SKB this is the wrong layer of the
stack for this type.
It happens to work (sort of!) because in most cases nothing happens
to be overwritten today. This is very fragile and error prone.
Fortunately, we have another hole in tcp_skb_cb we can use so lets
put the data_end value there.
Note, SK_SKB program types do not use data_meta, they are failed by
sk_skb_is_valid_access().
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
pull-request: mac80211 2017-10-25
Here are:
* follow-up fixes for the WoWLAN security issue, to fix a
partial TKIP key material problem and to use crypto_memneq()
* a change for better enforcement of FQ's memory limit
* a disconnect/connect handling fix, and
* a user rate mask validation fix
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In my first attempt to fix the lockdep splat, I forgot we could
enter inet_csk_route_req() with a freshly allocated request socket,
for which refcount has not yet been elevated, due to complex
SLAB_TYPESAFE_BY_RCU rules.
We either are in rcu_read_lock() section _or_ we own a refcount on the
request.
Correct RCU verb to use here is rcu_dereference_check(), although it is
not possible to prove we actually own a reference on a shared
refcount :/
In v2, I added ireq_opt_deref() helper and use in three places, to fix other
possible splats.
[ 49.844590] lockdep_rcu_suspicious+0xea/0xf3
[ 49.846487] inet_csk_route_req+0x53/0x14d
[ 49.848334] tcp_v4_route_req+0xe/0x10
[ 49.850174] tcp_conn_request+0x31c/0x6a0
[ 49.851992] ? __lock_acquire+0x614/0x822
[ 49.854015] tcp_v4_conn_request+0x5a/0x79
[ 49.855957] ? tcp_v4_conn_request+0x5a/0x79
[ 49.858052] tcp_rcv_state_process+0x98/0xdcc
[ 49.859990] ? sk_filter_trim_cap+0x2f6/0x307
[ 49.862085] tcp_v4_do_rcv+0xfc/0x145
[ 49.864055] ? tcp_v4_do_rcv+0xfc/0x145
[ 49.866173] tcp_v4_rcv+0x5ab/0xaf9
[ 49.868029] ip_local_deliver_finish+0x1af/0x2e7
[ 49.870064] ip_local_deliver+0x1b2/0x1c5
[ 49.871775] ? inet_del_offload+0x45/0x45
[ 49.873916] ip_rcv_finish+0x3f7/0x471
[ 49.875476] ip_rcv+0x3f1/0x42f
[ 49.876991] ? ip_local_deliver_finish+0x2e7/0x2e7
[ 49.878791] __netif_receive_skb_core+0x6d3/0x950
[ 49.880701] ? process_backlog+0x7e/0x216
[ 49.882589] __netif_receive_skb+0x1d/0x5e
[ 49.884122] process_backlog+0x10c/0x216
[ 49.885812] net_rx_action+0x147/0x3df
Fixes: a6ca7abe53633 ("tcp/dccp: fix lockdep splat in inet_csk_route_req()")
Fixes: c92e8c02fe66 ("tcp/dccp: fix ireq->opt races")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <fengguang.wu@intel.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sock lock may be taken in the message timer function which is a
problem since timers run in BH. Instead of timers use delayed_work.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: bbb03029a899 ("strparser: Generalize strparser")
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller found another bug in DCCP/TCP stacks [1]
For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.
Note the opt field is renamed to ireq_opt to ease grep games.
[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295
CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
__tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000
Allocated by task 3295:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
__do_kmalloc mm/slab.c:3725 [inline]
__kmalloc+0x162/0x760 mm/slab.c:3734
kmalloc include/linux/slab.h:498 [inline]
tcp_v4_save_options include/net/tcp.h:1962 [inline]
tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
Freed by task 3306:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kfree+0xca/0x250 mm/slab.c:3820
inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
__sk_destruct+0xfd/0x910 net/core/sock.c:1560
sk_destruct+0x47/0x80 net/core/sock.c:1595
__sk_free+0x57/0x230 net/core/sock.c:1603
sk_free+0x2a/0x40 net/core/sock.c:1614
sock_put include/net/sock.h:1652 [inline]
inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SK_SKB BPF programs are run from the socket/tcp context but early in
the stack before much of the TCP metadata is needed in tcp_skb_cb. So
we can use some unused fields to place BPF metadata needed for SK_SKB
programs when implementing the redirect function.
This allows us to drop the preempt disable logic. It does however
require an API change so sk_redirect_map() has been updated to
additionally provide ctx_ptr to skb. Note, we do however continue to
disable/enable preemption around actual BPF program running to account
for map updates.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The fq structure would fail to properly enforce the memory limit in the case
where the packet being enqueued was bigger than the packet being removed to
bring the memory usage down. So keep dropping packets until the memory usage is
back below the limit. Also, fix the statistics for memory limit violations.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The UDP early demux can leverate the rx dst cache even for
multicast unconnected sockets.
In such scenario the ipv4 source address is validated only on
the first packet in the given flow. After that, when we fetch
the dst entry from the socket rx cache, we stop enforcing
the rp_filter and we even start accepting any kind of martian
addresses.
Disabling the dst cache for unconnected multicast socket will
cause large performace regression, nearly reducing by half the
max ingress tput.
Instead we factor out a route helper to completely validate an
skb source address for multicast packets and we call it from
the UDP early demux for mcast packets landing on unconnected
sockets, after successful fetching the related cached dst entry.
This still gives a measurable, but limited performance
regression:
rp_filter = 0 rp_filter = 1
edmux disabled: 1182 Kpps 1127 Kpps
edmux before: 2238 Kpps 2238 Kpps
edmux after: 2037 Kpps 2019 Kpps
The above figures are on top of current net tree.
Applying the net-next commit 6e617de84e87 ("net: avoid a full
fib lookup when rp_filter is disabled.") the delta with
rp_filter == 0 will decrease even more.
Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently no error is emitted, but this infrastructure will
used by the next patch to allow source address validation
for mcast sockets.
Since early demux can do a route lookup and an ipv4 route
lookup can return an error code this is consistent with the
current ipv4 route infrastructure.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When CONFIG_KASAN is enabled, the "--param asan-stack=1" causes rather large
stack frames in some functions. This goes unnoticed normally because
CONFIG_FRAME_WARN is disabled with CONFIG_KASAN by default as of commit
3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with
KASAN=y").
The kernelci.org build bot however has the warning enabled and that led
me to investigate it a little further, as every build produces these warnings:
net/wireless/nl80211.c:4389:1: warning: the frame size of 2240 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1895:1: warning: the frame size of 3776 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1410:1: warning: the frame size of 2208 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/bridge/br_netlink.c:1282:1: warning: the frame size of 2544 bytes is larger than 2048 bytes [-Wframe-larger-than=]
Most of this problem is now solved in gcc-8, which can consolidate
the stack slots for the inline function arguments. On older compilers
we can add a workaround by declaring a local variable in each function
to pass the inline function argument.
Cc: stable@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In linux-4.13, Wei worked hard to convert dst to a traditional
refcounted model, removing GC.
We now want to make sure a dst refcount can not transition from 0 back
to 1.
The problem here is that input path attached a not refcounted dst to an
skb. Then later, because packet is forwarded and hits skb_dst_force()
before exiting RCU section, we might try to take a refcount on one dst
that is about to be freed, if another cpu saw 1 -> 0 transition in
dst_release() and queued the dst for freeing after one RCU grace period.
Lets unify skb_dst_force() and skb_dst_force_safe(), since we should
always perform the complete check against dst refcount, and not assume
it is not zero.
Bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=197005
[ 989.919496] skb_dst_force+0x32/0x34
[ 989.919498] __dev_queue_xmit+0x1ad/0x482
[ 989.919501] ? eth_header+0x28/0xc6
[ 989.919502] dev_queue_xmit+0xb/0xd
[ 989.919504] neigh_connected_output+0x9b/0xb4
[ 989.919507] ip_finish_output2+0x234/0x294
[ 989.919509] ? ipt_do_table+0x369/0x388
[ 989.919510] ip_finish_output+0x12c/0x13f
[ 989.919512] ip_output+0x53/0x87
[ 989.919513] ip_forward_finish+0x53/0x5a
[ 989.919515] ip_forward+0x2cb/0x3e6
[ 989.919516] ? pskb_trim_rcsum.part.9+0x4b/0x4b
[ 989.919518] ip_rcv_finish+0x2e2/0x321
[ 989.919519] ip_rcv+0x26f/0x2eb
[ 989.919522] ? vlan_do_receive+0x4f/0x289
[ 989.919523] __netif_receive_skb_core+0x467/0x50b
[ 989.919526] ? tcp_gro_receive+0x239/0x239
[ 989.919529] ? inet_gro_receive+0x226/0x238
[ 989.919530] __netif_receive_skb+0x4d/0x5f
[ 989.919532] netif_receive_skb_internal+0x5c/0xaf
[ 989.919533] napi_gro_receive+0x45/0x81
[ 989.919536] ixgbe_poll+0xc8a/0xf09
[ 989.919539] ? kmem_cache_free_bulk+0x1b6/0x1f7
[ 989.919540] net_rx_action+0xf4/0x266
[ 989.919543] __do_softirq+0xa8/0x19d
[ 989.919545] irq_exit+0x5d/0x6b
[ 989.919546] do_IRQ+0x9c/0xb5
[ 989.919548] common_interrupt+0x93/0x93
[ 989.919548] </IRQ>
Similarly dst_clone() can use dst_hold() helper to have additional
debugging, as a follow up to commit 44ebe79149ff ("net: add debug
atomic_inc_not_zero() in dst_hold()")
In net-next we will convert dst atomic_t to refcount_t for peace of
mind.
Fixes: a4c2fd7f7891 ("net: remove DST_NOCACHE flag")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
Bisected-by: Paweł Staszewski <pstaszewski@itcare.pl>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
remove tcp_may_send_now and tcp_snd_test that are no longer used
Fixes: 840a3cbe8969 ("tcp: remove forward retransmit feature")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the
dump") tried to fix an use-after-free issue by checking !sctp_sk(sk)->ep
with holding sock and sock lock.
But Paolo noticed that endpoint could be destroyed in sctp_rcv without
sock lock protection. It means the use-after-free issue still could be
triggered when sctp_rcv put and destroy ep after sctp_sock_dump checks
!ep, although it's pretty hard to reproduce.
I could reproduce it by mdelay in sctp_rcv while msleep in sctp_close
and sctp_sock_dump long time.
This patch is to add another param cb_done to sctp_for_each_transport
and dump ep->assocs with holding tsp after jumping out of transport's
traversal in it to avoid this issue.
It can also improve sctp diag dump to make it run faster, as no need
to save sk into cb->args[5] and keep calling sctp_for_each_transport
any more.
This patch is also to use int * instead of int for the pos argument
in sctp_for_each_transport, which could make postion increment only
in sctp_for_each_transport and no need to keep changing cb->args[2]
in sctp_sock_filter and sctp_sock_dump any more.
Fixes: 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This code causes a static checker warning because Smatch doesn't trust
anything that comes from skb->data. I've reviewed this code and I do
think skb->data can be controlled by the user here.
The sctp_event_subscribe struct has 13 __u8 fields and we want to see
if ours is non-zero. sn_type can be any value in the 0-USHRT_MAX range.
We're subtracting SCTP_SN_TYPE_BASE which is 1 << 15 so we could read
either before the start of the struct or after the end.
This is a very old bug and it's surprising that it would go undetected
for so long but my theory is that it just doesn't have a big impact so
it would be hard to notice.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gen estimator has been rewritten in commit 1c0d32fde5bd
("net_sched: gen_estimator: complete rewrite of rate estimators"),
the caller is no longer needed to wait for a grace period.
So this patch gets rid of it.
This also completely closes a race condition between action free
path and filter chain add/remove path for the following patch.
Because otherwise the nested RCU callback can't be caught by
rcu_barrier().
Please see also the comments in code.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|