Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 01ea306f2ac2baff98d472da719193e738759d93 upstream.
The Syzbot reported a possible deadlock in the netfilter area caused by
rtnl lock, xt lock and socket lock being acquired with a different order
on different code paths, leading to the following backtrace:
Reviewed-by: Xin Long <lucien.xin@gmail.com>
======================================================
WARNING: possible circular locking dependency detected
4.15.0+ #301 Not tainted
------------------------------------------------------
syzkaller233489/4179 is trying to acquire lock:
(rtnl_mutex){+.+.}, at: [<0000000048e996fd>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
but task is already holding lock:
(&xt[i].mutex){+.+.}, at: [<00000000328553a2>]
xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041
which lock already depends on the new lock.
===
Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock
only in the required scope"), we already acquire the socket lock in
the innermost scope, where needed. In such commit I forgot to remove
the outer-most socket lock from the getsockopt() path, this commit
addresses the issues dropping it now.
v1 -> v2: fix bad subj, added relavant 'fixes' tag
Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the required scope")
Reported-by: syzbot+ddde1c7b7ff7442d7f2d@syzkaller.appspotmail.com
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3f34cfae1238848fd53f25e5c8fd59da57901f4b upstream.
Syzbot reported several deadlocks in the netfilter area caused by
rtnl lock and socket lock being acquired with a different order on
different code paths, leading to backtraces like the following one:
======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc9+ #212 Not tainted
------------------------------------------------------
syzkaller041579/3682 is trying to acquire lock:
(sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] lock_sock
include/net/sock.h:1463 [inline]
(sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>]
do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
but task is already holding lock:
(rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (rtnl_mutex){+.+.}:
__mutex_lock_common kernel/locking/mutex.c:756 [inline]
__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607
tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106
xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
find_check_entry.isra.7+0x935/0xcf0
net/ipv6/netfilter/ip6_tables.c:580
translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline]
do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
SYSC_setsockopt net/socket.c:1849 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1828
entry_SYSCALL_64_fastpath+0x29/0xa0
-> #0 (sk_lock-AF_INET6){+.+.}:
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
lock_sock_nested+0xc2/0x110 net/core/sock.c:2780
lock_sock include/net/sock.h:1463 [inline]
do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
SYSC_setsockopt net/socket.c:1849 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1828
entry_SYSCALL_64_fastpath+0x29/0xa0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(rtnl_mutex);
lock(sk_lock-AF_INET6);
lock(rtnl_mutex);
lock(sk_lock-AF_INET6);
*** DEADLOCK ***
1 lock held by syzkaller041579/3682:
#0: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
The problem, as Florian noted, is that nf_setsockopt() is always
called with the socket held, even if the lock itself is required only
for very tight scopes and only for some operation.
This patch addresses the issues moving the lock_sock() call only
where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt()
does not need anymore to acquire both locks.
Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2: Drop changes to ipv6_getorigdst()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit bcfd09f7837f5240c30fd2f52ee7293516641faa upstream.
Currently esp will happily create an xfrm state with an unknown
encap type for IPv4, without setting the necessary state parameters.
This patch fixes it by returning -EINVAL.
There is a similar problem in IPv6 where if the mode is unknown
we will skip initialisation while returning zero. However, this
is harmless as the mode has already been checked further up the
stack. This patch removes this anomaly by aligning the IPv6
behaviour with IPv4 and treating unknown modes (which cannot
actually happen) as transport mode.
Fixes: 38320c70d282 ("[IPSEC]: Use crypto_aead and authenc in ESP")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit acf568ee859f098279eadf551612f103afdacb4e upstream.
This is an old bugbear of mine:
https://www.mail-archive.com/netdev@vger.kernel.org/msg03894.html
By crafting special packets, it is possible to cause recursion
in our kernel when processing transport-mode packets at levels
that are only limited by packet size.
The easiest one is with DNAT, but an even worse one is where
UDP encapsulation is used in which case you just have to insert
an UDP encapsulation header in between each level of recursion.
This patch avoids this problem by reinjecting tranport-mode packets
through a tasklet.
Fixes: b05e106698d9 ("[IPV4/6]: Netfilter IPsec input hooks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
[bwh: Backported to 3.2:
- netfilter finish callbacks only receive an sk_buff pointer
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 30791ac41927ebd3e75486f9504b6d2280463bf0 upstream.
The MD5-key that belongs to a connection is identified by the peer's
IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying
to an incoming segment from tcp_check_req() that failed the seq-number
checks.
Thus, to find the correct key, we need to use the skb's saddr and not
the daddr.
This bug seems to have been there since quite a while, but probably got
unnoticed because the consequences are not catastrophic. We will call
tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer,
thus the connection doesn't really fail.
Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 32a805baf0fb70b6dbedefcd7249ac7f580f9e3b upstream.
IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.
Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns destruction")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit ba1cc08d9488c94cb8d94f545305688b72a2a300 upstream.
fib6_net_exit only frees the main and local tables. If another table was
created with fib6_alloc_table, we leak it when the netns is destroyed.
Fix this in the same way ip_fib_net_exit cleans up tables, by walking
through the whole hashtable of fib6_table's. We can get rid of the
special cases for local and main, since they're also part of the
hashtable.
Reproducer:
ip netns add x
ip -net x -6 rule add from 6003:1::/64 table 100
ip netns del x
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- No need to call inetpeer_invalidate_tree()
- Add the extra iterator variable needed by hlist_for_each_entry_safe()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 4e587ea71bf924f7dac621f1351653bd41e446cb upstream.
Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
net/ipv6/route.c:1394:30: error: incompatible types in comparison
expression (different address spaces)
./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
expression (different address spaces)
This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- fib6_add_rt2node() has only one assignment to update
- Drop changes in rt6_cache_allowed_for_pmtu()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3614364527daa870264f6dde77f02853cdecd02c upstream.
rt_cookie might be used uninitialized, fix this by
initializing it.
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit c5cff8561d2d0006e972bd114afd51f082fee77c upstream.
We currently keep rt->rt6i_node pointing to the fib6_node for the route.
And some functions make use of this pointer to dereference the fib6_node
from rt structure, e.g. rt6_check(). However, as there is neither
refcount nor rcu taken when dereferencing rt->rt6i_node, it could
potentially cause crashes as rt->rt6i_node could be set to NULL by other
CPUs when doing a route deletion.
This patch introduces an rcu grace period before freeing fib6_node and
makes sure the functions that dereference it takes rcu_read_lock().
Note: there is no "Fixes" tag because this bug was there in a very
early stage.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit b197df4f0f3782782e9ea8996e91b65ae33e8dd9 upstream.
Instead of doing the rt6->rt6i_node check whenever we need
to get the route's cookie. Refactor it into rt6_get_cookie().
It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also
percpu rt6_info later.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop changes in inet6_sk_rx_dst_set(), sctp_v6_get_dst()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3de33e1ba0506723ab25734e098cf280ecc34756 upstream.
A packet length of exactly IPV6_MAXPLEN is allowed, we should
refuse parsing options only if the size is 64KiB or more.
While at it, remove one extra variable and one assignment which
were also introduced by the commit that introduced the size
check. Checking the sum 'offset + len' and only later adding
'len' to 'offset' doesn't provide any advantage over directly
summing to 'offset' and checking it.
Fixes: 6399f1fae4ec ("ipv6: avoid overflow of offset in ip6_find_1stfragopt")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit ec8add2a4c9df723c94a863b8fcd6d93c472deed upstream.
Currently, when the link for $DEV is down, this command succeeds but the
address is removed immediately by DAD (1):
ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800
In the same situation, this will succeed and not remove the address (2):
ip addr add 1111::12/64 dev $DEV
ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800
The comment in addrconf_dad_begin() when !IF_READY makes it look like
this is the intended behavior, but doesn't explain why:
* If the device is not ready:
* - keep it tentative if it is a permanent address.
* - otherwise, kill it.
We clearly cannot prevent userspace from doing (2), but we can make (1)
work consistently with (2).
addrconf_dad_stop() is only called in two cases: if DAD failed, or to
skip DAD when the link is down. In that second case, the fix is to avoid
deleting the address, like we already do for permanent addresses.
Fixes: 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3d171f3907329d4b1ce31d5ec9c852c5f0269578 upstream.
The userspace needs to know why is the address being removed so that it can
perhaps obtain a new address.
Without the DADFAILED flag it's impossible to distinguish removal of a
temporary and tentative address due to DAD failure from other reasons (device
removed, manual address removal).
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 60abc0be96e00ca71bac083215ac91ad2e575096 upstream.
The per netns loopback_dev->ip6_ptr is unregistered and set to
NULL when its mtu is set to smaller than IPV6_MIN_MTU, this
leads to that we could set rt->rt6i_idev NULL after a
rt6_uncached_list_flush_dev() and then crash after another
call.
In this case we should just bring its inet6_dev down, rather
than unregistering it, at least prior to commit 176c39af29bc
("netns: fix addrconf_ifdown kernel panic") we always
override the case for loopback.
Thanks a lot to Andrey for finding a reliable reproducer.
Fixes: 176c39af29bc ("netns: fix addrconf_ifdown kernel panic")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: the NETDEV_CHANGEMTU case used to fall-through to the
NETDEV_DOWN case here, so replace that with a separate call to addrconf_ifdown()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit e3e86b5119f81e5e2499bea7ea1ebe8ac6aab789 upstream.
If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.
Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 6e80ac5cc992ab6256c3dae87f7e57db15e1a58c upstream.
xfrm6_find_1stfragopt() may now return an error code and we must
not treat it as a length.
Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 6399f1fae4ec29fab5ec76070435555e256ca3a6 upstream.
In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.
This problem has been here since before the beginning of git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 5b8481fa42ac58484d633b558579e302aead64c1 upstream.
Since that change also made the nfrag function not necessary
for exports, remove it.
Fixes: 89a23c8b528b ("ip6_tunnel: Fix missing tunnel encapsulation limit option")
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 89a23c8b528bd2c89f3981573d6cd7d23840c8a6 upstream.
The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4). An MTU adjustment is done to account for
these options as well. However, the options are never present in the
generated packets.
The issue appears to be a subtlety between IPV6_DSTOPTS and
IPV6_RTHDRDSTOPTS defined in RFC 3542. When the IPIP tunnel driver was
written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
dst0opt of struct ipv6_txoptions. Later, ipv6_push_nfrags_opts was
(correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
are to be used. This caused the options to no longer be included in v6
encapsulated packets.
The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
instead. IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.
Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.")
Fixes: 333fad5364d6: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 4ee39733fbecf04cf9f346de2d64788c35028079 upstream.
Anycast routes have the RTF_ANYCAST flag set, but when dumping routes
for userspace the route type is not set to RTN_ANYCAST. Make it so.
Fixes: 58c4fb86eabcb ("[IPV6]: Flag RTF_ANYCAST for anycast routes")
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 4b3b45edba9222e518a1ec72df841eba3609fe34 upstream.
commit c146066ab802 ("ipv4: Don't use ufo handling on later transformed
packets") and commit f89c56ce710a ("ipv6: Don't use ufo handling on
later transformed packets") added a check that 'rt->dst.header_len' isn't
zero in order to skip UFO, but it doesn't include IPcomp in transport mode
where it equals zero.
Packets, after payload compression, may not require further fragmentation,
and if original length exceeds MTU, later compressed packets will be
transmitted incorrectly. This can be reproduced with LTP udp_ipsec.sh test
on veth device with enabled UFO, MTU is 1500 and UDP payload is 2000:
* IPv4 case, offset is wrong + unnecessary fragmentation
udp_ipsec.sh -p comp -m transport -s 2000 &
tcpdump -ni ltp_ns_veth2
...
IP (tos 0x0, ttl 64, id 45203, offset 0, flags [+],
proto Compressed IP (108), length 49)
10.0.0.2 > 10.0.0.1: IPComp(cpi=0x1000)
IP (tos 0x0, ttl 64, id 45203, offset 1480, flags [none],
proto UDP (17), length 21) 10.0.0.2 > 10.0.0.1: ip-proto-17
* IPv6 case, sending small fragments
udp_ipsec.sh -6 -p comp -m transport -s 2000 &
tcpdump -ni ltp_ns_veth2
...
IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
payload length: 37) fd00::2 > fd00::1: IPComp(cpi=0x1000)
IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
payload length: 21) fd00::2 > fd00::1: IPComp(cpi=0x1000)
Fix it by checking 'rt->dst.xfrm' pointer to 'xfrm_state' struct, skip UFO
if xfrm is set. So the new check will include both cases: IPcomp and IPsec.
Fixes: c146066ab802 ("ipv4: Don't use ufo handling on later transformed packets")
Fixes: f89c56ce710a ("ipv6: Don't use ufo handling on later transformed packets")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 232cd35d0804cc241eb887bb8d4d9b3b9881c64a upstream.
Andrey Konovalov and idaifish@gmail.com reported crashes caused by
one skb shared_info being overwritten from __ip6_append_data()
Andrey program lead to following state :
copy -4200 datalen 2000 fraglen 2040
maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200
The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
fraggap, 0); is overwriting skb->head and skb_shared_info
Since we apparently detect this rare condition too late, move the
code earlier to even avoid allocating skb and risking crashes.
Once again, many thanks to Andrey and syzkaller team.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: <idaifish@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 83eaddab4378db256d00d295bda6ca997cd13a52 upstream.
Like commit 657831ffc38e ("dccp/tcp: do not inherit mc_list from parent")
we should clear ipv6_mc_list etc. for IPv6 sockets too.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 7dd7eb9513bd02184d45f000ab69d78cb1fa1531 upstream.
Do not use unsigned variables to see if it returns a negative
error or not.
Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filenames, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 2423496af35d94a87156b063ea5cedffc10a70a1 upstream.
The KASAN warning repoted below was discovered with a syzkaller
program. The reproducer is basically:
int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
send(s, &one_byte_of_data, 1, MSG_MORE);
send(s, &more_than_mtu_bytes_data, 2000, 0);
The socket() call sets the nexthdr field of the v6 header to
NEXTHDR_HOP, the first send call primes the payload with a non zero
byte of data, and the second send call triggers the fragmentation path.
The fragmentation code tries to parse the header options in order
to figure out where to insert the fragment option. Since nexthdr points
to an invalid option, the calculation of the size of the network header
can made to be much larger than the linear section of the skb and data
is read outside of it.
This fix makes ip6_find_1stfrag return an error if it detects
running out-of-bounds.
[ 42.361487] ==================================================================
[ 42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
[ 42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
[ 42.366469]
[ 42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
[ 42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[ 42.368824] Call Trace:
[ 42.369183] dump_stack+0xb3/0x10b
[ 42.369664] print_address_description+0x73/0x290
[ 42.370325] kasan_report+0x252/0x370
[ 42.370839] ? ip6_fragment+0x11c8/0x3730
[ 42.371396] check_memory_region+0x13c/0x1a0
[ 42.371978] memcpy+0x23/0x50
[ 42.372395] ip6_fragment+0x11c8/0x3730
[ 42.372920] ? nf_ct_expect_unregister_notifier+0x110/0x110
[ 42.373681] ? ip6_copy_metadata+0x7f0/0x7f0
[ 42.374263] ? ip6_forward+0x2e30/0x2e30
[ 42.374803] ip6_finish_output+0x584/0x990
[ 42.375350] ip6_output+0x1b7/0x690
[ 42.375836] ? ip6_finish_output+0x990/0x990
[ 42.376411] ? ip6_fragment+0x3730/0x3730
[ 42.376968] ip6_local_out+0x95/0x160
[ 42.377471] ip6_send_skb+0xa1/0x330
[ 42.377969] ip6_push_pending_frames+0xb3/0xe0
[ 42.378589] rawv6_sendmsg+0x2051/0x2db0
[ 42.379129] ? rawv6_bind+0x8b0/0x8b0
[ 42.379633] ? _copy_from_user+0x84/0xe0
[ 42.380193] ? debug_check_no_locks_freed+0x290/0x290
[ 42.380878] ? ___sys_sendmsg+0x162/0x930
[ 42.381427] ? rcu_read_lock_sched_held+0xa3/0x120
[ 42.382074] ? sock_has_perm+0x1f6/0x290
[ 42.382614] ? ___sys_sendmsg+0x167/0x930
[ 42.383173] ? lock_downgrade+0x660/0x660
[ 42.383727] inet_sendmsg+0x123/0x500
[ 42.384226] ? inet_sendmsg+0x123/0x500
[ 42.384748] ? inet_recvmsg+0x540/0x540
[ 42.385263] sock_sendmsg+0xca/0x110
[ 42.385758] SYSC_sendto+0x217/0x380
[ 42.386249] ? SYSC_connect+0x310/0x310
[ 42.386783] ? __might_fault+0x110/0x1d0
[ 42.387324] ? lock_downgrade+0x660/0x660
[ 42.387880] ? __fget_light+0xa1/0x1f0
[ 42.388403] ? __fdget+0x18/0x20
[ 42.388851] ? sock_common_setsockopt+0x95/0xd0
[ 42.389472] ? SyS_setsockopt+0x17f/0x260
[ 42.390021] ? entry_SYSCALL_64_fastpath+0x5/0xbe
[ 42.390650] SyS_sendto+0x40/0x50
[ 42.391103] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 42.391731] RIP: 0033:0x7fbbb711e383
[ 42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
[ 42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
[ 42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
[ 42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
[ 42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
[ 42.397257]
[ 42.397411] Allocated by task 3789:
[ 42.397702] save_stack_trace+0x16/0x20
[ 42.398005] save_stack+0x46/0xd0
[ 42.398267] kasan_kmalloc+0xad/0xe0
[ 42.398548] kasan_slab_alloc+0x12/0x20
[ 42.398848] __kmalloc_node_track_caller+0xcb/0x380
[ 42.399224] __kmalloc_reserve.isra.32+0x41/0xe0
[ 42.399654] __alloc_skb+0xf8/0x580
[ 42.400003] sock_wmalloc+0xab/0xf0
[ 42.400346] __ip6_append_data.isra.41+0x2472/0x33d0
[ 42.400813] ip6_append_data+0x1a8/0x2f0
[ 42.401122] rawv6_sendmsg+0x11ee/0x2db0
[ 42.401505] inet_sendmsg+0x123/0x500
[ 42.401860] sock_sendmsg+0xca/0x110
[ 42.402209] ___sys_sendmsg+0x7cb/0x930
[ 42.402582] __sys_sendmsg+0xd9/0x190
[ 42.402941] SyS_sendmsg+0x2d/0x50
[ 42.403273] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 42.403718]
[ 42.403871] Freed by task 1794:
[ 42.404146] save_stack_trace+0x16/0x20
[ 42.404515] save_stack+0x46/0xd0
[ 42.404827] kasan_slab_free+0x72/0xc0
[ 42.405167] kfree+0xe8/0x2b0
[ 42.405462] skb_free_head+0x74/0xb0
[ 42.405806] skb_release_data+0x30e/0x3a0
[ 42.406198] skb_release_all+0x4a/0x60
[ 42.406563] consume_skb+0x113/0x2e0
[ 42.406910] skb_free_datagram+0x1a/0xe0
[ 42.407288] netlink_recvmsg+0x60d/0xe40
[ 42.407667] sock_recvmsg+0xd7/0x110
[ 42.408022] ___sys_recvmsg+0x25c/0x580
[ 42.408395] __sys_recvmsg+0xd6/0x190
[ 42.408753] SyS_recvmsg+0x2d/0x50
[ 42.409086] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 42.409513]
[ 42.409665] The buggy address belongs to the object at ffff88000969e780
[ 42.409665] which belongs to the cache kmalloc-512 of size 512
[ 42.410846] The buggy address is located 24 bytes inside of
[ 42.410846] 512-byte region [ffff88000969e780, ffff88000969e980)
[ 42.411941] The buggy address belongs to the page:
[ 42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
[ 42.413298] flags: 0x100000000008100(slab|head)
[ 42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
[ 42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
[ 42.415074] page dumped because: kasan: bad access detected
[ 42.415604]
[ 42.415757] Memory state around the buggy address:
[ 42.416222] ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 42.416904] ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 42.418273] ^
[ 42.418588] ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 42.419273] ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 42.419882] ==================================================================
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filenames, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 9c8bb163ae784be4f79ae504e78c862806087c54 upstream.
In function igmpv3/mld_add_delrec() we allocate pmc and put it in
idev->mc_tomb, so we should free it when we don't need it in del_delrec().
But I removed kfree(pmc) incorrectly in latest two patches. Now fix it.
Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when ...")
Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when ...")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1666d49e1d416fcc2cce708242a52fe3317ea8ba upstream.
This is an IPv6 version of commit 24803f38a5c0 ("igmp: do not remove igmp
souce list..."). In mld_del_delrec(), we will restore back all source filter
info instead of flush them.
Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
we should not remove source list info when set link down. Remove
igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
ipv6_mc_down().
Also clear all source info after igmp6_group_dropped() instead of in it
because ipv6_mc_down() will call igmp6_group_dropped().
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Timer code moved around in ipv6_mc_down() is different
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 63117f09c768be05a0bf465911297dc76394f686 ]
Casting is a high precedence operation but "off" and "i" are in terms of
bytes so we need to have some parenthesis here.
Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit fbfa743a9d2a0ffa24251764f10afc13eb21e739 ]
This function suffers from multiple issues.
First one is that pskb_may_pull() may reallocate skb->head,
so the 'raw' pointer needs either to be reloaded or not used at all.
Second issue is that NEXTHDR_DEST handling does not validate
that the options are present in skb->data, so we might read
garbage or access non existent memory.
With help from Willem de Bruijn.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 03e4deff4987f79c34112c5ba4eb195d4f9382b0 ]
Just like commit 4acd4945cd1e ("ipv6: addrconf: Avoid calling
netdevice notifiers with RCU read-side lock"), it is unnecessary
to make addrconf_disable_change() use RCU iteration over the
netdev list, since it already holds the RTNL lock, or we may meet
Illegal context switch in RCU read-side critical section.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit a98f91758995cb59611e61318dddd8a6956b52c3 ]
By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.
RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
Call Trace:
[<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
[<ffffffff81772697>] inet_sendmsg+0x67/0xa0
[<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
[<ffffffff816d982f>] SYSC_sendto+0xef/0x170
[<ffffffff816da27e>] SyS_sendto+0xe/0x10
[<ffffffff81002910>] do_syscall_64+0x50/0xa0
[<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.
Reproducer:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#define LEN 504
int main(int argc, char* argv[])
{
int fd;
int zero = 0;
char buf[LEN];
memset(buf, 0, LEN);
fd = socket(AF_INET6, SOCK_RAW, 7);
setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit b5c2d49544e5930c96e2632a7eece3f4325a1888 ]
If an ip6 tunnel is configured to inherit the traffic class from
the inner header, the dst_cache must be disabled or it will foul
the policy routing.
The issue is apprently there since at leat Linux-2.6.12-rc2.
Reported-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 70a0dec45174c976c64b4c8c1d0898581f759948 ]
This fixes wrong-interface signaling on 32-bit platforms for entries
created when jiffies > 2^31 + MFC_ASSERT_THRESH.
Signed-off-by: Tom Goff <thomas.goff@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 57ea52a865144aedbcd619ee0081155e658b6f7d upstream.
The GRO fast path caches the frag0 address. This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.
This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.
This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.
Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit ac6e780070e30e4c35bd395acfe9191e6268bdd3 upstream.
With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()
Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.
We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq
Many thanks to syzkaller team and Marco for giving us a reproducer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marco Grassi <marco.gra@gmail.com>
Reported-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 23f4ffedb7d751c7e298732ba91ca75d224bc1a6 upstream.
skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.
This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: apply to ip6_tnl_xmit2()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit f89c56ce710afa65e1b2ead555b52c4807f34ff7 upstream.
Similar to commit c146066ab802 ("ipv4: Don't use ufo handling on later
transformed packets"), don't perform UFO on packets that will be IPsec
transformed. To detect it we rely on the fact that headerlen in
dst_entry is non-zero only for transformation bundles (xfrm_dst
objects).
Unwanted segmentation can be observed with a NETIF_F_UFO capable device,
such as a dummy device:
DEV=dum0 LEN=1493
ip li add $DEV type dummy
ip addr add fc00::1/64 dev $DEV nodad
ip link set $DEV up
ip xfrm policy add dir out src fc00::1 dst fc00::2 \
tmpl src fc00::1 dst fc00::2 proto esp spi 1
ip xfrm state add src fc00::1 dst fc00::2 \
proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
tcpdump -n -nn -i $DEV -t &
socat /dev/zero,readbytes=$LEN udp6:[fc00::2]:$LEN
tcpdump output before:
IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
IP6 fc00::1 > fc00::2: frag (1448|48)
IP6 fc00::1 > fc00::2: ESP(spi=0x00000001,seq=0x2), length 88
... and after:
IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
IP6 fc00::1 > fc00::2: frag (1448|80)
Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 2cf750704bb6d7ed8c7d732e071dd1bc890ea5e8 upstream.
Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
instead of the previous dst_pid which was copied from in_skb's portid.
Since the skb is new the portid is 0 at that point so the packets are sent
to the kernel and we get scheduling while atomic or a deadlock (depending
on where it happens) by trying to acquire rtnl two times.
Also since this is RTM_GETROUTE, it can be triggered by a normal user.
Here's the sleeping while atomic trace:
[ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 7858.212881] 2 locks held by swapper/0/0:
[ 7858.213013] #0: (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
[ 7858.213422] #1: (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
[ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
[ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 7858.214108] 0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
[ 7858.214412] ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
[ 7858.214716] 000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
[ 7858.215251] Call Trace:
[ 7858.215412] <IRQ> [<ffffffff813a7804>] dump_stack+0x85/0xc1
[ 7858.215662] [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
[ 7858.215868] [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
[ 7858.216072] [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
[ 7858.216279] [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
[ 7858.216487] [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
[ 7858.216687] [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
[ 7858.216900] [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
[ 7858.217128] [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
[ 7858.217351] [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
[ 7858.217581] [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.217785] [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.217990] [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
[ 7858.218192] [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
[ 7858.218415] [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.218656] [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
[ 7858.218865] [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
[ 7858.219068] [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
[ 7858.219269] [<ffffffff8107a948>] irq_exit+0xb8/0xc0
[ 7858.219463] [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
[ 7858.219678] [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
[ 7858.219897] <EOI> [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
[ 7858.220165] [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
[ 7858.220373] [<ffffffff810298e3>] default_idle+0x23/0x190
[ 7858.220574] [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
[ 7858.220790] [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
[ 7858.221016] [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
[ 7858.221257] [<ffffffff8164f995>] rest_init+0x135/0x140
[ 7858.221469] [<ffffffff81f83014>] start_kernel+0x50e/0x51b
[ 7858.221670] [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
[ 7858.221894] [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
[ 7858.222113] [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Use 'pid' instead of 'portid' where necessary
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 54d83fc74aa9ec72794373cb47432c5f7fb1a309 upstream.
Ben Hawkes says:
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Problem is that mark_source_chains should not have been called --
the rule doesn't have a next entry, so its supposed to return
an absolute verdict of either ACCEPT or DROP.
However, the function conditional() doesn't work as the name implies.
It only checks that the rule is using wildcard address matching.
However, an unconditional rule must also not be using any matches
(no -m args).
The underflow validator only checked the addresses, therefore
passing the 'unconditional absolute verdict' test, while
mark_source_chains also tested for presence of matches, and thus
proceeeded to the next (not-existent) rule.
Unify this so that all the callers have same idea of 'unconditional rule'.
Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 3ba3458fb9c050718b95275a3310b74415e767e2 ]
When sending a UDPv6 message longer than MTU, account for the length
of fragmentable IPv6 extension headers in skb->network_header offset.
Same as we do in alloc_new_skb path in __ip6_append_data().
This ensures that later on __ip6_make_skb() will make space in
headroom for fragmentable extension headers:
/* move skb->data to ip header from ext header */
if (skb->data < skb_network_header(skb))
__skb_pull(skb, skb_network_offset(skb));
Prevents a splat due to skb_under_panic:
skbuff: skb_under_panic: text:ffffffff8143397b len:2126 put:14 \
head:ffff880005bacf50 data:ffff880005bacf4a tail:0x48 end:0xc0 dev:lo
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] KASAN
CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65
[...]
Call Trace:
[<ffffffff813eb7b9>] skb_push+0x79/0x80
[<ffffffff8143397b>] eth_header+0x2b/0x100
[<ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310
[<ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0
[<ffffffff814efe3a>] ip6_output+0x16a/0x280
[<ffffffff815440c1>] ip6_local_out+0xb1/0xf0
[<ffffffff814f1115>] ip6_send_skb+0x45/0xd0
[<ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0
[<ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090
[...]
Reported-by: Ji Jianwen <jiji@redhat.com>
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 8013d1d7eafb0589ca766db6b74026f76b7f5cb4 ]
Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.
RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.
So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust filename, context
- Number DEVCONF enumerators explicitly to match upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 1cdda91871470f15e79375991bd2eddc6e86ddb1 ]
Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by initializing properly flowi6_oif in connect() before
performing the route lookup.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 34ae6a1aa0540f0f781dd265366036355fdc8930 ]
When a tunnel decapsulates the outer header, it has to comply
with RFC 6080 and eventually propagate CE mark into inner header.
It turns out IP6_ECN_set_ce() does not correctly update skb->csum
for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
messages and stack traces.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Add skb argument to other callers of IP6_ECN_set_ce()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 6e94e0cfb0887e4013b3b930fa6ab1fe6bb6ba91 upstream.
Otherwise this function may read data beyond the ruleset blob.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit bdf533de6968e9686df777dc178486f600c6e617 upstream.
We should check that e->target_offset is sane before
mark_source_chains gets called since it will fetch the target entry
for loop detection.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1837b2e2bcd23137766555a63867e649c0b637f0 upstream.
The current reserved_tailroom calculation fails to take hlen and tlen into
account.
skb:
[__hlen__|__data____________|__tlen___|__extra__]
^ ^
head skb_end_offset
In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:
[__hlen__|__data____________|__extra__|__tlen___]
^ ^
head skb_end_offset
The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.
The min() in the third line can be expanded into:
if mtu < skb_tailroom - tlen:
reserved_tailroom = skb_tailroom - mtu
else:
reserved_tailroom = tlen
Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.
Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3e4006f0b86a5ae5eb0e8215f9a9e1db24506977 upstream.
When first SYNACK is sent, we already hold rcu_read_lock(), but this
is not true if a SYNACK is retransmitted, as a timer (soft) interrupt
does not hold rcu_read_lock()
Fixes: 45f6fad84cc30 ("ipv6: add complete rcu protection around np->opt")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 197c949e7798fbf28cfadc69d9ca0c2abbf93191 upstream.
Backport of this upstream commit into stable kernels :
89c22d8c3b27 ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.
In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
msg->msg_iov);
returns -EFAULT.
This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.
For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.
This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
This reverts commit 127500d724f8c43f452610c9080444eedb5eaa6c. That fixed
the problem of buffer over-reads introduced by backporting commit
89c22d8c3b27 ("net: Fix skb csum races when peeking"), but resulted in
incorrect checksumming for short reads. It will be replaced with a
complete fix.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|