path: root/net/ipv6/addrconf.c
Age | Commit message | Author | Files | Lines
2017-10-12 | ipv6: addrconf: don't use rtnl mutex in RTM_GETADDR | Florian Westphal | 1 | -7/+7
Similar to the previous patch, use the device lookup functions that bump device refcount and flag this as DOIT_UNLOCKED to avoid rtnl mutex. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12 | ipv6: addrconf: don't use rtnl mutex in RTM_GETNETCONF | Florian Westphal | 1 | -10/+14
Instead of relying on the rtnl mutex, bump the device reference count. After this change, reported values can change in parallel, but that's not much different from the current state, as anyone can change the settings right after rtnl_unlock (and before userspace has processed the reply). While at it, switch to GFP_KERNEL allocation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -2/+2
2017-10-09 | ipv6: avoid cache line dirtying in ipv6_dev_get_saddr() | Eric Dumazet | 1 | -11/+6
By extending the rcu section a bit, we can avoid these very expensive in6_ifa_put()/in6_ifa_hold() calls done in __ipv6_dev_get_saddr() and ipv6_dev_get_saddr() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 | ipv6: __ipv6_dev_get_saddr() rcu conversion | Eric Dumazet | 1 | -3/+1
Callers hold rcu_read_lock(), so we do not need the rcu_read_lock()/rcu_read_unlock() pair. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 | ipv6: ipv6_chk_prefix() rcu conversion | Eric Dumazet | 1 | -5/+3
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 | ipv6: ipv6_chk_custom_prefix() rcu conversion | Eric Dumazet | 1 | -5/+3
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 | ipv6: ipv6_count_addresses() rcu conversion | Eric Dumazet | 1 | -6/+6
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 | ipv6: prepare RCU lookups for idev->addr_list | Eric Dumazet | 1 | -15/+6
inet6_ifa_finish_destroy() already uses kfree_rcu() to free inet6_ifaddr structs. We need to use proper list additions/deletions in order to allow readers to use RCU instead of idev->lock rwlock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08 | ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real | Matteo Croce | 1 | -2/+2
Commit 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers") was intended to affect accept_dad flag handling in such a way that DAD operation and mode on a given interface would be selected according to the maximum value of conf/{all,interface}/accept_dad. However, addrconf_dad_begin() checks for particular cases in which we need to skip DAD, and this check was modified in the wrong way. Namely, it was modified so that, if the accept_dad flag is 0 for the given interface *or* for all interfaces, DAD would be skipped. We have instead to skip DAD if accept_dad is 0 for the given interface *and* for all interfaces. Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers") Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Reported-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
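The difference between the regressed check and the intended one can be sketched in plain userspace C. This is a minimal illustration, not the kernel code: struct dad_conf and both function names are hypothetical stand-ins for the global and per-interface accept_dad values described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sysctl values; not a kernel structure. */
struct dad_conf {
	int all_accept_dad;	/* net.ipv6.conf.all.accept_dad */
	int dev_accept_dad;	/* net.ipv6.conf.<interface>.accept_dad */
};

/* Regressed check: DAD is skipped as soon as either value is below 1. */
static bool skip_dad_regressed(const struct dad_conf *c)
{
	return c->all_accept_dad < 1 || c->dev_accept_dad < 1;
}

/* Intended check: DAD is skipped only when both values are below 1. */
static bool skip_dad_intended(const struct dad_conf *c)
{
	return c->all_accept_dad < 1 && c->dev_accept_dad < 1;
}

/* The two checks disagree when exactly one of the values is below 1. */
static void dad_conf_example(void)
{
	struct dad_conf c = { .all_accept_dad = 0, .dev_accept_dad = 1 };

	assert(skip_dad_regressed(&c));	/* DAD wrongly skipped */
	assert(!skip_dad_intended(&c));	/* DAD still runs */
}
```

The intended check is equivalent to testing whether the maximum of the two values is below 1, which matches the "maximum value" rule stated in the message.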
2017-10-07 | ipv6: replace rwlock with rcu and spinlock in fib6_table | Wei Wang | 1 | -6/+5
With all the preparation work before, we are now ready to replace rwlock with rcu and spinlock in fib6_table. That means all fib6_node in fib6_table are now protected by rcu. When freeing a fib6_node, call_rcu() is used to wait for the rcu grace period before releasing the memory. When accessing a fib6_node, the corresponding rcu APIs need to be used. All sections previously protected by the write lock will now be protected by the per-table spin lock. All sections previously protected by the read lock will now be protected by rcu_read_lock(). A couple of things to note here:
1. As part of the work of replacing rwlock with rcu, the linked list of fn->leaf now has to be rcu protected as well. So both fn->leaf and rt->dst.rt6_next are now __rcu tagged and the corresponding rcu APIs are used when manipulating them.
2. For fn->rr_ptr, first of all, it also needs to be rcu protected now, so it is tagged with __rcu and rcu APIs are used in the corresponding places. Secondly, fn->rr_ptr is changed in rt6_select(), which is a reader thread. This makes the issue a bit complicated. We think a valid solution is to let rt6_select() grab the tb6_lock if it decides to change it. As this is not the normal operation and only happens when there is no valid neighbor cache for the route, we think the performance impact should be low.
3. fib6_walk_continue() has to be called with tb6_lock held even in the route dumping related functions, e.g. inet6_dump_fib(), fib6_tables_dump() and ipv6_route_seq_ops. This is because fib6_walk_continue() makes modifications to the walker structure, and so do fib6_repair_tree() and fib6_del_route(). In order to do proper syncing between them, we need to let fib6_walk_continue() hold the lock. We may be able to further improve the way we do the tree walk to get rid of the need for holding the spin lock. But not for now.
4. When fib6_del_route() removes a route from the tree, we no longer set rt->dst.rt6_next to NULL, so that a simultaneous reader can continue traversing the list under rcu. However, rt->dst.rt6_next is only valid within this same rcu period. No one should access it later.
5. All atomic_inc(rt->rt6i_ref) operations are now performed before we publish the route (either by linking it to fn->leaf or by inserting it in the list pointed to by fn->leaf), just to be safe, because as soon as we publish the route, some read thread will be able to access it.
Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 | ipv6: replace dst_hold() with dst_hold_safe() in routing code | Wei Wang | 1 | -1/+2
With rwlock, it is safe to call dst_hold() in the read thread because the read thread is guaranteed to be separated from the write thread. However, after we replace rwlock with rcu, it is no longer safe to use dst_hold(). A dst might already have been deleted but still be waiting for the rcu grace period to pass before its memory is freed when a read thread tries to do dst_hold(). This could potentially cause a double-free issue. So this commit replaces all dst_hold() with dst_hold_safe() in all read threads to avoid this double-free issue. And in order to make the code more compact, a new function ip6_hold_safe() is introduced. It calls dst_hold_safe() first, and if that fails, it will either fall back to hold and return net->ipv6.ip6_null_entry or set rt to NULL, according to the caller's need. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
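The "hold only if still alive" idea behind a safe hold can be sketched in userspace C11. This is an illustration under stated assumptions, not the kernel implementation: struct model_dst, model_hold_safe() and model_lookup_hold() are hypothetical names, and the fallback argument plays the role the message assigns to the null entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a dst entry. */
struct model_dst {
	atomic_int refcnt;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool model_hold_safe(struct model_dst *d)
{
	int old = atomic_load(&d->refcnt);

	while (old != 0) {
		/* On failure, old is refreshed with the current count and we retry. */
		if (atomic_compare_exchange_weak(&d->refcnt, &old, old + 1))
			return true;
	}
	return false;	/* object is already on its way to being freed */
}

/*
 * Return found with a reference held, or hold and return fallback
 * (which may be NULL) when found can no longer be held.
 */
static struct model_dst *model_lookup_hold(struct model_dst *found,
					   struct model_dst *fallback)
{
	if (model_hold_safe(found))
		return found;
	if (fallback)
		atomic_fetch_add(&fallback->refcnt, 1);
	return fallback;
}

static void model_hold_example(void)
{
	struct model_dst dead, null_entry;

	atomic_init(&dead.refcnt, 0);
	atomic_init(&null_entry.refcnt, 1);

	/* A dead entry is never revived; the caller gets the fallback instead. */
	assert(model_lookup_hold(&dead, &null_entry) == &null_entry);
	assert(atomic_load(&dead.refcnt) == 0);
}
```

An unconditional increment would have taken the dead entry's count from 0 back to 1, and the later put would free it a second time; the compare-and-exchange loop is what rules that out.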
2017-10-07 | ipv6: hook up exception table to store dst cache | Wei Wang | 1 | -1/+0
This commit makes use of the exception hash table implementation to store dst caches created by pmtu discovery and ip redirect in the hash table under the rt_info, and no longer inserts these routes into the fib6 tree. This makes the fib6 tree contain only statically configured routes, so it can now be protected by rcu instead of a rw lock. With this change, in the route lookup related functions, after finding the rt6_info with the longest prefix, we also need to search the exception table before doing backtracking. In the route delete function, if the route being deleted is not a dst cache, deletion of this route also needs to flush the whole hash table under it. If it is a dst cache, then only the cached dst in the hash table is deleted. Note: for the fib6_walk_continue() function, w->root now always points to a root node, considering that fib6_prune_clones() is removed from the code. So we add a WARN_ON() msg to make sure w->root always points to a root node and also remove the update of w->root in fib6_repair_tree(). This is a prerequisite for a later patch because we don't need to make w->root rcu protected when replacing rwlock with RCU. Also, we remove all prune related variables as they are no longer used. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 | ipv6: prepare fib6_locate() for exception table | Wei Wang | 1 | -1/+1
fib6_locate() is used to find the fib6_node according to the passed in prefix address key. It currently tries to find the fib6_node with the exact match of the passed in key. However, when we move cached routes into the exception table, fib6_locate() will fail to find the fib6_node for them, as the cached routes will be stored in the exception table under the fib6_node with the longest prefix match of the cache's dst addr key. This commit adds a new parameter to let the caller specify whether it needs an exact match or a longest prefix match. Right now, all callers still do an exact match when calling fib6_locate(). This will be changed in a later commit where the exception table is hooked up to store cached routes. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 | rtnetlink: remove __rtnl_af_unregister | Florian Westphal | 1 | -2/+2
switch the only caller to rtnl_af_unregister. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-30 | net: ipv6: send NS for DAD when link operationally up | Mike Manning | 1 | -6/+6
The NS for DAD are sent on admin up as long as a valid qdisc is found. A race condition exists by which these packets will not egress the interface if the operational state of the lower device is not yet up. The solution is to delay DAD until the link is operationally up according to RFC2863. Rather than only doing this, follow the existing code checks by deferring IPv6 device initialization altogether. The fix allows DAD on devices like tunnels that are controlled by userspace control plane. The fix has no impact on regular deployments, but means that there is no IPv6 connectivity until the port has been opened in the case of port-based network access control, which should be desirable. Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 | ipv6: Remove redundant unlikely() | Tobias Klauser | 1 | -1/+1
IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 | ipv6: fix net.ipv6.conf.all interface DAD handlers | Matteo Croce | 1 | -7/+20
Currently, writing into net.ipv6.conf.all.{accept_dad,use_optimistic,optimistic_dad} has no effect. Fix handling of these flags by:
- using the maximum of global and per-interface values for the accept_dad flag. That is, if at least one of the two values is non-zero, enable DAD on the interface. If at least one value is set to 2, enable DAD and disable IPv6 operation on the interface if MAC-based link-local address was found
- using the logical OR of global and per-interface values for the optimistic_dad flag. If at least one of them is set to one, optimistic duplicate address detection (RFC 4429) is enabled on the interface
- using the logical OR of global and per-interface values for the use_optimistic flag. If at least one of them is set to one, optimistic addresses won't be marked as deprecated during source address selection on the interface.
While at it, as we're modifying the prototype for ipv6_use_optimistic_addr(), drop inline, and let the compiler decide.
Fixes: 7fd2561e4ebd ("net: ipv6: Add a sysctl to make optimistic addresses useful candidates") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
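The combination rules listed in this message (maximum for accept_dad, logical OR for the two optimistic flags) can be written down as a small userspace sketch. The struct and function names below are hypothetical and only model the arithmetic; they are not the kernel's helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the three sysctls for one scope ("all" or one interface). */
struct model_dad_sysctls {
	int accept_dad;		/* 0, 1 or 2 */
	int optimistic_dad;	/* 0 or 1 */
	int use_optimistic;	/* 0 or 1 */
};

/* accept_dad: the larger of the global and per-interface values wins. */
static int model_accept_dad(const struct model_dad_sysctls *all,
			    const struct model_dad_sysctls *dev)
{
	return all->accept_dad > dev->accept_dad ? all->accept_dad
						 : dev->accept_dad;
}

/* optimistic_dad: enabled if either scope enables it. */
static bool model_optimistic_dad(const struct model_dad_sysctls *all,
				 const struct model_dad_sysctls *dev)
{
	return all->optimistic_dad || dev->optimistic_dad;
}

/* use_optimistic: enabled if either scope enables it. */
static bool model_use_optimistic(const struct model_dad_sysctls *all,
				 const struct model_dad_sysctls *dev)
{
	return all->use_optimistic || dev->use_optimistic;
}

static void model_dad_sysctls_example(void)
{
	/* Global scope asks for strict DAD; the interface keeps its defaults. */
	struct model_dad_sysctls all = { .accept_dad = 2 };
	struct model_dad_sysctls dev = { .accept_dad = 1 };

	assert(model_accept_dad(&all, &dev) == 2);
	assert(!model_optimistic_dad(&all, &dev));
	assert(!model_use_optimistic(&all, &dev));
}
```

With these rules a value written under conf/all can only make an interface stricter or more permissive in the "enabled" direction; it cannot switch off something the interface has enabled itself.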
2017-09-20 | net: ipv6: fix regression of no RTM_DELADDR sent after DAD failure | Mike Manning | 1 | -2/+3
Commit f784ad3d79e5 ("ipv6: do not send RTM_DELADDR for tentative addresses") incorrectly assumes that no RTM_NEWADDR are sent for addresses in tentative state, as this does happen for the standard IPv6 use-case of DAD failure, see the call to ipv6_ifa_notify() in addrconf_dad_stop(). So as a result of this change, no RTM_DELADDR is sent after DAD failure for a link-local when strict DAD (accept_dad=2) is configured, or on the next admin down in other cases. The absence of this notification breaks backwards compatibility and causes problems after DAD failure if this notification was being relied on. The solution is to allow RTM_DELADDR to still be sent after DAD failure. Fixes: f784ad3d79e5 ("ipv6: do not send RTM_DELADDR for tentative addresses") Signed-off-by: Mike Manning <mmanning@brocade.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-02 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -1/+1
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29 | ipv6: fix sparse warning on rt6i_node | Wei Wang | 1 | -1/+1
Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This generates a new sparse warning on rt->rt6i_node related code:
    net/ipv6/route.c:1394:30: error: incompatible types in comparison expression (different address spaces)
    ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison expression (different address spaces)
This commit adds "__rcu" tag for rt6i_node and makes sure corresponding rcu API is used for it. After this fix, sparse no longer generates the above warning.
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-21 | net: ipv6: put host and anycast routes on device with address | David Ahern | 1 | -42/+0
One nagging difference between ipv4 and ipv6 is that host routes for ipv6 addresses are installed using the loopback device or VRF / L3 Master device. e.g.,
    2001:db8:1::/120 dev veth0 proto kernel metric 256 pref medium
    local 2001:db8:1::1 dev lo table local proto kernel metric 0 pref medium
Using the loopback device is convenient -- necessary for local tx, but has some nasty side effects, most notably setting the 'lo' device down causes all host routes for all local IPv6 addresses to be removed from the FIB and completely breaks IPv6 networking across all interfaces. This patch puts FIB entries for IPv6 routes against the device. This simplifies the routes in the FIB, for example by making dst->dev and rt6i_idev->dev the same (a future patch can look at removing the device reference taken for rt6i_idev for FIB entries). When copies are made on FIB lookups, the cloned route has dst->dev set to loopback (or the L3 master device). This is needed for the local Tx of packets to local addresses. With fib entries allocated against the real network device, the addrconf code that reinserts host routes on admin up of 'lo' is no longer needed.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10 | rtnetlink: make rtnl_register accept a flags parameter | Florian Westphal | 1 | -7/+7
This change allows us to later indicate to rtnetlink core that certain doit functions should be called without acquiring rtnl_mutex. This change should have no effect, we simply replace the last (now unused) calcit argument with the new flag. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04 | ipv6: Regenerate host route according to node pointer upon interface up | Ido Schimmel | 1 | -2/+2
When an interface is brought back up, the kernel tries to restore the host routes tied to its permanent addresses. However, if the host route was removed from the FIB, then we need to reinsert it. This is done by releasing the current dst and allocating a new one, so as to not reuse a dst with obsolete values. Since this function is called under RTNL and using the same explanation from the previous patch, we can test if the route is in the FIB by checking its node pointer instead of its reference count. Tested using the following script and Andrey's reproducer mentioned in commit 8048ced9beb2 ("net: ipv6: regenerate host route if moved to gc list") and linked below:
    $ ip link set dev lo up
    $ ip link add dummy1 type dummy
    $ ip -6 address add cafe::1/64 dev dummy1
    $ ip link set dev lo down # cafe::1/128 is removed
    $ ip link set dev dummy1 up
    $ ip link set dev lo up
The host route is correctly regenerated.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Link: http://lkml.kernel.org/r/CAAeHK+zSe82vc5gCRgr_EoUwiALPnWVdWJBPwJZBpbxYz=kGJw@mail.gmail.com Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04 | ipv6: Regenerate host route according to node pointer upon loopback up | Ido Schimmel | 1 | -1/+1
When the loopback device is brought back up we need to check if the host route attached to the address is still in the FIB and regenerate one in case it's not. Host routes using the loopback device are always inserted into and removed from the FIB under RTNL (under which this function is called), so we can test their node pointer instead of the reference count in order to check if the route is in the FIB or not. Tested using the following script from Nicolas mentioned in commit a220445f9f43 ("ipv6: correctly add local routes when lo goes up"):
    $ ip link add dummy1 type dummy
    $ ip link set dummy1 up
    $ ip link set lo down ; ip link set lo up
The host route is correctly regenerated.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 | net, ipv6: convert inet6_ifaddr.refcnt from atomic_t to refcount_t | Reshetova, Elena | 1 | -1/+1
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This helps avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 | net, ipv6: convert inet6_dev.refcnt from atomic_t to refcount_t | Reshetova, Elena | 1 | -1/+1
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This helps avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 | ipv6: dad: don't remove dynamic addresses if link is down | Sabrina Dubroca | 1 | -9/+9
Currently, when the link for $DEV is down, this command succeeds but the address is removed immediately by DAD (1):
    ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800
In the same situation, this will succeed and not remove the address (2):
    ip addr add 1111::12/64 dev $DEV
    ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800
The comment in addrconf_dad_begin() when !IF_READY makes it look like this is the intended behavior, but doesn't explain why:
    * If the device is not ready:
    * - keep it tentative if it is a permanent address.
    * - otherwise, kill it.
We clearly cannot prevent userspace from doing (2), but we can make (1) work consistently with (2). addrconf_dad_stop() is only called in two cases: if DAD failed, or to skip DAD when the link is down. In that second case, the fix is to avoid deleting the address, like we already do for permanent addresses.
Fixes: 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -2/+3
A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22 | ipv6: avoid unregistering inet6_dev for loopback | WANG Cong | 1 | -2/+3
The per netns loopback_dev->ip6_ptr is unregistered and set to NULL when its mtu is set to smaller than IPV6_MIN_MTU. As a result, rt->rt6i_idev can be set to NULL after a rt6_uncached_list_flush_dev() call, and we then crash on another call. In this case we should just bring its inet6_dev down, rather than unregistering it; at least prior to commit 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") we always override the case for loopback. Thanks a lot to Andrey for finding a reliable reproducer. Fixes: 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -3/+3
Two entries being added at the same time to the IFLA policy table, whilst parallel bug fixes to decnet routing dst handling overlapping with the dst gc removal in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-18 | ipv6: call dst_hold_safe() properly | Wei Wang | 1 | -2/+2
As with ipv4, the ipv6 path also needs to call dst_hold_safe() when necessary to avoid a double-free issue on the dst. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 | ipv6: fix calling in6_ifa_hold incorrectly for dad work | Xin Long | 1 | -3/+3
Now when starting the dad work in addrconf_mod_dad_work, if the dad work is idle and queued, it needs to hold ifa. The problem is that there is a gap at [1] during which the pending dad work may be removed elsewhere. In that case the hold on ifa is missed, but the dad work is still idle and gets queued.
    if (!delayed_work_pending(&ifp->dad_work))
            in6_ifa_hold(ifp);
                    <--------------[1]
    mod_delayed_work(addrconf_wq, &ifp->dad_work, delay);
A use-after-free issue can be caused by this. Chen Wei found this issue when WARN_ON(!hlist_unhashed(&ifp->addr_lst)) in inet6_ifa_finish_destroy was hit because of it. As Hannes suggested, this patch fixes it by holding ifa first in addrconf_mod_dad_work, then calling mod_delayed_work and putting ifa if the dad_work is already in the queue. Note that this patch did not choose to fix it with:
    if (!mod_delayed_work(delay))
            in6_ifa_hold(ifp);
because with that approach, when delay == 0, dad_work would be scheduled immediately, and all addrconf_mod_dad_work(0) calls would have to be moved under ifp->lock.
Reported-by: Wei Chen <weichen@redhat.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
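The "hold first, then drop the extra reference if the work was already pending" ordering described in this message can be modelled in single-threaded userspace C. This is only a sketch: struct model_ifa and both functions are hypothetical, and model_mod_delayed_work() merely imitates the "returns true if the work was already pending" convention that the fix relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an address with a refcount and one delayed work item. */
struct model_ifa {
	int refcnt;
	bool dad_work_pending;
};

/* (Re)arm the work; report whether it was already pending before the call. */
static bool model_mod_delayed_work(struct model_ifa *ifp)
{
	bool was_pending = ifp->dad_work_pending;

	ifp->dad_work_pending = true;
	return was_pending;
}

/*
 * Take the reference before touching the work, so there is no window in
 * which the work is queued without a reference backing it. If the work
 * was already pending it already owns a reference, so drop the extra one.
 */
static void model_mod_dad_work(struct model_ifa *ifp)
{
	ifp->refcnt++;
	if (model_mod_delayed_work(ifp)) {
		assert(ifp->refcnt > 1);
		ifp->refcnt--;
	}
}

static void model_dad_work_example(void)
{
	struct model_ifa ifa = { .refcnt = 1, .dad_work_pending = false };

	model_mod_dad_work(&ifa);	/* idle work gets queued: +1 reference */
	assert(ifa.refcnt == 2);

	model_mod_dad_work(&ifa);	/* already pending: count unchanged */
	assert(ifa.refcnt == 2);
}
```

The invariant being protected is one reference per queued work item, regardless of how many times the work is re-armed before it runs.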
2017-06-09 | Ipvlan should return an error when an address is already in use. | Krister Johansen | 1 | -1/+16
The ipvlan code already knows how to detect when a duplicate address is about to be assigned to an ipvlan device. However, that failure is not propagated outward and leads to a silent failure. Introduce a validation step at ip address creation time and allow device drivers to register to validate the incoming ip addresses. The ipvlan code is the first consumer. If it detects an address in use, we can return an error to the user before beginning to commit the new ifa in the networking code. This can be especially useful if it is necessary to provision many ipvlans in containers. The provisioning software (or operator) can use this to detect situations where an ip address is unexpectedly in use. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-22 | net: ipv6: Plumb extack through route add functions | David Ahern | 1 | -2/+2
Plumb extack argument down to route add functions. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 | ipv6: avoid dad-failures for addresses with NODAD | Mahesh Bandewar | 1 | -1/+4
Every address gets added with the TENTATIVE flag, even addresses with the IFA_F_NODAD flag, and dad-work is scheduled for them. During this DAD process we realize it's an address with NODAD and complete the process without sending any probe. However, the TENTATIVE flag stays on the address long enough to cause misinterpretation when we receive an NS. While processing the NS, if the address has the TENTATIVE flag, we mark it DADFAILED and end up with an address that was originally configured as NODAD marked DADFAILED. We can't avoid scheduling dad_work for addresses with NODAD, but we can avoid adding the TENTATIVE flag to avoid this racy situation. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09 | ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf | WANG Cong | 1 | -0/+1
For each netns (except init_net), we initialize its null entry in 3 places:
1) The template itself, as we use kmemdup()
2) Code around dst_init_metrics() in ip6_route_net_init()
3) ip6_route_dev_notify(), which is supposed to initialize it after loopback registers
Unfortunately the last one still happens in a wrong order because we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to net->loopback_dev's idev, thus we have to do that after we add idev to loopback. However, this notifier has priority == 0 same as ipv6_dev_notf, and ipv6_dev_notf is registered after ip6_route_dev_notifier so it is called actually after ip6_route_dev_notifier. This is similar to commit 2f460933f58e ("ipv6: initialize route null entry in addrconf_init()") which fixes init_net. Fix it by picking a smaller priority for ip6_route_dev_notifier. Also, we have to release the refcnt accordingly when unregistering loopback_dev because device exit functions are called before subsys exit functions.
Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
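The ordering problem can be illustrated with a small userspace model of a priority-sorted notifier chain. This is a sketch under assumptions: the structure and function names are hypothetical, the insertion rule (higher priority first, ties keep registration order) is the behaviour the message above implies, and the value -10 is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical notifier entry; callbacks are omitted, only ordering is modelled. */
struct model_notifier {
	const char *name;
	int priority;
	struct model_notifier *next;
};

/*
 * Insert n in front of the first entry with a strictly lower priority.
 * Higher priorities are therefore called first, and entries with equal
 * priority are called in registration order.
 */
static void model_chain_register(struct model_notifier **head,
				 struct model_notifier *n)
{
	while (*head != NULL) {
		if (n->priority > (*head)->priority)
			break;
		head = &(*head)->next;
	}
	n->next = *head;
	*head = n;
}

static void model_notifier_example(void)
{
	struct model_notifier *chain = NULL;
	struct model_notifier route = { .name = "route", .priority = -10 };
	struct model_notifier addrconf = { .name = "addrconf", .priority = 0 };

	/* The route notifier registers first but has the smaller priority. */
	model_chain_register(&chain, &route);
	model_chain_register(&chain, &addrconf);

	/* addrconf now runs first, so the idev exists when route runs. */
	assert(strcmp(chain->name, "addrconf") == 0);
	assert(strcmp(chain->next->name, "route") == 0);
}
```

With both priorities equal to 0, the same two registrations would leave "route" at the head of the chain, which is the wrong order the message describes.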
2017-05-04 | ipv6: initialize route null entry in addrconf_init() | WANG Cong | 1 | -0/+2
Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev since it is always NULL. This is clearly wrong, we have code to initialize it to loopback_dev, unfortunately the order is still not correct. loopback_dev is registered very early during boot, we lose a chance to re-initialize it in notifier. addrconf_init() is called after ip6_route_init(), which means we have no chance to correct it. Fix it by moving this initialization explicitly after ipv6_add_dev(init_net.loopback_dev) in addrconf_init(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-03 | net: ipv6: Do not duplicate DAD on link up | David Ahern | 1 | -2/+3
Andrey reported a warning triggered by the rcu code:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5911 at lib/debugobjects.c:289 debug_print_object+0x175/0x210
ODEBUG: activate active (active state 1) object type: rcu_head hint: (null)
Modules linked in:
CPU: 1 PID: 5911 Comm: a.out Not tainted 4.11.0-rc8+ #271
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x192/0x22d lib/dump_stack.c:52
 __warn+0x19f/0x1e0 kernel/panic.c:549
 warn_slowpath_fmt+0xe0/0x120 kernel/panic.c:564
 debug_print_object+0x175/0x210 lib/debugobjects.c:286
 debug_object_activate+0x574/0x7e0 lib/debugobjects.c:442
 debug_rcu_head_queue kernel/rcu/rcu.h:75
 __call_rcu.constprop.76+0xff/0x9c0 kernel/rcu/tree.c:3229
 call_rcu_sched+0x12/0x20 kernel/rcu/tree.c:3288
 rt6_rcu_free net/ipv6/ip6_fib.c:158
 rt6_release+0x1ea/0x290 net/ipv6/ip6_fib.c:188
 fib6_del_route net/ipv6/ip6_fib.c:1461
 fib6_del+0xa42/0xdc0 net/ipv6/ip6_fib.c:1500
 __ip6_del_rt+0x100/0x160 net/ipv6/route.c:2174
 ip6_del_rt+0x140/0x1b0 net/ipv6/route.c:2187
 __ipv6_ifa_notify+0x269/0x780 net/ipv6/addrconf.c:5520
 addrconf_ifdown+0xe60/0x1a20 net/ipv6/addrconf.c:3672
 ...
Andrey's reproducer program runs in a very tight loop, calling 'unshare -n' and then spawning 2 sets of 14 threads running random ioctl calls. The relevant networking sequence:
1. New network namespace created via unshare -n
   - ip6tnl0 device is created in down state
2. address added to ip6tnl0
   - equivalent to ip -6 addr add dev ip6tnl0 fd00::bb/1
   - DAD is started on the address and when it completes the host route is inserted into the FIB
3. ip6tnl0 is brought up
   - the new fixup_permanent_addr function restarts DAD on the address
4. exit namespace
   - teardown / cleanup sequence starts
   - once in a blue moon, lo teardown appears to happen BEFORE teardown of ip6tnl0
     + down on 'lo' removes the host route from the FIB since the dst->dev for the route is loopback
     + host route added to rcu callback list
       * rcu callback has not run yet, so rt is NOT on the gc list so it has NOT been marked obsolete
5. in parallel to 4. worker_thread runs addrconf_dad_completed
   - DAD on the address on ip6tnl0 completes
   - calls ipv6_ifa_notify which inserts the host route
All of that happens very quickly. The result is that a host route that has been deleted from the IPv6 FIB and added to the RCU list is re-inserted into the FIB. The exit namespace eventually gets to cleaning up ip6tnl0 which removes the host route from the FIB again, calls the rcu function for cleanup -- and triggers the double rcu trace. The root cause is duplicate DAD on the address -- steps 2 and 3. Arguably, DAD should not be started in step 2. The interface is in the down state, so it can not really send out requests for the address which makes starting DAD pointless. Since the second DAD was introduced by a recent change, seems appropriate to use it for the Fixes tag and have the fixup function only start DAD for addresses in the PREDAD state which occurs in addrconf_ifdown if the address is retained. Big thanks to Andrey for isolating a reliable reproducer for this problem.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsahern@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-27 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -2/+12
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25 | net: ipv6: regenerate host route if moved to gc list | David Ahern | 1 | -2/+12
Taking down the loopback device wreaks havoc on IPv6 routing. By extension, taking down a VRF device wreaks havoc on its table. Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6 FIB code while running syzkaller fuzzer. The root cause is a dead dst that is on the garbage list gets reinserted into the IPv6 FIB. While on the gc (or perhaps when it gets added to the gc list) the dst->next is set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the out-of-bounds access. Andrey's reproducer was the key to getting to the bottom of this. With IPv6, host routes for an address have the dst->dev set to the loopback device. When the 'lo' device is taken down, rt6_ifdown initiates a walk of the fib evicting routes with the 'lo' device which means all host routes are removed. That process moves the dst which is attached to an inet6_ifaddr to the gc list and marks it as dead. The recent change to keep global IPv6 addresses added a new function, fixup_permanent_addr, that is called on admin up. That function restarts dad for an inet6_ifaddr and when it completes the host route attached to it is inserted into the fib. Since the route was marked dead and moved to the gc list, re-inserting the route causes the reported out-of-bounds accesses. If the device with the address is taken down or the address is removed, the WARN_ON in fib6_del is triggered. All of those faults are fixed by regenerating the host route if the existing one has been moved to the gc list, something that can be determined by checking if the rt6i_ref counter is 0. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: rtnetlink: plumb extended ack to doit functionDavid Ahern1-8/+12
Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse for doit functions that call it directly. This is the first step to using extended error reporting in rtnetlink. From here individual subsystems can be updated to set netlink_ext_ack as needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextDavid S. Miller1-6/+17
Johan Hedberg says:

====================
pull request: bluetooth-next 2017-04-14

Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12 kernel.

- Many fixes to 6LoWPAN, in particular for BLE
- New CA8210 IEEE 802.15.4 device driver (accounting for most of the lines of code added in this pull request)
- Added Nokia Bluetooth (UART) HCI driver
- Some serdev & TTY changes that are dependencies for the Nokia driver (with acks from relevant maintainers and an agreement that these come through the bluetooth tree)
- Support for new Intel Bluetooth device
- Various other minor cleanups/fixes here and there

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-4/+7
Conflicts were simply overlapping changes. In the net/ipv4/route.c case the code had simply moved around a little bit and the same fix was made in both 'net' and 'net-next'. In the net/sched/sch_generic.c case a fix in 'net' happened at the same time that a new argument was added to qdisc_hash_add(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13netlink: pass extended ACK struct to parsing functionsJohannes Berg1-6/+10
Pass the new extended ACK reporting struct to all of the generic netlink parsing functions. For now, pass NULL in almost all callers (except for some in the core.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-126lowpan: Fix IID format for BluetoothLuiz Augusto von Dentz1-1/+5
According to RFC 7668 U/L bit shall not be used:

https://wiki.tools.ietf.org/html/rfc7668#section-3.2.2 [Page 10]:

In the figure, letter 'b' represents a bit from the Bluetooth device address, copied as is without any changes on any bit. This means that no bit in the IID indicates whether the underlying Bluetooth device address is public or random.

    |0              1|1              3|3              4|4              6|
    |0              5|6              1|2              7|8              3|
    +----------------+----------------+----------------+----------------+
    |bbbbbbbbbbbbbbbb|bbbbbbbb11111111|11111110bbbbbbbb|bbbbbbbbbbbbbbbb|
    +----------------+----------------+----------------+----------------+

Because of this the code cannot figure out the address type from the IP address anymore thus it makes no sense to use peer_lookup_ba as it needs the peer address type.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
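The IID layout in the figure can be sketched as a short C helper. This is an illustration only, not the kernel's code: `bt_addr_to_iid` is a hypothetical name, and the sketch assumes the 48-bit device address is passed most significant byte first, matching the left-to-right bit order of the figure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a 64-bit interface identifier from a 48-bit Bluetooth device
 * address following the figure: the first three address bytes, then
 * 0xFF 0xFE, then the last three address bytes. No address bit is
 * modified, so the U/L bit is NOT flipped as it would be for a
 * modified EUI-64.
 *
 * Assumption: addr[0] is the most significant byte of the address.
 */
static void bt_addr_to_iid(const uint8_t addr[6], uint8_t iid[8])
{
    assert(addr != NULL && iid != NULL);

    memcpy(iid, addr, 3);          /* bits 0..23: copied as is */
    iid[3] = 0xFF;                 /* bits 24..31: 11111111 */
    iid[4] = 0xFE;                 /* bits 32..39: 11111110 */
    memcpy(iid + 5, addr + 3, 3);  /* bits 40..63: copied as is */
}
```

For the address 00:11:22:33:44:55 this yields the IID 00:11:22:ff:fe:33:44:55. The first byte stays 0x00, which is why the address type can no longer be inferred from the IID.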
2017-04-12ipv6: addrconf: fix 48 bit 6lowpan autoconfigurationAlexander Aring1-5/+12
This patch adds support for 48 bit 6LoWPAN address length autoconfiguration which is the case for BTLE 6LoWPAN. Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2017-04-12ipv6: Fix idev->addr_list corruptionRabin Vincent1-4/+7
addrconf_ifdown() removes elements from the idev->addr_list without holding the idev->lock. If this happens while the loop in __ipv6_dev_get_saddr() is handling the same element, that function ends up in an infinite loop:

    NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [test:1719]
    Call Trace:
     ipv6_get_saddr_eval+0x13c/0x3a0
     __ipv6_dev_get_saddr+0xe4/0x1f0
     ipv6_dev_get_saddr+0x1b4/0x204
     ip6_dst_lookup_tail+0xcc/0x27c
     ip6_dst_lookup_flow+0x38/0x80
     udpv6_sendmsg+0x708/0xba8
     sock_sendmsg+0x18/0x30
     SyS_sendto+0xb8/0xf8
     syscall_common+0x34/0x58

Fixes: 6a923934c33 (Revert "ipv6: Revert optional address flusing on ifdown.")
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29net: ipv6: Add support for RTM_DELNETCONFDavid Ahern1-5/+15
Send RTM_DELNETCONF notifications when a device is deleted. The message only needs the device index, so modify inet6_netconf_fill_devconf to skip devconf references if it is NULL. Allows a userspace cache to remove entries as devices are deleted. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
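The "skip devconf references if it is NULL" behaviour can be sketched with a simplified fill routine. This is a model under stated assumptions, not the signature of inet6_netconf_fill_devconf: `netconf_msg`, `devconf_model` and `fill_netconf` are hypothetical names, and the two settings shown are placeholders for whatever the real message carries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down per-device configuration. */
struct devconf_model {
    int forwarding;
    int proxy_ndp;
};

/* Hypothetical notification payload. */
struct netconf_msg {
    int ifindex;        /* always present */
    bool has_settings;  /* false for a delete notification */
    int forwarding;
    int proxy_ndp;
};

/*
 * Fill a notification. A NULL devconf means the device is going away,
 * so only the device index is written and every devconf dereference
 * is skipped.
 */
static void fill_netconf(struct netconf_msg *msg, int ifindex,
                         const struct devconf_model *devconf)
{
    assert(msg != NULL);

    msg->ifindex = ifindex;
    msg->has_settings = false;
    msg->forwarding = 0;
    msg->proxy_ndp = 0;

    if (!devconf)
        return;         /* delete case: index only */

    msg->has_settings = true;
    msg->forwarding = devconf->forwarding;
    msg->proxy_ndp = devconf->proxy_ndp;
}
```

A userspace cache keyed on the device index needs nothing more than `ifindex` to drop its entry, which is why the delete message can omit the settings.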
2017-03-29net: ipv6: Refactor inet6_netconf_notify_devconf to take eventDavid Ahern1-11/+22
Refactor inet6_netconf_notify_devconf to take the event as an input arg. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>