author | David S. Miller <davem@davemloft.net> | 2017-02-07 21:07:56 +0300 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2017-02-07 21:07:56 +0300 |
commit | 29ba6e7400a317725bdfb86a725d1824447dbcd7 (patch) | |
tree | b009850c5a2e7c633a94eeacb71a25f91b4b64f0 /net/ipv4/tcp_input.c | |
parent | b08d46b01e995dd7b653b22d35bd1d958d6ee9b4 (diff) | |
parent | 51ce8bd4d17a761e1a90a34a1b5c9b762cce7553 (diff) | |
Merge branch 'replace-dst_confirm'
Julian Anastasov says:
====================
net: dst_confirm replacement
This patchset addresses the problem of neighbour
confirmation where replies received from one nexthop
can cause confirmation of a different nexthop when using
the same dst. Thanks to YueHaibing <yuehaibing@huawei.com>
for tracking the dst->pending_confirm problem.
Sockets can obtain a cached output route. Such a
route can lead to a known nexthop (rt_gateway=IP) or can be
used simultaneously for different nexthop IPs by different
subnet prefixes (nh->nh_scope = RT_SCOPE_HOST, rt_gateway=0).
At first look, there are more problems:
- dst_confirm() sets the flag on dst and not on dst->path;
as a result, the indication is lost when XFRM is used
- DNAT can change the nexthop, so the nexthop that is
actually used is not confirmed
So, the proposed solution is to avoid using
dst->pending_confirm.
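The mix-up described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, and the flag handling is a simplification of a "set on receive, consume on the next transmit" scheme on a route shared by two nexthops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code: all toy_* names are hypothetical. */
struct toy_neigh {
	unsigned long confirmed;	/* time of the last confirmation */
};

struct toy_dst {
	bool pending_confirm;		/* one flag shared by all users of the route */
};

/* Receive path: a reply arrived, so remember that a confirmation is due. */
static void toy_dst_confirm(struct toy_dst *dst)
{
	assert(dst != NULL);
	dst->pending_confirm = true;
}

/* Transmit path: whichever neighbour is used next consumes the flag. */
static void toy_dst_neigh_output(struct toy_dst *dst, struct toy_neigh *n,
				 unsigned long now)
{
	assert(dst != NULL && n != NULL);
	if (dst->pending_confirm) {
		dst->pending_confirm = false;
		n->confirmed = now;
	}
}

/* Returns true when nexthop B gets confirmed by a reply that came from A. */
static bool toy_wrong_nexthop_confirmed(void)
{
	struct toy_dst shared = { .pending_confirm = false };
	struct toy_neigh a = { .confirmed = 0 }, b = { .confirmed = 0 };

	toy_dst_confirm(&shared);		/* reply received from nexthop A */
	toy_dst_neigh_output(&shared, &b, 100);	/* next packet goes to nexthop B */

	return b.confirmed == 100 && a.confirmed == 0;
}
```

In this model toy_wrong_nexthop_confirmed() returns true: the indication earned by nexthop A is spent on nexthop B because the flag lives on the shared route.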
The current dst_confirm() usage is as follows:
Protocols confirming dst on received packets:
- TCP (1 dst per socket)
- SCTP (1 dst per transport)
- CXGB*
Protocols supporting sendmsg with MSG_CONFIRM [ | MSG_PROBE ] to
confirm neighbour:
- UDP IPv4/IPv6
- ICMPv4 PING
- RAW IPv4/IPv6
- L2TP/IPv6
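For reference, a userspace caller requests this kind of confirmation by passing MSG_CONFIRM in the flags of sendto() or sendmsg(). The following is a minimal sketch for UDP over IPv4; send_confirmed() is a hypothetical helper name, and the address and port are whatever the caller supplies.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send one UDP datagram and tell the kernel, via MSG_CONFIRM, that the
 * application has seen forward progress from the peer. Returns the number
 * of bytes sent, or -1 on error. send_confirmed() is a hypothetical helper.
 */
static ssize_t send_confirmed(const char *ip, unsigned short port,
			      const void *buf, size_t len)
{
	struct sockaddr_in dst;
	ssize_t sent;
	int fd;

	assert(ip != NULL && (buf != NULL || len == 0));

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
		return -1;	/* not a valid dotted-quad IPv4 address */

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* MSG_CONFIRM is Linux-specific; see send(2). */
	sent = sendto(fd, buf, len, MSG_CONFIRM,
		      (const struct sockaddr *)&dst, sizeof(dst));
	close(fd);
	return sent;
}
```

Because this socket is neither corked nor locked by the caller, it falls under the "sending without locking the socket" case listed for UDP.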
MSG_CONFIRM for other purposes (fix not needed):
- CAN
Sending without locking the socket:
- UDP (when no cork)
- RAW (when hdrincl=1)
Redirects from old to new GW:
- rt6_do_redirect
The patchset includes the following changes:
1. sock: add sk_dst_pending_confirm flag
- used only by TCP with patch 4 to remember the received
indication in sk->sk_dst_pending_confirm
2. net: add dst_pending_confirm flag to skbuff
- skb->dst_pending_confirm will be used by all protocols
in following patches, via skb_{set,get}_dst_pending_confirm
3. sctp: add dst_pending_confirm flag
- SCTP uses per-transport dsts and cannot use
sk->sk_dst_pending_confirm like TCP
4. tcp: replace dst_confirm with sk_dst_confirm
5. net: add confirm_neigh method to dst_ops
- IPv4 and IPv6 provide slow neigh lookups for MSG_PROBE users.
I decided to use neigh lookup only for this case because on
MSG_PROBE the skb may pass MTU checks but it does not reach
the neigh confirmation code. This patch will be used from patch 6.
- xfrm_confirm_neigh: we use the last tunnel address, if present.
When there are only transports, the original dest address is used.
6. net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
- dst_confirm conversion for UDP, RAW, ICMP and L2TP/IPv6
- these protocols use MSG_CONFIRM propagated by ip*_append_data
to skb->dst_pending_confirm. sk->sk_dst_pending_confirm is not
used because some sending paths do not lock the socket. For
MSG_PROBE we use the slow lookup (dst_confirm_neigh).
- there are also 2 cases that need the slow lookup:
__ip6_rt_update_pmtu and rt6_do_redirect. I hope
&ipv6_hdr(skb)->saddr is the correct nexthop address to use here.
7. net: pending_confirm is not used anymore
- I failed to understand the CXGB* code: I see dst_confirm()
calls, but I'm not sure dst_neigh_output() was called. For now
I just removed the dst->pending_confirm flag and left all
dst_confirm() calls there. Any better idea?
- Maybe the old function neigh_output() should now be restored
instead of dst_neigh_output()?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
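The TCP part of the series (patches 1, 2 and 4) can be pictured with a small userspace model: the indication is remembered on the socket, carried by the packet, and spent on the neighbour that the packet really uses. This is a sketch only. All sketch_* names are hypothetical, and the step that copies the flag from the socket into the packet at transmit time is an assumption about how the two flags are connected, not something stated in the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code: all sketch_* names are hypothetical. */
struct sketch_neigh { unsigned long confirmed; };
struct sketch_sock  { bool dst_pending_confirm; };
struct sketch_skb   { struct sketch_sock *sk; bool dst_pending_confirm; };

/* Receive path: record the indication on the socket, not on the route. */
static void sketch_sk_dst_confirm(struct sketch_sock *sk)
{
	assert(sk != NULL);
	if (!sk->dst_pending_confirm)
		sk->dst_pending_confirm = true;
}

/* Transmit path (assumed): copy the socket's indication into the packet. */
static void sketch_transmit(struct sketch_sock *sk, struct sketch_skb *skb)
{
	skb->sk = sk;
	skb->dst_pending_confirm = sk->dst_pending_confirm;
}

/* Neighbour output: confirm only the neighbour this packet is sent to. */
static void sketch_neigh_output(struct sketch_skb *skb, struct sketch_neigh *n,
				unsigned long now)
{
	if (skb->dst_pending_confirm) {
		n->confirmed = now;
		if (skb->sk)
			skb->sk->dst_pending_confirm = false;
	}
}

/* Two sockets share a route; only socket A has heard from its peer. */
static bool sketch_only_own_nexthop_confirmed(void)
{
	struct sketch_sock sk_a = { false }, sk_b = { false };
	struct sketch_neigh nh_a = { 0 }, nh_b = { 0 };
	struct sketch_skb skb_a, skb_b;

	sketch_sk_dst_confirm(&sk_a);		/* reply received on socket A */

	sketch_transmit(&sk_b, &skb_b);		/* B sends first, via nexthop B */
	sketch_neigh_output(&skb_b, &nh_b, 100);
	sketch_transmit(&sk_a, &skb_a);		/* A sends later, via nexthop A */
	sketch_neigh_output(&skb_a, &nh_a, 101);

	return nh_a.confirmed == 101 && nh_b.confirmed == 0 &&
	       !sk_a.dst_pending_confirm;
}
```

In this model, the packet sent by socket B in between does not consume socket A's indication, so nexthop B stays unconfirmed and nexthop A is confirmed when A next transmits.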
Diffstat (limited to 'net/ipv4/tcp_input.c')
-rw-r--r-- | net/ipv4/tcp_input.c | 12 |
1 files changed, 3 insertions, 9 deletions
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 27c95acbb52f..2c0ff327b6df 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3644,11 +3644,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if (tp->tlp_high_seq)
 		tcp_process_tlp_ack(sk, ack, flag);
 
-	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
-		struct dst_entry *dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
-	}
+	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
+		sk_dst_confirm(sk);
 
 	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
 		tcp_schedule_loss_probe(sk);
@@ -5995,7 +5992,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		break;
 
 	case TCP_FIN_WAIT1: {
-		struct dst_entry *dst;
 		int tmo;
 
 		/* If we enter the TCP_FIN_WAIT1 state and we are a
@@ -6022,9 +6018,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		tcp_set_state(sk, TCP_FIN_WAIT2);
 		sk->sk_shutdown |= SEND_SHUTDOWN;
 
-		dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
+		sk_dst_confirm(sk);
 
 		if (!sock_flag(sk, SOCK_DEAD)) {
 			/* Wake up lingering close() */