Age | Commit message (Collapse) | Author | Files | Lines |
|
Add support for IFA_RT_PRIORITY to ipv6 addresses.
If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be atomically replaced.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update inet6_addr_modify to take ifa6_config argument versus a parameter
list. This is an argument move only; no functional change intended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the creation of struct ifa6_config up to callers of inet6_addr_add.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move config parameters for adding an ipv6 address to a struct. struct
names stem from inet6_rtm_newaddr which is the modern handler for
adding an address.
Start the conversion to ifa6_config with ipv6_add_addr. This is an argument
move only; no functional change intended. Mapping of variable changes:
addr --> cfg->pfx
peer_addr --> cfg->peer_pfx
pfxlen --> cfg->plen
flags --> cfg->ifa_flags
scope, valid_lft, prefered_lft have the same names within cfg
(with corrected spelling).
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the message be freed immediately, no need to trim it
back to the previous size.
Inspired by commit 7a9b3ec1e19f ("nl80211: remove unnecessary genlmsg_cancel() calls")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
seg6_do_srh_encap and seg6_do_srh_inline can possibly do an
out-of-bounds access when adding the SRH to the packet. This no longer
happen when expanding the skb not only by the size of the SRH (+
outer IPv6 header), but also by skb->mac_len.
[ 53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620
[ 53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674
[ 53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90
[ 53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[ 53.796673] Call Trace:
[ 53.796679] <IRQ>
[ 53.796689] dump_stack+0x71/0xab
[ 53.796700] print_address_description+0x6a/0x270
[ 53.796707] kasan_report+0x258/0x380
[ 53.796715] ? seg6_do_srh_encap+0x284/0x620
[ 53.796722] memmove+0x34/0x50
[ 53.796730] seg6_do_srh_encap+0x284/0x620
[ 53.796741] ? seg6_do_srh+0x29b/0x360
[ 53.796747] seg6_do_srh+0x29b/0x360
[ 53.796756] seg6_input+0x2e/0x2e0
[ 53.796765] lwtunnel_input+0x93/0xd0
[ 53.796774] ipv6_rcv+0x690/0x920
[ 53.796783] ? ip6_input+0x170/0x170
[ 53.796791] ? eth_gro_receive+0x2d0/0x2d0
[ 53.796800] ? ip6_input+0x170/0x170
[ 53.796809] __netif_receive_skb_core+0xcc0/0x13f0
[ 53.796820] ? netdev_info+0x110/0x110
[ 53.796827] ? napi_complete_done+0xb6/0x170
[ 53.796834] ? e1000_clean+0x6da/0xf70
[ 53.796845] ? process_backlog+0x129/0x2a0
[ 53.796853] process_backlog+0x129/0x2a0
[ 53.796862] net_rx_action+0x211/0x5c0
[ 53.796870] ? napi_complete_done+0x170/0x170
[ 53.796887] ? run_rebalance_domains+0x11f/0x150
[ 53.796891] __do_softirq+0x10e/0x39e
[ 53.796894] do_softirq_own_stack+0x2a/0x40
[ 53.796895] </IRQ>
[ 53.796898] do_softirq.part.16+0x54/0x60
[ 53.796900] __local_bh_enable_ip+0x5b/0x60
[ 53.796903] ip6_finish_output2+0x416/0x9f0
[ 53.796906] ? ip6_dst_lookup_flow+0x110/0x110
[ 53.796909] ? ip6_sk_dst_lookup_flow+0x390/0x390
[ 53.796911] ? __rcu_read_unlock+0x66/0x80
[ 53.796913] ? ip6_mtu+0x44/0xf0
[ 53.796916] ? ip6_output+0xfc/0x220
[ 53.796918] ip6_output+0xfc/0x220
[ 53.796921] ? ip6_finish_output+0x2b0/0x2b0
[ 53.796923] ? memcpy+0x34/0x50
[ 53.796926] ip6_send_skb+0x43/0xc0
[ 53.796929] rawv6_sendmsg+0x1216/0x1530
[ 53.796932] ? __orc_find+0x6b/0xc0
[ 53.796934] ? rawv6_rcv_skb+0x160/0x160
[ 53.796937] ? __rcu_read_unlock+0x66/0x80
[ 53.796939] ? __rcu_read_unlock+0x66/0x80
[ 53.796942] ? is_bpf_text_address+0x1e/0x30
[ 53.796944] ? kernel_text_address+0xec/0x100
[ 53.796946] ? __kernel_text_address+0xe/0x30
[ 53.796948] ? unwind_get_return_address+0x2f/0x50
[ 53.796950] ? __save_stack_trace+0x92/0x100
[ 53.796954] ? save_stack+0x89/0xb0
[ 53.796956] ? kasan_kmalloc+0xa0/0xd0
[ 53.796958] ? kmem_cache_alloc+0xd2/0x1f0
[ 53.796961] ? prepare_creds+0x23/0x160
[ 53.796963] ? __x64_sys_capset+0x252/0x3e0
[ 53.796966] ? do_syscall_64+0x69/0x160
[ 53.796968] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.796971] ? __alloc_pages_nodemask+0x170/0x380
[ 53.796973] ? __alloc_pages_slowpath+0x12c0/0x12c0
[ 53.796977] ? tty_vhangup+0x20/0x20
[ 53.796979] ? policy_nodemask+0x1a/0x90
[ 53.796982] ? __mod_node_page_state+0x8d/0xa0
[ 53.796986] ? __check_object_size+0xe7/0x240
[ 53.796989] ? __sys_sendto+0x229/0x290
[ 53.796991] ? rawv6_rcv_skb+0x160/0x160
[ 53.796993] __sys_sendto+0x229/0x290
[ 53.796996] ? __ia32_sys_getpeername+0x50/0x50
[ 53.796999] ? commit_creds+0x2de/0x520
[ 53.797002] ? security_capset+0x57/0x70
[ 53.797004] ? __x64_sys_capset+0x29f/0x3e0
[ 53.797007] ? __x64_sys_rt_sigsuspend+0xe0/0xe0
[ 53.797011] ? __do_page_fault+0x664/0x770
[ 53.797014] __x64_sys_sendto+0x74/0x90
[ 53.797017] do_syscall_64+0x69/0x160
[ 53.797019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.797022] RIP: 0033:0x7f43b7a6714a
[ 53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX:
000000000000002c
[ 53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a
[ 53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004
[ 53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c
[ 53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661
[ 53.797171] Allocated by task 642:
[ 53.797460] kasan_kmalloc+0xa0/0xd0
[ 53.797463] kmem_cache_alloc+0xd2/0x1f0
[ 53.797465] getname_flags+0x40/0x210
[ 53.797467] user_path_at_empty+0x1d/0x40
[ 53.797469] do_faccessat+0x12a/0x320
[ 53.797471] do_syscall_64+0x69/0x160
[ 53.797473] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.797607] Freed by task 642:
[ 53.797869] __kasan_slab_free+0x130/0x180
[ 53.797871] kmem_cache_free+0xa8/0x230
[ 53.797872] filename_lookup+0x15b/0x230
[ 53.797874] do_faccessat+0x12a/0x320
[ 53.797876] do_syscall_64+0x69/0x160
[ 53.797878] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.798014] The buggy address belongs to the object at ffff88011975e600
which belongs to the cache names_cache of size 4096
[ 53.799043] The buggy address is located 1786 bytes inside of
4096-byte region [ffff88011975e600, ffff88011975f600)
[ 53.800013] The buggy address belongs to the page:
[ 53.800414] page:ffffea000465d600 count:1 mapcount:0
mapping:0000000000000000 index:0x0 compound_mapcount: 0
[ 53.801259] flags: 0x17fff0000008100(slab|head)
[ 53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000
0000000100070007
[ 53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40
0000000000000000
[ 53.803787] page dumped because: kasan: bad access detected
[ 53.804384] Memory state around the buggy address:
[ 53.804788] ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.805384] ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.806577] ^
[ 53.807165] ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.807762] ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.808356] ==================================================================
[ 53.808949] Disabling lock debugging due to kernel taint
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of using extra modules for these, turn the config options into
an implicit dependency that adds masq feature to the protocol specific nf_nat module.
before:
text data bss dec hex filename
2001 860 4 2865 b31 net/ipv4/netfilter/nf_nat_masquerade_ipv4.ko
5579 780 2 6361 18d9 net/ipv4/netfilter/nf_nat_ipv4.ko
2860 836 8 3704 e78 net/ipv6/netfilter/nf_nat_masquerade_ipv6.ko
6648 780 2 7430 1d06 net/ipv6/netfilter/nf_nat_ipv6.ko
after:
text data bss dec hex filename
7245 872 8 8125 1fbd net/ipv4/netfilter/nf_nat_ipv4.ko
9165 848 12 10025 2729 net/ipv6/netfilter/nf_nat_ipv6.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In addition to already existing BPF hooks for sys_bind and sys_connect,
the patch provides new hooks for sys_sendmsg.
It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
that provides access to socket itlself (properties like family, type,
protocol) and user-passed `struct sockaddr *` so that BPF program can
override destination IP and port for system calls such as sendto(2) or
sendmsg(2) and/or assign source IP to the socket.
The hooks are implemented as two new attach types:
`BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
UDPv6 correspondingly.
UDPv4 and UDPv6 separate attach types for same reason as sys_bind and
sys_connect hooks, i.e. to prevent reading from / writing to e.g.
user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound.
The difference with already existing hooks is sys_sendmsg are
implemented only for unconnected UDP.
For TCP it doesn't make sense to change user-provided `struct sockaddr *`
at sendto(2)/sendmsg(2) time since socket either was already connected
and has source/destination set or wasn't connected and call to
sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.
Connected UDP is already handled by sys_connect hooks that can override
source/destination at connect time and use fast-path later, i.e. these
hooks don't affect UDP fast-path.
Rewriting source IP is implemented differently than that in sys_connect
hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to
just bind socket to desired local IP address since source IP can be set
on per-packet basis by using ancillary data (cmsg(3)). So no matter if
socket is bound or not, source IP has to be rewritten on every call to
sys_sendmsg.
To do so two new fields are added to UAPI `struct bpf_sock_addr`;
* `msg_src_ip4` to set source IPv4 for UDPv4;
* `msg_src_ip6` to set source IPv6 for UDPv6.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Commit bb0ad1987e96 ("ipv6: fib6_rules: support for match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-05-24
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).
2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.
3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.
4) Jiong Wang adds support for indirect and arithmetic shifts to NFP
5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.
6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions.
7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.
8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.
9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, the BPF program
does not have to take care of this.
Since the BPF program may not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track if
the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.
Performances profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed as invalid, the packet
is dropped.
This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.
The BPF program may return 3 types of return codes:
- BPF_OK: the End.BPF action will look up the next destination through
seg6_lookup_nexthop.
- BPF_REDIRECT: if an action has been executed through the
bpf_lwt_seg6_action helper, the BPF program should return this
value, as the skb's destination is already set and the default
lookup should not be performed.
- BPF_DROP : the packet will be dropped.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The BPF seg6local hook should be powerful enough to enable users to
implement most of the use-cases one could think of. After some thinking,
we figured out that the following actions should be possible on a SRv6
packet, requiring 3 specific helpers :
- bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
- bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
(to add/delete TLVs)
- bpf_lwt_seg6_action: Apply some SRv6 network programming actions
(specifically End.X, End.T, End.B6 and
End.B6.Encap)
The specifications of these helpers are provided in the patch (see
include/uapi/linux/bpf.h).
The non-sensitive fields of the SRH are the following : flags, tag and
TLVs. The other fields can not be modified, to maintain the SRH
integrity. Flags, tag and TLVs can easily be modified as their validity
can be checked afterwards via seg6_validate_srh. It is not allowed to
modify the segments directly. If one wants to add segments on the path,
he should stack a new SRH using the End.B6 action via
bpf_lwt_seg6_action.
Growing, shrinking or editing TLVs via the helpers will flag the SRH as
invalid, and it will have to be re-validated before re-entering the IPv6
layer. This flag is stored in a per-CPU buffer, along with the current
header length in bytes.
Storing the SRH len in bytes in the control block is mandatory when using
bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
boundary). When adding/deleting TLVs within the BPF program, the SRH may
temporary be in an invalid state where its length cannot be rounded to 8
bytes without remainder, hence the need to store the length in bytes
separately. The caller of the BPF program can then ensure that the SRH's
final length is valid using this value. Again, a final SRH modified by a
BPF program which doesn’t respect the 8-bytes boundary will be discarded
as it will be considered as invalid.
Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
available from the LWT BPF IN hook, but not from the seg6local BPF one.
This helper allows to encapsulate a Segment Routing Header (either with
a new outer IPv6 header, or by inlining it directly in the existing IPv6
header) into a non-SRv6 packet. This helper is required if we want to
offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
as the BPF seg6local hook only works on traffic already containing a SRH.
This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
the same purpose but with a static SRH per route.
These helpers require CONFIG_IPV6=y (and not =m).
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The function lookup_nexthop is essential to implement most of the seg6local
actions. As we want to provide a BPF helper allowing to apply some of these
actions on the packet being processed, the helper should be able to call
this function, hence the need to make it public.
Moreover, if one argument is incorrect or if the next hop can not be found,
an error should be returned by the BPF helper so the BPF program can adapt
its processing of the packet (return an error, properly force the drop,
...). This patch hence makes this function return dst->error to indicate a
possible error.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, they are:
1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal.
2) Add support for map lookups to numgen, random and hash expressions,
from Laura Garcia.
3) Allow to register nat hooks for iptables and nftables at the same
time. Patchset from Florian Westpha.
4) Timeout support for rbtree sets.
5) ip6_rpfilter works needs interface for link-local addresses, from
Vincent Bernat.
6) Add nf_ct_hook and nf_nat_hook structures and use them.
7) Do not drop packets on packets raceing to insert conntrack entries
into hashes, this is particularly a problem in nfqueue setups.
8) Address fallout from xt_osf separation to nf_osf, patches
from Florian Westphal and Fernando Mancera.
9) Remove reference to struct nft_af_info, which doesn't exist anymore.
From Taehee Yoo.
This batch comes with is a conflict between 25fd386e0bc0 ("netfilter:
core: add missing __rcu annotation") in your tree and 2c205dd3981f
("netfilter: add struct nf_nat_hook and use it") coming in this batch.
This conflict can be solved by leaving the __rcu tag on
__netfilter_net_init() - added by 25fd386e0bc0 - and remove all code
related to nf_nat_decode_session_hook - which is gone after
2c205dd3981f, as described by:
diff --cc net/netfilter/core.c
index e0ae4aae96f5,206fb2c4c319..168af54db975
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo
EXPORT_SYMBOL_GPL(nf_ct_zone_dflt);
#endif /* CONFIG_NF_CONNTRACK */
- static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max)
-#ifdef CONFIG_NF_NAT_NEEDED
-void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
-EXPORT_SYMBOL(nf_nat_decode_session_hook);
-#endif
-
+ static void __net_init
+ __netfilter_net_init(struct nf_hook_entries __rcu **e, int max)
{
int h;
I can also merge your net-next tree into nf-next, solve the conflict and
resend the pull request if you prefer so.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
UDP GSO delays final datagram construction to the GSO layer. This
conflicts with protocol transformations.
Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit 47b7e7f82802, this bit was removed at the same time the
RT6_LOOKUP_F_IFACE flag was removed. However, it is needed when
link-local addresses are used, which is a very common case: when
packets are routed, neighbor solicitations are done using link-local
addresses. For example, the following neighbor solicitation is not
matched by "-m rpfilter":
IP6 fe80::5254:33ff:fe00:1 > ff02::1:ff00:3: ICMP6, neighbor
solicitation, who has 2001:db8::5254:33ff:fe00:3, length 32
Commit 47b7e7f82802 doesn't quite explain why we shouldn't use
RT6_LOOKUP_F_IFACE in the rpfilter case. I suppose the interface check
later in the function would make it redundant. However, the remaining
of the routing code is using RT6_LOOKUP_F_IFACE when there is no
source address (which matches rpfilter's case with a non-unicast
destination, like with neighbor solicitation).
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Fixes: 47b7e7f82802 ("netfilter: don't set F_IFACE on ipv6 fib lookups")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently the packet rewrite and instantiation of nat NULL bindings
happens from the protocol specific nat backend.
Invocation occurs either via ip(6)table_nat or the nf_tables nat chain type.
Invocation looks like this (simplified):
NF_HOOK()
|
`---iptable_nat
|
`---> nf_nat_l3proto_ipv4 -> nf_nat_packet
|
new packet? pass skb though iptables nat chain
|
`---> iptable_nat: ipt_do_table
In nft case, this looks the same (nft_chain_nat_ipv4 instead of
iptable_nat).
This is a problem for two reasons:
1. Can't use iptables nat and nf_tables nat at the same time,
as the first user adds a nat binding (nf_nat_l3proto_ipv4 adds a
NULL binding if do_table() did not find a matching nat rule so we
can detect post-nat tuple collisions).
2. If you use e.g. nft_masq, snat, redir, etc. uses must also register
an empty base chain so that the nat core gets called fro NF_HOOK()
to do the reverse translation, which is neither obvious nor user
friendly.
After this change, the base hook gets registered not from iptable_nat or
nftables nat hooks, but from the l3 nat core.
iptables/nft nat base hooks get registered with the nat core instead:
NF_HOOK()
|
`---> nf_nat_l3proto_ipv4 -> nf_nat_packet
|
new packet? pass skb through iptables/nftables nat chains
|
+-> iptables_nat: ipt_do_table
+-> nft nat chain x
`-> nft nat chain y
The nat core deals with null bindings and reverse translation.
When no mapping exists, it calls the registered nat lookup hooks until
one creates a new mapping.
If both iptables and nftables nat hooks exist, the first matching
one is used (i.e., higher priority wins).
Also, nft users do not need to create empty nat hooks anymore,
nat core always registers the base hooks that take care of reverse/reply
translation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Will be used in followup patch when nat types no longer
use nf_register_net_hook() but will instead register with the nat core.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The ip(6)tables nat table is currently receiving skbs from the netfilter
core, after a followup patch skbs will be coming from the netfilter nat
core instead, so the table is no longer backed by normal hook_ops.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Copy-pasted, both l3 helpers almost use same code here.
Split out the common part into an 'inet' helper.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Bring consistency to ipv6 route replace and append semantics.
Remove rt6_qualify_for_ecmp which is just guess work. It fails in 2 cases:
1. can not replace a route with a reject route. Existing code appends
a new route instead of replacing the existing one.
2. can not have a multipath route where a leg uses a dev only nexthop
Existing use cases affected by this change:
1. adding a route with existing prefix and metric using NLM_F_CREATE
without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls
'prepend'). Existing code auto-determines that the new nexthop can
be appended to an existing route to create a multipath route. This
change breaks that by requiring the APPEND flag for the new route
to be added to an existing one. Instead the prepend just adds another
route entry.
2. route replace. Existing code replaces first matching multipath route
if new route is multipath capable and fallback to first matching
non-ECMP route (reject or dev only route) in case one isn't available.
New behavior replaces first matching route. (Thanks to Ido for spotting
this one)
Note: Newer iproute2 is needed to display multipath routes with a dev-only
nexthop. This is due to a bug in iproute2 and parsing nexthops.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Determine path MTU from a FIB lookup result. Logic is based on
ip6_dst_mtu_forward plus lookup of nexthop exception.
Add ip6_dst_mtu_forward to ipv6_stubs to handle access by core
bpf code.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.
TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.
The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.
Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently ip6gre and ip6erspan share single metadata mode device,
using 'collect_md_tun'. Thus, when doing:
ip link add dev ip6gre11 type ip6gretap external
ip link add dev ip6erspan12 type ip6erspan external
RTNETLINK answers: File exists
simply fails due to the 2nd tries to create the same collect_md_tun.
The patch fixes it by adding a separate collect md tunnel device
for the ip6erspan, 'collect_md_tun_erspan'. As a result, a couple
of places need to refactor/split up in order to distinguish ip6gre
and ip6erspan.
First, move the collect_md check at ip6gre_tunnel_{unlink,link} and
create separate function {ip6gre,ip6ersapn}_tunnel_{link_md,unlink_md}.
Then before link/unlink, make sure the link_md/unlink_md is called.
Finally, a separate ndo_uninit is created for ip6erspan. Tested it
using the samples/bpf/test_tunnel_bpf.sh.
Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Device features may change during transmission. In particular with
corking, a device may toggle scatter-gather in between allocating
and writing to an skb.
Do not unconditionally assume that !NETIF_F_SG at write time implies
that the same held at alloc time and thus the skb has sufficient
tailroom.
This issue predates git history.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Even though ip6erspan_tap_init() sets up hlen and tun_hlen according to
what ERSPAN needs, it goes ahead to call ip6gre_tnl_link_config() which
overwrites these settings with GRE-specific ones.
Similarly for changelink callbacks, which are handled by
ip6gre_changelink() calls ip6gre_tnl_change() calls
ip6gre_tnl_link_config() as well.
The difference ends up being 12 vs. 20 bytes, and this is generally not
a problem, because a 12-byte request likely ends up allocating more and
the extra 8 bytes are thus available. However correct it is not.
So replace the newlink and changelink callbacks with an ERSPAN-specific
ones, reusing the newly-introduced _common() functions.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extract from ip6gre_changelink() a reusable function
ip6gre_changelink_common(). This will allow introduction of
ERSPAN-specific _changelink() function with not a lot of code
duplication.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extract from ip6gre_newlink() a reusable function
ip6gre_newlink_common(). The ip6gre_tnl_link_config() call needs to be
made customizable for ERSPAN, thus reorder it with calls to
ip6_tnl_change_mtu() and dev_hold(), and extract the whole tail to the
caller, ip6gre_newlink(). Thus enable an ERSPAN-specific _newlink()
function without a lot of duplicity.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Split a reusable function ip6gre_tnl_copy_tnl_parm() from
ip6gre_tnl_change(). This will allow ERSPAN-specific code to
reuse the common parts while customizing the behavior for ERSPAN.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function ip6gre_tnl_link_config() is used for setting up
configuration of both ip6gretap and ip6erspan tunnels. Split the
function into the common part and the route-lookup part. The latter then
takes the calculated header length as an argument. This split will allow
the patches down the line to sneak in a custom header length computation
for the ERSPAN tunnel.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev->needed_headroom is not primed until ip6_tnl_xmit(), so it starts
out zero. Thus the call to skb_cow_head() fails to actually make sure
there's enough headroom to push the ERSPAN headers to. That can lead to
the panic cited below. (Reproducer below that).
Fix by requesting either needed_headroom if already primed, or just the
bare minimum needed for the header otherwise.
[ 190.703567] kernel BUG at net/core/skbuff.c:104!
[ 190.708384] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 190.714007] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
[ 190.728975] CPU: 1 PID: 959 Comm: kworker/1:2 Not tainted 4.17.0-rc4-net_master-custom-139 #10
[ 190.737647] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
[ 190.747006] Workqueue: ipv6_addrconf addrconf_dad_work
[ 190.752222] RIP: 0010:skb_panic+0xc3/0x100
[ 190.756358] RSP: 0018:ffff8801d54072f0 EFLAGS: 00010282
[ 190.761629] RAX: 0000000000000085 RBX: ffff8801c1a8ecc0 RCX: 0000000000000000
[ 190.768830] RDX: 0000000000000085 RSI: dffffc0000000000 RDI: ffffed003aa80e54
[ 190.776025] RBP: ffff8801bd1ec5a0 R08: ffffed003aabce19 R09: ffffed003aabce19
[ 190.783226] R10: 0000000000000001 R11: ffffed003aabce18 R12: ffff8801bf695dbe
[ 190.790418] R13: 0000000000000084 R14: 00000000000006c0 R15: ffff8801bf695dc8
[ 190.797621] FS: 0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
[ 190.805786] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 190.811582] CR2: 000055fa929aced0 CR3: 0000000003228004 CR4: 00000000001606e0
[ 190.818790] Call Trace:
[ 190.821264] <IRQ>
[ 190.823314] ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
[ 190.828940] ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
[ 190.834562] skb_push+0x78/0x90
[ 190.837749] ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
[ 190.843219] ? ip6gre_tunnel_ioctl+0xd90/0xd90 [ip6_gre]
[ 190.848577] ? debug_check_no_locks_freed+0x210/0x210
[ 190.853679] ? debug_check_no_locks_freed+0x210/0x210
[ 190.858783] ? print_irqtrace_events+0x120/0x120
[ 190.863451] ? sched_clock_cpu+0x18/0x210
[ 190.867496] ? cyc2ns_read_end+0x10/0x10
[ 190.871474] ? skb_network_protocol+0x76/0x200
[ 190.875977] dev_hard_start_xmit+0x137/0x770
[ 190.880317] ? do_raw_spin_trylock+0x6d/0xa0
[ 190.884624] sch_direct_xmit+0x2ef/0x5d0
[ 190.888589] ? pfifo_fast_dequeue+0x3fa/0x670
[ 190.892994] ? pfifo_fast_change_tx_queue_len+0x810/0x810
[ 190.898455] ? __lock_is_held+0xa0/0x160
[ 190.902422] __qdisc_run+0x39e/0xfc0
[ 190.906041] ? _raw_spin_unlock+0x29/0x40
[ 190.910090] ? pfifo_fast_enqueue+0x24b/0x3e0
[ 190.914501] ? sch_direct_xmit+0x5d0/0x5d0
[ 190.918658] ? pfifo_fast_dequeue+0x670/0x670
[ 190.923047] ? __dev_queue_xmit+0x172/0x1770
[ 190.927365] ? preempt_count_sub+0xf/0xd0
[ 190.931421] __dev_queue_xmit+0x410/0x1770
[ 190.935553] ? ___slab_alloc+0x605/0x930
[ 190.939524] ? print_irqtrace_events+0x120/0x120
[ 190.944186] ? memcpy+0x34/0x50
[ 190.947364] ? netdev_pick_tx+0x1c0/0x1c0
[ 190.951428] ? __skb_clone+0x2fd/0x3d0
[ 190.955218] ? __copy_skb_header+0x270/0x270
[ 190.959537] ? rcu_read_lock_sched_held+0x93/0xa0
[ 190.964282] ? kmem_cache_alloc+0x344/0x4d0
[ 190.968520] ? cyc2ns_read_end+0x10/0x10
[ 190.972495] ? skb_clone+0x123/0x230
[ 190.976112] ? skb_split+0x820/0x820
[ 190.979747] ? tcf_mirred+0x554/0x930 [act_mirred]
[ 190.984582] tcf_mirred+0x554/0x930 [act_mirred]
[ 190.989252] ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
[ 190.996109] ? __lock_acquire+0x706/0x26e0
[ 191.000239] ? sched_clock_cpu+0x18/0x210
[ 191.004294] tcf_action_exec+0xcf/0x2a0
[ 191.008179] tcf_classify+0xfa/0x340
[ 191.011794] __netif_receive_skb_core+0x8e1/0x1c60
[ 191.016630] ? debug_check_no_locks_freed+0x210/0x210
[ 191.021732] ? nf_ingress+0x500/0x500
[ 191.025458] ? process_backlog+0x347/0x4b0
[ 191.029619] ? print_irqtrace_events+0x120/0x120
[ 191.034302] ? lock_acquire+0xd8/0x320
[ 191.038089] ? process_backlog+0x1b6/0x4b0
[ 191.042246] ? process_backlog+0xc2/0x4b0
[ 191.046303] process_backlog+0xc2/0x4b0
[ 191.050189] net_rx_action+0x5cc/0x980
[ 191.053991] ? napi_complete_done+0x2c0/0x2c0
[ 191.058386] ? mark_lock+0x13d/0xb40
[ 191.062001] ? clockevents_program_event+0x6b/0x1d0
[ 191.066922] ? print_irqtrace_events+0x120/0x120
[ 191.071593] ? __lock_is_held+0xa0/0x160
[ 191.075566] __do_softirq+0x1d4/0x9d2
[ 191.079282] ? ip6_finish_output2+0x524/0x1460
[ 191.083771] do_softirq_own_stack+0x2a/0x40
[ 191.087994] </IRQ>
[ 191.090130] do_softirq.part.13+0x38/0x40
[ 191.094178] __local_bh_enable_ip+0x135/0x190
[ 191.098591] ip6_finish_output2+0x54d/0x1460
[ 191.102916] ? ip6_forward_finish+0x2f0/0x2f0
[ 191.107314] ? ip6_mtu+0x3c/0x2c0
[ 191.110674] ? ip6_finish_output+0x2f8/0x650
[ 191.114992] ? ip6_output+0x12a/0x500
[ 191.118696] ip6_output+0x12a/0x500
[ 191.122223] ? ip6_route_dev_notify+0x5b0/0x5b0
[ 191.126807] ? ip6_finish_output+0x650/0x650
[ 191.131120] ? ip6_fragment+0x1a60/0x1a60
[ 191.135182] ? icmp6_dst_alloc+0x26e/0x470
[ 191.139317] mld_sendpack+0x672/0x830
[ 191.143021] ? igmp6_mcf_seq_next+0x2f0/0x2f0
[ 191.147429] ? __local_bh_enable_ip+0x77/0x190
[ 191.151913] ipv6_mc_dad_complete+0x47/0x90
[ 191.156144] addrconf_dad_completed+0x561/0x720
[ 191.160731] ? addrconf_rs_timer+0x3a0/0x3a0
[ 191.165036] ? mark_held_locks+0xc9/0x140
[ 191.169095] ? __local_bh_enable_ip+0x77/0x190
[ 191.173570] ? addrconf_dad_work+0x50d/0xa20
[ 191.177886] ? addrconf_dad_work+0x529/0xa20
[ 191.182194] addrconf_dad_work+0x529/0xa20
[ 191.186342] ? addrconf_dad_completed+0x720/0x720
[ 191.191088] ? __lock_is_held+0xa0/0x160
[ 191.195059] ? process_one_work+0x45d/0xe20
[ 191.199302] ? process_one_work+0x51e/0xe20
[ 191.203531] ? rcu_read_lock_sched_held+0x93/0xa0
[ 191.208279] process_one_work+0x51e/0xe20
[ 191.212340] ? pwq_dec_nr_in_flight+0x200/0x200
[ 191.216912] ? get_lock_stats+0x4b/0xf0
[ 191.220788] ? preempt_count_sub+0xf/0xd0
[ 191.224844] ? worker_thread+0x219/0x860
[ 191.228823] ? do_raw_spin_trylock+0x6d/0xa0
[ 191.233142] worker_thread+0xeb/0x860
[ 191.236848] ? process_one_work+0xe20/0xe20
[ 191.241095] kthread+0x206/0x300
[ 191.244352] ? process_one_work+0xe20/0xe20
[ 191.248587] ? kthread_stop+0x570/0x570
[ 191.252459] ret_from_fork+0x3a/0x50
[ 191.256082] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
[ 191.275327] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d54072f0
[ 191.281024] ---[ end trace 7ea51094e099e006 ]---
[ 191.285724] Kernel panic - not syncing: Fatal exception in interrupt
[ 191.292168] Kernel Offset: disabled
[ 191.295697] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Reproducer:
ip link add h1 type veth peer name swp1
ip link add h3 type veth peer name swp3
ip link set dev h1 up
ip address add 192.0.2.1/28 dev h1
ip link add dev vh3 type vrf table 20
ip link set dev h3 master vh3
ip link set dev vh3 up
ip link set dev h3 up
ip link set dev swp3 up
ip address add dev swp3 2001:db8:2::1/64
ip link set dev swp1 up
tc qdisc add dev swp1 clsact
ip link add name gt6 type ip6erspan \
local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123
ip link set dev gt6 up
sleep 1
tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
action mirred egress mirror dev gt6
ping -I h1 192.0.2.2
Fixes: e41c7c68ea77 ("ip6erspan: make sure enough headroom at xmit.")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
__gre6_xmit() pushes GRE headers before handing over to ip6_tnl_xmit()
for generic IP-in-IP processing. However it doesn't make sure that there
is enough headroom to push the header to. That can lead to the panic
cited below. (Reproducer below that).
Fix by requesting either needed_headroom if already primed, or just the
bare minimum needed for the header otherwise.
[ 158.576725] kernel BUG at net/core/skbuff.c:104!
[ 158.581510] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 158.587174] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
[ 158.602268] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 4.17.0-rc4-net_master-custom-139 #10
[ 158.610938] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
[ 158.620426] RIP: 0010:skb_panic+0xc3/0x100
[ 158.624586] RSP: 0018:ffff8801d3f27110 EFLAGS: 00010286
[ 158.629882] RAX: 0000000000000082 RBX: ffff8801c02cc040 RCX: 0000000000000000
[ 158.637127] RDX: 0000000000000082 RSI: dffffc0000000000 RDI: ffffed003a7e4e18
[ 158.644366] RBP: ffff8801bfec8020 R08: ffffed003aabce19 R09: ffffed003aabce19
[ 158.651574] R10: 000000000000000b R11: ffffed003aabce18 R12: ffff8801c364de66
[ 158.658786] R13: 000000000000002c R14: 00000000000000c0 R15: ffff8801c364de68
[ 158.666007] FS: 0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
[ 158.674212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 158.680036] CR2: 00007f4b3702dcd0 CR3: 0000000003228002 CR4: 00000000001606e0
[ 158.687228] Call Trace:
[ 158.689752] ? __gre6_xmit+0x246/0xd80 [ip6_gre]
[ 158.694475] ? __gre6_xmit+0x246/0xd80 [ip6_gre]
[ 158.699141] skb_push+0x78/0x90
[ 158.702344] __gre6_xmit+0x246/0xd80 [ip6_gre]
[ 158.706872] ip6gre_tunnel_xmit+0x3bc/0x610 [ip6_gre]
[ 158.711992] ? __gre6_xmit+0xd80/0xd80 [ip6_gre]
[ 158.716668] ? debug_check_no_locks_freed+0x210/0x210
[ 158.721761] ? print_irqtrace_events+0x120/0x120
[ 158.726461] ? sched_clock_cpu+0x18/0x210
[ 158.730572] ? sched_clock_cpu+0x18/0x210
[ 158.734692] ? cyc2ns_read_end+0x10/0x10
[ 158.738705] ? skb_network_protocol+0x76/0x200
[ 158.743216] ? netif_skb_features+0x1b2/0x550
[ 158.747648] dev_hard_start_xmit+0x137/0x770
[ 158.752010] sch_direct_xmit+0x2ef/0x5d0
[ 158.755992] ? pfifo_fast_dequeue+0x3fa/0x670
[ 158.760460] ? pfifo_fast_change_tx_queue_len+0x810/0x810
[ 158.765975] ? __lock_is_held+0xa0/0x160
[ 158.770002] __qdisc_run+0x39e/0xfc0
[ 158.773673] ? _raw_spin_unlock+0x29/0x40
[ 158.777781] ? pfifo_fast_enqueue+0x24b/0x3e0
[ 158.782191] ? sch_direct_xmit+0x5d0/0x5d0
[ 158.786372] ? pfifo_fast_dequeue+0x670/0x670
[ 158.790818] ? __dev_queue_xmit+0x172/0x1770
[ 158.795195] ? preempt_count_sub+0xf/0xd0
[ 158.799313] __dev_queue_xmit+0x410/0x1770
[ 158.803512] ? ___slab_alloc+0x605/0x930
[ 158.807525] ? ___slab_alloc+0x605/0x930
[ 158.811540] ? memcpy+0x34/0x50
[ 158.814768] ? netdev_pick_tx+0x1c0/0x1c0
[ 158.818895] ? __skb_clone+0x2fd/0x3d0
[ 158.822712] ? __copy_skb_header+0x270/0x270
[ 158.827079] ? rcu_read_lock_sched_held+0x93/0xa0
[ 158.831903] ? kmem_cache_alloc+0x344/0x4d0
[ 158.836199] ? skb_clone+0x123/0x230
[ 158.839869] ? skb_split+0x820/0x820
[ 158.843521] ? tcf_mirred+0x554/0x930 [act_mirred]
[ 158.848407] tcf_mirred+0x554/0x930 [act_mirred]
[ 158.853104] ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
[ 158.860005] ? __lock_acquire+0x706/0x26e0
[ 158.864162] ? mark_lock+0x13d/0xb40
[ 158.867832] tcf_action_exec+0xcf/0x2a0
[ 158.871736] tcf_classify+0xfa/0x340
[ 158.875402] __netif_receive_skb_core+0x8e1/0x1c60
[ 158.880334] ? nf_ingress+0x500/0x500
[ 158.884059] ? process_backlog+0x347/0x4b0
[ 158.888241] ? lock_acquire+0xd8/0x320
[ 158.892050] ? process_backlog+0x1b6/0x4b0
[ 158.896228] ? process_backlog+0xc2/0x4b0
[ 158.900291] process_backlog+0xc2/0x4b0
[ 158.904210] net_rx_action+0x5cc/0x980
[ 158.908047] ? napi_complete_done+0x2c0/0x2c0
[ 158.912525] ? rcu_read_unlock+0x80/0x80
[ 158.916534] ? __lock_is_held+0x34/0x160
[ 158.920541] __do_softirq+0x1d4/0x9d2
[ 158.924308] ? trace_event_raw_event_irq_handler_exit+0x140/0x140
[ 158.930515] run_ksoftirqd+0x1d/0x40
[ 158.934152] smpboot_thread_fn+0x32b/0x690
[ 158.938299] ? sort_range+0x20/0x20
[ 158.941842] ? preempt_count_sub+0xf/0xd0
[ 158.945940] ? schedule+0x5b/0x140
[ 158.949412] kthread+0x206/0x300
[ 158.952689] ? sort_range+0x20/0x20
[ 158.956249] ? kthread_stop+0x570/0x570
[ 158.960164] ret_from_fork+0x3a/0x50
[ 158.963823] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
[ 158.983235] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d3f27110
[ 158.988935] ---[ end trace 5af56ee845aa6cc8 ]---
[ 158.993641] Kernel panic - not syncing: Fatal exception in interrupt
[ 159.000176] Kernel Offset: disabled
[ 159.003767] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Reproducer:
ip link add h1 type veth peer name swp1
ip link add h3 type veth peer name swp3
ip link set dev h1 up
ip address add 192.0.2.1/28 dev h1
ip link add dev vh3 type vrf table 20
ip link set dev h3 master vh3
ip link set dev vh3 up
ip link set dev h3 up
ip link set dev swp3 up
ip address add dev swp3 2001:db8:2::1/64
ip link set dev swp1 up
tc qdisc add dev swp1 clsact
ip link add name gt6 type ip6gretap \
local 2001:db8:2::1 remote 2001:db8:2::2
ip link set dev gt6 up
sleep 1
tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
action mirred egress mirror dev gt6
ping -I h1 192.0.2.2
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ERSPAN only support version 1 and 2. When packets send to an
erspan device which does not have proper version number set,
drop the packet. In real case, we observe multicast packets
sent to the erspan pernet device, erspan0, which does not have
erspan version configured.
Reported-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-05-17
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Provide a new BPF helper for doing a FIB and neighbor lookup
in the kernel tables from an XDP or tc BPF program. The helper
provides a fast-path for forwarding packets. The API supports
IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are
implemented in this initial work, from David (Ahern).
2) Just a tiny diff but huge feature enabled for nfp driver by
extending the BPF offload beyond a pure host processing offload.
Offloaded XDP programs are allowed to set the RX queue index and
thus opening the door for defining a fully programmable RSS/n-tuple
filter replacement. Once BPF decided on a queue already, the device
data-path will skip the conventional RSS processing completely,
from Jakub.
3) The original sockmap implementation was array based similar to
devmap. However unlike devmap where an ifindex has a 1:1 mapping
into the map there are use cases with sockets that need to be
referenced using longer keys. Hence, sockhash map is added reusing
as much of the sockmap code as possible, from John.
4) Introduce BTF ID. The ID is allocatd through an IDR similar as
with BPF maps and progs. It also makes BTF accessible to user
space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data
through BPF_OBJ_GET_INFO_BY_FD, from Martin.
5) Enable BPF stackmap with build_id also in NMI context. Due to the
up_read() of current->mm->mmap_sem build_id cannot be parsed.
This work defers the up_read() via a per-cpu irq_work so that
at least limited support can be enabled, from Song.
6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND
JIT conversion as well as implementation of an optimized 32/64 bit
immediate load in the arm64 JIT that allows to reduce the number of
emitted instructions; in case of tested real-world programs they
were shrinking by three percent, from Daniel.
7) Add ifindex parameter to the libbpf loader in order to enable
BPF offload support. Right now only iproute2 can load offloaded
BPF and this will also enable libbpf for direct integration into
other applications, from David (Beckett).
8) Convert the plain text documentation under Documentation/bpf/ into
RST format since this is the appropriate standard the kernel is
moving to for all documentation. Also add an overview README.rst,
from Jesper.
9) Add __printf verification attribute to the bpf_verifier_vlog()
helper. Though it uses va_list we can still allow gcc to check
the format string, from Mathieu.
10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
is a bash 4.0+ feature which is not guaranteed to be available
when calling out to shell, therefore use a more portable variant,
from Joe.
11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
instead of relying on the gcc built-in, from Björn.
12) Fix a sock hashmap kmalloc warning reported by syzbot when an
overly large key size is used in hashmap then causing overflows
in htab->elem_size. Reject bogus attr->key_size early in the
sock_hash_alloc(), from Yonghong.
13) Ensure in BPF selftests when urandom_read is being linked that
--build-id is always enabled so that test_stacktrace_build_id[_nmi]
won't be failing, from Alexei.
14) Add bitsperlong.h as well as errno.h uapi headers into the tools
header infrastructure which point to one of the arch specific
uapi headers. This was needed in order to fix a build error on
some systems for the BPF selftests, from Sirio.
15) Allow for short options to be used in the xdp_monitor BPF sample
code. And also a bpf.h tools uapi header sync in order to fix a
selftest build failure. Both from Prashant.
16) More formally clarify the meaning of ID in the direct packet access
section of the BPF documentation, from Wang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Variant of proc_create_data that directly take a seq_file show
callback and deals with network namespaces in ->open and ->release.
All callers of proc_create + single_open_net converted over, and
single_{open,release}_net are removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Variants of proc_create{,_data} that directly take a struct seq_operations
and deal with network namespaces in ->open and ->release. All callers of
proc_create + seq_open_net converted over, and seq_{open,release}_net are
removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The code should be using the pid namespace from the procfs mount
instead of trying to look it up during open.
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Pass the hashtable to the proc private data instead of copying
it into the per-file private data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Remove the pointless ping_seq_afinfo indirection and make the code look
like most other protocols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Avoid most of the afinfo indirections and just call the proc helpers
directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Remove a couple indirections to make the code look like most other
protocols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduces the boilerplate code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
syzbot found a way to trigger an infinitie loop by overflowing
@offset variable that has been forced to use u16 for some very
obscure reason in the past.
We probably want to look at NEXTHDR_FRAGMENT handling which looks
wrong, in a separate patch.
In net-next, we shall try to use skb_header_pointer() instead of
pskb_may_pull().
watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor738:4553]
Modules linked in:
irq event stamp: 13885653
hardirqs last enabled at (13885652): [<ffffffff878009d5>] restore_regs_and_return_to_kernel+0x0/0x2b
hardirqs last disabled at (13885653): [<ffffffff87800905>] interrupt_entry+0xb5/0xf0 arch/x86/entry/entry_64.S:625
softirqs last enabled at (13614028): [<ffffffff84df0809>] tun_napi_alloc_frags drivers/net/tun.c:1478 [inline]
softirqs last enabled at (13614028): [<ffffffff84df0809>] tun_get_user+0x1dd9/0x4290 drivers/net/tun.c:1825
softirqs last disabled at (13614032): [<ffffffff84df1b6f>] tun_get_user+0x313f/0x4290 drivers/net/tun.c:1942
CPU: 1 PID: 4553 Comm: syz-executor738 Not tainted 4.17.0-rc3+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50 kernel/kcov.c:101
RSP: 0018:ffff8801d8cfe250 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffff8801d88a8080 RBX: ffff8801d7389e40 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffffffff868da4ad RDI: ffff8801c8a53277
RBP: ffff8801d8cfe250 R08: ffff8801d88a8080 R09: ffff8801d8cfe3e8
R10: ffffed003b19fc87 R11: ffff8801d8cfe43f R12: ffff8801c8a5327f
R13: 0000000000000000 R14: ffff8801c8a4e5fe R15: ffff8801d8cfe3e8
FS: 0000000000d88940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000001acab3000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
_decode_session6+0xc1d/0x14f0 net/ipv6/xfrm6_policy.c:150
__xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2368
xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline]
icmpv6_route_lookup+0x395/0x6e0 net/ipv6/icmp.c:372
icmp6_send+0x1982/0x2da0 net/ipv6/icmp.c:551
icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
ip6_input_finish+0x14e1/0x1a30 net/ipv6/ip6_input.c:305
NF_HOOK include/linux/netfilter.h:288 [inline]
ip6_input+0xe1/0x5e0 net/ipv6/ip6_input.c:327
dst_input include/net/dst.h:450 [inline]
ip6_rcv_finish+0x29c/0xa10 net/ipv6/ip6_input.c:71
NF_HOOK include/linux/netfilter.h:288 [inline]
ipv6_rcv+0xeb8/0x2040 net/ipv6/ip6_input.c:208
__netif_receive_skb_core+0x2468/0x3650 net/core/dev.c:4646
__netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4711
netif_receive_skb_internal+0x126/0x7b0 net/core/dev.c:4785
napi_frags_finish net/core/dev.c:5226 [inline]
napi_gro_frags+0x631/0xc40 net/core/dev.c:5299
tun_get_user+0x3168/0x4290 drivers/net/tun.c:1951
tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1996
call_write_iter include/linux/fs.h:1784 [inline]
do_iter_readv_writev+0x859/0xa50 fs/read_write.c:680
do_iter_write+0x185/0x5f0 fs/read_write.c:959
vfs_writev+0x1c7/0x330 fs/read_write.c:1004
do_writev+0x112/0x2f0 fs/read_write.c:1039
__do_sys_writev fs/read_write.c:1112 [inline]
__se_sys_writev fs/read_write.c:1109 [inline]
__x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: syzbot+0053c8...@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:
1) Fix handling of simultaneous open TCP connection in conntrack,
from Jozsef Kadlecsik.
2) Insufficient sanitify check of xtables extension names, from
Florian Westphal.
3) Skip unnecessary synchronize_rcu() call when transaction log
is already empty, from Florian Westphal.
4) Incorrect destination mac validation in ebt_stp, from Stephen
Hemminger.
5) xtables module reference counter leak in nft_compat, from
Florian Westphal.
6) Incorrect connection reference counting logic in IPVS
one-packet scheduler, from Julian Anastasov.
7) Wrong stats for 32-bits CPU in IPVS, also from Julian.
8) Calm down sparse error in netfilter core, also from Florian.
9) Use nla_strlcpy to fix compilation warning in nfnetlink_acct
and nfnetlink_cthelper, again from Florian.
10) Missing module alias in icmp and icmp6 xtables extensions,
from Florian Westphal.
11) Base chain statistics in nf_tables may be unset/null, from Florian.
12) Fix handling of large matchinfo size in nft_compat, this includes
one preparation for before this fix. From Florian.
13) Fix bogus EBUSY error when deleting chains due to incorrect reference
counting from the preparation phase of the two-phase commit protocol.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bpf syscall and selftests conflicts were trivial
overlapping changes.
The r8169 change involved moving the added mdelay from 'net' into a
different function.
A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts. I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the truncated bit is set only when 1) the mirrored packet
is larger than mtu and 2) the ipv4 packet tot_len is larger than
the actual skb->len. This patch adds another case for detecting
whether ipv6 packet is truncated or not, by checking the ipv6 header
payload_len and the skb->len.
Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.
The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|