summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2020-12-21bonding: fix feature flag setting at init timeJarod Wilson1-2/+0
[ Upstream commit 007ab5345545aba2f9cbe4c096cc35d2fd3275ac ] Don't try to adjust XFRM support flags if the bond device isn't yet registered. Bad things can currently happen when netdev_change_features() is called without having wanted_features fully filled in yet. This code runs both on post-module-load mode changes, as well as at module init time, and when run at module init time, it is before register_netdevice() has been called and filled in wanted_features. The empty wanted_features led to features also getting emptied out, which was definitely not the intended behavior, so prevent that from happening. Originally, I'd hoped to stop adjusting wanted_features at all in the bonding driver, as it's documented as being something only the network core should touch, but we actually do need to do this to properly update both the features and wanted_features fields when changing the bond type, or we get to a situation where ethtool sees: esp-hw-offload: off [requested on] I do think we should be using netdev_update_features instead of netdev_change_features here though, so we only send notifiers when the features actually changed. Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") Reported-by: Ivan Vecera <ivecera@redhat.com> Suggested-by: Ivan Vecera <ivecera@redhat.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20201205172229.576587-1-jarod@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-11netfilter: nftables_offload: build mask based from the matching bytesPablo Neira Ayuso1-0/+3
commit a5d45bc0dc50f9dd83703510e9804d813a9cac32 upstream. Userspace might match on prefix bytes of header fields if they are on the byte boundary, this requires that the mask is adjusted accordingly. Use NFT_OFFLOAD_MATCH_EXACT() for meta since prefix byte matching is not allowed for this type of selector. The bitwise expression might be optimized out by userspace, hence the kernel needs to infer the prefix from the number of payload bytes to match on. This patch adds nft_payload_offload_mask() to calculate the bitmask to match on the prefix. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-11netfilter: nftables_offload: set address type in control dissectorPablo Neira Ayuso1-0/+4
commit 3c78e9e0d33a27ab8050e4492c03c6a1f8d0ed6b upstream. This patch adds nft_flow_rule_set_addr_type() to set the address type from the nft_payload expression accordingly. If the address type is not set in the control dissector then a rule that matches either on source or destination IP address does not work. After this patch, nft hardware offload generates the flow dissector configuration as tc-flower does to match on an IP address. This patch has been also tested functionally to make sure packets are filtered out by the NIC. This is also getting the code aligned with the existing netfilter flow offload infrastructure which is also setting the control dissector. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08inet_ecn: Fix endianness of checksum update when setting ECT(1)Toke Høiland-Jørgensen1-1/+1
[ Upstream commit 2867e1eac61016f59b3d730e3f7aa488e186e917 ] When adding support for propagating ECT(1) marking in IP headers it seems I suffered from endianness-confusion in the checksum update calculation: In fact the ECN field is in the *lower* bits of the first 16-bit word of the IP header when calculating in network byte order. This means that the addition performed to update the checksum field was wrong; let's fix that. Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040") Reported-by: Jonathan Morton <chromatix99@gmail.com> Tested-by: Pete Heist <pete@heistp.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20201130183705.17540-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08net/tls: Protect from calling tls_dev_del for TLS RX twiceMaxim Mikityanskiy1-0/+6
[ Upstream commit 025cc2fb6a4e84e9a0552c0017dcd1c24b7ac7da ] tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after calling tls_dev_del if TLX TX offload is also enabled. Clearing tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a time frame when tls_device_down may get called and call tls_dev_del for RX one extra time, confusing the driver, which may lead to a crash. This patch corrects this racy behavior by adding a flag to prevent tls_device_down from calling tls_dev_del the second time. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-02bonding: wait for sysfs kobject destruction before freeing struct slaveJamie Iles1-0/+8
[ Upstream commit b9ad3e9f5a7a760ab068e33e1f18d240ba32ce92 ] syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, releasing a struct slave device could result in the following splat: kobject: 'bonding_slave' (00000000cecdd4fe): kobject_release, parent 0000000074ceb2b2 (delayed 1000) bond0 (unregistering): (slave bond_slave_1): Releasing backup interface ------------[ cut here ]------------ ODEBUG: free active (active state 0) object type: timer_list hint: workqueue_select_cpu_near kernel/workqueue.c:1549 [inline] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 kernel/workqueue.c:1600 WARNING: CPU: 1 PID: 842 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 lib/debugobjects.c:485 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 842 Comm: kworker/u4:4 Tainted: G S 5.9.0-rc8+ #96 Hardware name: linux,dummy-virt (DT) Workqueue: netns cleanup_net Call trace: dump_backtrace+0x0/0x4d8 include/linux/bitmap.h:239 show_stack+0x34/0x48 arch/arm64/kernel/traps.c:142 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x174/0x1f8 lib/dump_stack.c:118 panic+0x360/0x7a0 kernel/panic.c:231 __warn+0x244/0x2ec kernel/panic.c:600 report_bug+0x240/0x398 lib/bug.c:198 bug_handler+0x50/0xc0 arch/arm64/kernel/traps.c:974 call_break_hook+0x160/0x1d8 arch/arm64/kernel/debug-monitors.c:322 brk_handler+0x30/0xc0 arch/arm64/kernel/debug-monitors.c:329 do_debug_exception+0x184/0x340 arch/arm64/mm/fault.c:864 el1_dbg+0x48/0xb0 arch/arm64/kernel/entry-common.c:65 el1_sync_handler+0x170/0x1c8 arch/arm64/kernel/entry-common.c:93 el1_sync+0x80/0x100 arch/arm64/kernel/entry.S:594 debug_print_object+0x180/0x240 lib/debugobjects.c:485 __debug_check_no_obj_freed lib/debugobjects.c:967 [inline] debug_check_no_obj_freed+0x200/0x430 lib/debugobjects.c:998 slab_free_hook mm/slub.c:1536 [inline] slab_free_freelist_hook+0x190/0x210 mm/slub.c:1577 slab_free mm/slub.c:3138 [inline] kfree+0x13c/0x460 mm/slub.c:4119 bond_free_slave+0x8c/0xf8 drivers/net/bonding/bond_main.c:1492 __bond_release_one+0xe0c/0xec8 drivers/net/bonding/bond_main.c:2190 bond_slave_netdev_event drivers/net/bonding/bond_main.c:3309 [inline] bond_netdev_event+0x8f0/0xa70 drivers/net/bonding/bond_main.c:3420 notifier_call_chain+0xf0/0x200 kernel/notifier.c:83 __raw_notifier_call_chain kernel/notifier.c:361 [inline] raw_notifier_call_chain+0x44/0x58 kernel/notifier.c:368 call_netdevice_notifiers_info+0xbc/0x150 net/core/dev.c:2033 call_netdevice_notifiers_extack net/core/dev.c:2045 [inline] call_netdevice_notifiers net/core/dev.c:2059 [inline] rollback_registered_many+0x6a4/0xec0 net/core/dev.c:9347 unregister_netdevice_many.part.0+0x2c/0x1c0 net/core/dev.c:10509 unregister_netdevice_many net/core/dev.c:10508 [inline] default_device_exit_batch+0x294/0x338 net/core/dev.c:10992 ops_exit_list.isra.0+0xec/0x150 net/core/net_namespace.c:189 cleanup_net+0x44c/0x888 net/core/net_namespace.c:603 process_one_work+0x96c/0x18c0 kernel/workqueue.c:2269 worker_thread+0x3f0/0xc30 kernel/workqueue.c:2415 kthread+0x390/0x498 kernel/kthread.c:292 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:925 This is a potential use-after-free if the sysfs nodes are being accessed whilst removing the struct slave, so wait for the object destruction to complete before freeing the struct slave itself. Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.") Fixes: a068aab42258 ("bonding: Fix reference count leak in bond_sysfs_slave_add.") Cc: Qiushi Wu <wu000273@umn.edu> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20201120142827.879226-1-jamie@nuviainc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24ip_tunnels: Set tunnel option flag when tunnel metadata is presentYi-Hung Wei1-3/+4
[ Upstream commit 9c2e14b48119b39446031d29d994044ae958d8fc ] Currently, we may set the tunnel option flag when the size of metadata is zero. For example, we set TUNNEL_GENEVE_OPT in the receive function no matter the geneve option is present or not. As this may result in issues on the tunnel flags consumers, this patch fixes the issue. Related discussion: * https://lore.kernel.org/netdev/1604448694-19351-1-git-send-email-yihung.wei@gmail.com/T/#u Fixes: 256c87c17c53 ("net: check tunnel option type in tunnel flags") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Link: https://lore.kernel.org/r/1605053800-74072-1-git-send-email-yihung.wei@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24net/tls: Fix wrong record sn in async mode of device resyncTariq Toukan1-1/+15
[ Upstream commit 138559b9f99d3b6b1d5e75c78facc067a23871c6 ] In async_resync mode, we log the TCP seq of records until the async request is completed. Later, in case one of the logged seqs matches the resync request, we return it, together with its record serial number. Before this fix, we mistakenly returned the serial number of the current record instead. Fixes: ed9b7646b06a ("net/tls: Add asynchronous resync") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24Exempt multicast addresses from five-second neighbor lifetimeJeff Dike1-0/+1
[ Upstream commit 8cf8821e15cd553339a5b48ee555a0439c2b2742 ] Commit 58956317c8de ("neighbor: Improve garbage collection") guarantees neighbour table entries a five-second lifetime. Processes which make heavy use of multicast can fill the neighour table with multicast addresses in five seconds. At that point, neighbour entries can't be GC-ed because they aren't five seconds old yet, the kernel log starts to fill up with "neighbor table overflow!" messages, and sends start to fail. This patch allows multicast addresses to be thrown out before they've lived out their five seconds. This makes room for non-multicast addresses and makes messages to all addresses more reliable in these circumstances. Fixes: 58956317c8de ("neighbor: Improve garbage collection") Signed-off-by: Jeff Dike <jdike@akamai.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20201113015815.31397-1-jdike@akamai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01netfilter: nftables_offload: KASAN slab-out-of-bounds Read in ↵Saeed Mirzamohammadi1-0/+6
nft_flow_rule_create commit 31cc578ae2de19c748af06d859019dced68e325d upstream. This patch fixes the issue due to: BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2 net/netfilter/nf_tables_offload.c:40 Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244 The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds. This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue. Add nft_expr_more() and use it to fix this problem. Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29netfilter: nf_log: missing vlan offload tag and protoPablo Neira Ayuso1-0/+1
[ Upstream commit 0d9826bc18ce356e8909919ad681ad65d0a6061e ] Dump vlan tag and proto for the usual vlan offload case if the NF_LOG_MACDECODE flag is set on. Without this information the logging is misleading as there is no reference to the VLAN header. [12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0 [12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1 Fixes: 83e96d443b37 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_infoLeon Romanovsky1-1/+4
[ Upstream commit d086a1c65aabb5a4e1edc580ca583e2964c62b44 ] The access of tcf_tunnel_info() produces the following splat, so fix it by dereferencing the tcf_tunnel_key_params pointer with marker that internal tcfa_liock is held. ============================= WARNING: suspicious RCU usage 5.9.0+ #1 Not tainted ----------------------------- include/net/tc_act/tc_tunnel_key.h:59 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by tc/34839: #0: ffff88828572c2a0 (&p->tcfa_lock){+...}-{2:2}, at: tc_setup_flow_action+0xb3/0x48b5 stack backtrace: CPU: 1 PID: 34839 Comm: tc Not tainted 5.9.0+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x9a/0xd0 tc_setup_flow_action+0x14cb/0x48b5 fl_hw_replace_filter+0x347/0x690 [cls_flower] fl_change+0x2bad/0x4875 [cls_flower] tc_new_tfilter+0xf6f/0x1ba0 rtnetlink_rcv_msg+0x5f2/0x870 netlink_rcv_skb+0x124/0x350 netlink_unicast+0x433/0x700 netlink_sendmsg+0x6f1/0xbd0 sock_sendmsg+0xb0/0xe0 ____sys_sendmsg+0x4fa/0x6d0 ___sys_sendmsg+0x12e/0x1b0 __sys_sendmsg+0xa4/0x120 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f1f8cd4fe57 Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffdc1e193b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1f8cd4fe57 RDX: 0000000000000000 RSI: 00007ffdc1e19420 RDI: 0000000000000003 RBP: 000000005f85aafa R08: 0000000000000001 R09: 00007ffdc1e1936c R10: 000000000040522d R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 00007ffdc1e1d6f0 R15: 0000000000482420 Fixes: 3ebaf6da0716 ("net: sched: Do not assume RTNL is held in tunnel key action helpers") Fixes: 7a47281439ba ("net: sched: lock action when translating it to flow_action infra") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-17Bluetooth: L2CAP: Fix calling sk_filter on non-socket based channelLuiz Augusto von Dentz1-0/+2
commit f19425641cb2572a33cb074d5e30283720bd4d22 upstream. Only sockets will have the chan->data set to an actual sk, channels like A2MP would have its own data which would likely cause a crash when calling sk_filter, in order to fix this a new callback has been introduced so channels can implement their own filtering if necessary. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02netlink: fix policy dump leakJohannes Berg1-1/+2
If userspace doesn't complete the policy dump, we leak the allocated state. Fix this. Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29genetlink: add missing kdoc for validation flagsJakub Kicinski1-0/+1
Validation flags are missing kdoc, add it. Fixes: ef6243acb478 ("genetlink: optionally validate strictly/dumps") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28Merge branch 'master' of ↵David S. Miller1-10/+6
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-09-28 1) Fix a build warning in ip_vti if CONFIG_IPV6 is not set. From YueHaibing. 2) Restore IPCB on espintcp before handing the packet to xfrm as the information there is still needed. From Sabrina Dubroca. 3) Fix pmtu updating for xfrm interfaces. From Sabrina Dubroca. 4) Some xfrm state information was not cloned with xfrm_do_migrate. Fixes to clone the full xfrm state, from Antony Antony. 5) Use the correct address family in xfrm_state_find. The struct flowi must always be interpreted along with the original address family. This got lost over the years. Fix from Herbert Xu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25net/ipv4: always honour route mtu during forwardingMaciej Żenczykowski1-0/+6
Documentation/networking/ip-sysctl.txt:46 says: ip_forward_use_pmtu - BOOLEAN By default we don't trust protocol path MTUs while forwarding because they could be easily forged and can lead to unwanted fragmentation by the router. You only need to enable this if you have user-space software which tries to discover path mtus by itself and depends on the kernel honoring this information. This is normally not the case. Default: 0 (disabled) Possible values: 0 - disabled 1 - enabled Which makes it pretty clear that setting it to 1 is a potential security/safety/DoS issue, and yet it is entirely reasonable to want forwarded traffic to honour explicitly administrator configured route mtus (instead of defaulting to device mtu). Indeed, I can't think of a single reason why you wouldn't want to. Since you configured a route mtu you probably know better... It is pretty common to have a higher device mtu to allow receiving large (jumbo) frames, while having some routes via that interface (potentially including the default route to the internet) specify a lower mtu. Note that ipv6 forwarding uses device mtu unless the route is locked (in which case it will use the route mtu). This approach is not usable for IPv4 where an 'mtu lock' on a route also has the side effect of disabling TCP path mtu discovery via disabling the IPv4 DF (don't frag) bit on all outgoing frames. I'm not aware of a way to lock a route from an IPv6 RA, so that also potentially seems wrong. Signed-off-by: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com> Cc: Vinay Paradkar <vparadka@qti.qualcomm.com> Cc: Tyler Wear <twear@quicinc.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25net_sched: defer tcf_idr_insert() in tcf_action_init_1()Cong Wang1-2/+0
All TC actions call tcf_idr_insert() for new action at the end of their ->init(), so we can actually move it to a central place in tcf_action_init_1(). And once the action is inserted into the global IDR, other parallel process could free it immediately as its refcnt is still 1, so we can not fail after this, we need to move it after the goto action validation to avoid handling the failure case after insertion. This is found during code review, is not directly triggered by syzbot. And this prepares for the next patch. Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21net: sctp: Fix IPv6 ancestor_size calc in sctp_copy_descendantHenry Ptasinski1-3/+5
When calculating ancestor_size with IPv6 enabled, simply using sizeof(struct ipv6_pinfo) doesn't account for extra bytes needed for alignment in the struct sctp6_sock. On x86, there aren't any extra bytes, but on ARM the ipv6_pinfo structure is aligned on an 8-byte boundary so there were 4 pad bytes that were omitted from the ancestor_size calculation. This would lead to corruption of the pd_lobby pointers, causing an oops when trying to free the sctp structure on socket close. Fixes: 636d25d557d1 ("sctp: not copy sctp_sock pd_lobby in sctp_copy_descendant") Signed-off-by: Henry Ptasinski <hptasinski@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15net: sched: only keep the available bits when setting vxlan md->gbpXin Long1-0/+3
As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata on vxlan rx/tx path, only dont_learn/policy_applied/policy_id fields can be set to or parse from the packet for vxlan gbp option. So we'd better do the mask when set it in act_tunnel_key and cls_flower. Otherwise, when users don't know these bits, they may configure with a value which can never be matched. Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15ipv4: Initialize flowi4_multipath_hash in data pathDavid Ahern1-0/+1
flowi4_multipath_hash was added by the commit referenced below for tunnels. Unfortunately, the patch did not initialize the new field for several fast path lookups that do not initialize the entire flow struct to 0. Fix those locations. Currently, flowi4_multipath_hash is random garbage and affects the hash value computed by fib_multipath_hash for multipath selection. Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash") Signed-off-by: David Ahern <dsahern@gmail.com> Cc: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-11netlink: fix doc about nlmsg_parse/nla_validateNicolas Dichtel1-2/+0
There is no @validate argument. CC: Johannes Berg <johannes.berg@intel.com> Fixes: 3de644035446 ("netlink: re-add parse/validate functions in strict mode") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-08netfilter: nf_tables: coalesce multiple notifications into one skbuffPablo Neira Ayuso1-0/+1
On x86_64, each notification results in one skbuff allocation which consumes at least 768 bytes due to the skbuff overhead. This patch coalesces several notifications into one single skbuff, so each notification consumes at least ~211 bytes, that ~3.5 times less memory consumption. As a result, this is reducing the chances to exhaust the netlink socket receive buffer. Rule of thumb is that each notification batch only contains netlink messages whose report flag is the same, nfnetlink_send() requires this to do appropriate delivery to userspace, either via unicast (echo mode) or multicast (monitor mode). The skbuff control buffer is used to annotate the report flag for later handling at the new coalescing routine. The batch skbuff notification size is NLMSG_GOODSIZE, using a larger skbuff would allow for more socket receiver buffer savings (to amortize the cost of the skbuff even more), however, going over that size might break userspace applications, so let's be conservative and stick to NLMSG_GOODSIZE. Reported-by: Phil Sutter <phil@nwl.cc> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-07xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrateAntony Antony1-10/+6
XFRMA_REPLAY_ESN_VAL was not cloned completely from the old to the new. Migrate this attribute during XFRMA_MSG_MIGRATE v1->v2: - move curleft cloning to a separate patch Fixes: af2f464e326e ("xfrm: Assign esn pointers when cloning a state") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-08-27Merge tag 'rxrpc-fixes-20200820' of ↵David S. Miller1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc, afs: Fix probing issues Here are some fixes for rxrpc and afs to fix issues in the RTT measuring in rxrpc and thence the Volume Location server probing in afs: (1) Move the serial number of a received ACK into a local variable to simplify the next patch. (2) Fix the loss of RTT samples due to extra interposed ACKs causing baseline information to be discarded too early. This is a particular problem for afs when it sends a single very short call to probe a server it hasn't talked to recently. (3) Fix rxrpc_kernel_get_srtt() to indicate whether it actually has seen any valid samples or not. (4) Remove a field that's set/woken, but never read/waited on. (5) Expose the RTT and other probe information through procfs to make debugging of this stuff easier. (6) Fix VL rotation in afs to only use summary information from VL probing and not the probe running state (which gets clobbered when next a probe is issued). (7) Fix VL rotation to actually return the error aggregated from the probe errors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24ipv6: ndisc: adjust ndisc_ifinfo_sysctl_change prototypeTobias Klauser1-1/+1
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ndisc_ifinfo_sysctl_change to take a kernel pointer. Adjust its prototype in net/ndisc.h as well to fix the following sparse warning: net/ipv6/ndisc.c:1838:5: error: symbol 'ndisc_ifinfo_sysctl_change' redeclared with different type (incompatible argument 3 (different address spaces)): net/ipv6/ndisc.c:1838:5: int extern [addressable] [signed] [toplevel] ndisc_ifinfo_sysctl_change( ... ) net/ipv6/ndisc.c: note: in included file (through include/net/ipv6.h): ./include/net/ndisc.h:496:5: note: previously declared as: ./include/net/ndisc.h:496:5: int extern [addressable] [signed] [toplevel] ndisc_ifinfo_sysctl_change( ... ) net/ipv6/ndisc.c: note: in included file (through include/net/ip6_route.h): Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-21netfilter: nf_tables: fix destination register zeroingFlorian Westphal1-0/+2
Following bug was reported via irc: nft list ruleset set knock_candidates_ipv4 { type ipv4_addr . inet_service size 65535 elements = { 127.0.0.1 . 123, 127.0.0.1 . 123 } } .. udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 } udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport } It should not have been possible to add a duplicate set entry. After some debugging it turned out that the problem is the immediate value (123) in the second-to-last rule. Concatenations use 32bit registers, i.e. the elements are 8 bytes each, not 6 and it turns out the kernel inserted inet firewall @knock_candidates_ipv4 element 0100007f ffff7b00 : 0 [end] element 0100007f 00007b00 : 0 [end] Note the non-zero upper bits of the first element. It turns out that nft_immediate doesn't zero the destination register, but this is needed when the length isn't a multiple of 4. Furthermore, the zeroing in nft_payload is broken. We can't use [len / 4] = 0 -- if len is a multiple of 4, index is off by one. Skip zeroing in this case and use a conditional instead of (len -1) / 4. Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-20rxrpc: Make rxrpc_kernel_get_srtt() indicate validityDavid Howells1-1/+1
Fix rxrpc_kernel_get_srtt() to indicate the validity of the returned smoothed RTT. If we haven't had any valid samples yet, the SRTT isn't useful. Fixes: c410bf01933e ("rxrpc: Fix the excessive initial retransmission timeout") Signed-off-by: David Howells <dhowells@redhat.com>
2020-08-19ipv6: some fixes for ipv6_dev_find()Xin Long1-1/+2
This patch is to do 3 things for ipv6_dev_find(): As David A. noticed, - rt6_lookup() is not really needed. Different from __ip_dev_find(), ipv6_dev_find() doesn't have a compatibility problem, so remove it. As Hideaki suggested, - "valid" (non-tentative) check for the address is also needed. ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which will traverse the address hash list, but it's heavy to be called inside ipv6_dev_find(). This patch is to reuse the code of ipv6_chk_addr_and_flags() for ipv6_dev_find(). - dev parameter is passed into ipv6_dev_find(), as link-local addresses from user space has sin6_scope_id set and the dev lookup needs it. Fixes: 81f6cb31222d ("ipv6: add ipv6_dev_find()") Suggested-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2-0/+6
Pull networking fixes from David Miller: "Some merge window fallout, some longer term fixes: 1) Handle headroom properly in lapbether and x25_asy drivers, from Xie He. 2) Fetch MAC address from correct r8152 device node, from Thierry Reding. 3) In the sw kTLS path we should allow MSG_CMSG_COMPAT in sendmsg, from Rouven Czerwinski. 4) Correct fdputs in socket layer, from Miaohe Lin. 5) Revert troublesome sockptr_t optimization, from Christoph Hellwig. 6) Fix TCP TFO key reading on big endian, from Jason Baron. 7) Missing CAP_NET_RAW check in nfc, from Qingyu Li. 8) Fix inet fastreuse optimization with tproxy sockets, from Tim Froidcoeur. 9) Fix 64-bit divide in new SFC driver, from Edward Cree. 10) Add a tracepoint for prandom_u32 so that we can more easily perform usage analysis. From Eric Dumazet. 11) Fix rwlock imbalance in AF_PACKET, from John Ogness" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits) net: openvswitch: introduce common code for flushing flows af_packet: TPACKET_V3: fix fill status rwlock imbalance random32: add a tracepoint for prandom_u32() Revert "ipv4: tunnel: fix compilation on ARCH=um" net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus net: ethernet: stmmac: Disable hardware multicast filter net: stmmac: dwmac1000: provide multicast filter fallback ipv4: tunnel: fix compilation on ARCH=um vsock: fix potential null pointer dereference in vsock_poll() sfc: fix ef100 design-param checking net: initialize fastreuse on inet_inherit_port net: refactor bind_bucket fastreuse into helper net: phy: marvell10g: fix null pointer dereference net: Fix potential memory leak in proto_register() net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc() net/nfc/rawsock.c: add CAP_NET_RAW check. hinic: fix strncpy output truncated compile warnings drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check net/tls: Fix kmap usage ...
2020-08-12net: refactor bind_bucket fastreuse into helperTim Froidcoeur1-0/+4
Refactor the fastreuse update code in inet_csk_get_port into a small helper function that can be called from other places. Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-11Merge tag 'locking-urgent-2020-08-10' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "A set of locking fixes and updates: - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generally useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers" * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/seqlock, headers: Untangle the spaghetti monster locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header x86/headers: Remove APIC headers from <asm/smp.h> seqcount: More consistent seqprop names seqcount: Compress SEQCNT_LOCKNAME_ZERO() seqlock: Fold seqcount_LOCKNAME_init() definition seqlock: Fold seqcount_LOCKNAME_t definition seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g hrtimer: Use sequence counter with associated raw spinlock kvm/eventfd: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock vfs: Use sequence counter with associated spinlock timekeeping: Use sequence counter with associated raw spinlock xfrm: policy: Use sequence counters with associated lock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock netfilter: conntrack: Use sequence counter with associated spinlock sched: tasks: Use sequence counter with associated spinlock ...
2020-08-10tcp: correct read of TFO keys on big endian systemsJason Baron1-0/+2
When TFO keys are read back on big endian systems either via the global sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values don't match what was written. For example, on s390x: # echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key # cat /proc/sys/net/ipv4/tcp_fastopen_key 02000000-01000000-04000000-03000000 Instead of: # cat /proc/sys/net/ipv4/tcp_fastopen_key 00000001-00000002-00000003-00000004 Fix this by converting to the correct endianness on read. This was reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net selftest on s390x, which depends on the read value matching what was written. I've confirmed that the test now passes on big and little endian systems. Signed-off-by: Jason Baron <jbaron@akamai.com> Fixes: 438ac88009bc ("net: fastopen: robustness and endianness fixes for SipHash") Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Eric Dumazet <edumazet@google.com> Reported-and-tested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07Merge tag 'dlm-5.9' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes a some improvements to the dlm networking layer: improving the ability to trace dlm messages for debugging, and improved handling of bad messages or disrupted connections" * tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: implement tcp graceful shutdown fs: dlm: change handling of reconnects fs: dlm: don't close socket on invalid message fs: dlm: set skb mark per peer socket fs: dlm: set skb mark for listen socket net: sock: add sock_set_mark dlm: Fix kobject memleak
2020-08-06net: sock: add sock_set_markAlexander Aring1-0/+1
This patch adds a new socket helper function to set the mark value for a kernel socket. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds58-256/+1105
Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Flowian Westphal. 35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...
2020-08-05ipv6: add ipv6_dev_find()Xin Long1-0/+2
This is to add an ip_dev_find like function for ipv6, used to find the dev by saddr. It will be used by TIPC protocol. So also export it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05Merge tag 'seccomp-v5.9-rc1' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "There are a bunch of clean ups and selftest improvements along with two major updates to the SECCOMP_RET_USER_NOTIF filter return: EPOLLHUP support to more easily detect the death of a monitored process, and being able to inject fds when intercepting syscalls that expect an fd-opening side-effect (needed by both container folks and Chrome). The latter continued the refactoring of __scm_install_fd() started by Christoph, and in the process found and fixed a handful of bugs in various callers. - Improved selftest coverage, timeouts, and reporting - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner) - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)" * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD seccomp: Introduce addfd ioctl to seccomp user notifier fs: Expand __receive_fd() to accept existing fd pidfd: Replace open-coded receive_fd() fs: Add receive_fd() wrapper for __receive_fd() fs: Move __scm_install_fd() to __receive_fd() net/scm: Regularize compat handling of scm_detach_fds() pidfd: Add missing sock updates for pidfd_getfd() net/compat: Add missing sock updates for SCM_RIGHTS selftests/seccomp: Check ENOSYS under tracing selftests/seccomp: Refactor to use fixture variants selftests/harness: Clean up kern-doc for fixtures seccomp: Use -1 marker for end of mode 1 syscall list seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall() selftests/seccomp: Make kcmp() less required seccomp: Use pr_fmt selftests/seccomp: Improve calibration loop selftests/seccomp: use 90s as timeout selftests/seccomp: Expand benchmark to per-filter measurements ...
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
2020-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2-0/+14
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Flush the cleanup xtables worker to make sure destructors have completed, from Florian Westphal. 2) iifgroup is matching erroneously, also from Florian. 3) Add selftest for meta interface matching, from Florian Westphal. 4) Move nf_ct_offload_timeout() to header, from Roi Dayan. 5) Call nf_ct_offload_timeout() from flow_offload_add() to make sure garbage collection does not evict offloaded flow, from Roi Dayan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04tunnels: PMTU discovery support for directly bridged IP packetsStefano Brivio2-10/+2
It's currently possible to bridge Ethernet tunnels carrying IP packets directly to external interfaces without assigning them addresses and routes on the bridged network itself: this is the case for UDP tunnels bridged with a standard bridge or by Open vSwitch. PMTU discovery is currently broken with those configurations, because the encapsulation effectively decreases the MTU of the link, and while we are able to account for this using PMTU discovery on the lower layer, we don't have a way to relay ICMP or ICMPv6 messages needed by the sender, because we don't have valid routes to it. On the other hand, as a tunnel endpoint, we can't fragment packets as a general approach: this is for instance clearly forbidden for VXLAN by RFC 7348, section 4.3: VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may fragment encapsulated VXLAN packets due to the larger frame size. The destination VTEP MAY silently discard such VXLAN fragments. The same paragraph recommends that the MTU over the physical network accomodates for encapsulations, but this isn't a practical option for complex topologies, especially for typical Open vSwitch use cases. Further, it states that: Other techniques like Path MTU discovery (see [RFC1191] and [RFC1981]) MAY be used to address this requirement as well. Now, PMTU discovery already works for routed interfaces, we get route exceptions created by the encapsulation device as they receive ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and we already rebuild those messages with the appropriate MTU and route them back to the sender. Add the missing bits for bridged cases: - checks in skb_tunnel_check_pmtu() to understand if it's appropriate to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and RFC 4443 section 2.4 for ICMPv6. This function is already called by UDP tunnels - a new function generating those ICMP or ICMPv6 replies. We can't reuse icmp_send() and icmp6_send() as we don't see the sender as a valid destination. This doesn't need to be generic, as we don't cover any other type of ICMP errors given that we only provide an encapsulation function to the sender While at it, make the MTU check in skb_tunnel_check_pmtu() accurate: we might receive GSO buffers here, and the passed headroom already includes the inner MAC length, so we don't have to account for it a second time (that would imply three MAC headers on the wire, but there are just two). This issue became visible while bridging IPv6 packets with 4500 bytes of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50 bytes of encapsulation headroom, we would advertise MTU as 3950, and we would reject fragmented IPv6 datagrams of 3958 bytes size on the wire. We're exclusively dealing with network MTU here, though, so we could get Ethernet frames up to 3964 octets in that case. v2: - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern) - split IPv4/IPv6 functions (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-2/+0
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-08-04 The following pull-request contains BPF updates for your *net-next* tree. We've added 73 non-merge commits during the last 9 day(s) which contain a total of 135 files changed, 4603 insertions(+), 1013 deletions(-). The main changes are: 1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko. 2) Add BPF iterator for map elements and to iterate all BPF programs for efficient in-kernel inspection, from Yonghong Song and Alexei Starovoitov. 3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid unwinder errors, from Song Liu. 4) Allow cgroup local storage map to be shared between programs on the same cgroup. Also extend BPF selftests with coverage, from YiFei Zhu. 5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM load instructions, from Jean-Philippe Brucker. 6) Follow-up fixes on BPF socket lookup in combination with reuseport group handling. Also add related BPF selftests, from Jakub Sitnicki. 7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for socket create/release as well as bind functions, from Stanislav Fomichev. 8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct xdp_statistics, from Peilin Ye. 9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime. 10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6} fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin. 11) Fix a bpftool segfault due to missing program type name and make it more robust to prevent them in future gaps, from Quentin Monnet. 12) Consolidate cgroup helper functions across selftests and fix a v6 localhost resolver issue, from John Fastabend. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04Merge tag 'mlx5-updates-2020-08-03' of ↵David S. Miller1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-08-03 This patchset introduces some updates to mlx5 driver. 1) Jakub converts mlx5 to use the new udp tunnel infrastructure. Starting with a hack to allow drivers to request a static configuration of the default vxlan port, and then a patch that converts mlx5. 2) Parav implements change_carrier ndo for VF eswitch representors, to speedup link state control of representors netdevices. 3) Alex Vesker, makes a simple update to software steering to fix an issue with push vlan action sequence 4) Leon removes a redundant dump stack on error flow. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04devlink: Pass extack when setting trap's action and group's parametersIdo Schimmel1-2/+4
A later patch will refuse to set the action of certain traps in mlxsw and also to change the policer binding of certain groups. Pass extack so that failure could be communicated clearly to user space. Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04devlink: Add early_drop trapAmit Cohen1-0/+3
Add the packet trap that can report packets that were ECN marked due to RED AQM. Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-0/+29
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) UAF in chain binding support from previous batch, from Dan Carpenter. 2) Queue up delayed work to expire connections with no destination, from Andrew Sy Kim. 3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva. 4) Replace HTTP links with HTTPS, from Alexander A. Klimov. 5) Remove superfluous null header checks in ip6tables, from Gaurav Singh. 6) Add extended netlink error reporting for expression. 7) Report EEXIST on overlapping chain, set elements and flowtable devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04net/sched: act_ct: fix miss set mru for ovs after defrag in act_ctwenxu1-1/+2
When openvswitch conntrack offload with act_ct action. Fragment packets defrag in the ingress tc act_ct action and miss the next chain. Then the packet pass to the openvswitch datapath without the mru. The over mtu packet will be dropped in output action in openvswitch for over mtu. "kernel: net2: dropped over-mtu packet: 1528 > 1500" This patch add mru in the tc_skb_ext for adefrag and miss next chain situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru to the qdisc_skb_cb when the packet defrag. And When the chain miss, The mru is set to tc_skb_ext which can be got by ovs datapath. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03udp_tunnel: add the ability to hard-code IANA VXLANJakub Kicinski1-0/+5
mlx5 has the IANA VXLAN port (4789) hard coded by the device, instead of being added dynamically when tunnels are created. To support this add a workaround flag to struct udp_tunnel_nic_info. Skipping updates for the port is fairly trivial, dumping the hard coded port via ethtool requires some code duplication. The port is not a part of any real table, we dump it in a special table which has no tunnel types supported and only one entry. This is the last known workaround / hack needed to convert all drivers to the new infra. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-08-03netfilter: conntrack: Move nf_ct_offload_timeout to header fileRoi Dayan1-0/+12
To be used by callers from other modules. [ Rename DAY to NF_CT_DAY to avoid possible symbol name pollution issue --Pablo ] Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller3-6/+13
Resolved kernel/bpf/btf.c using instructions from merge commit 69138b34a7248d2396ab85c8652e20c0c39beaba Signed-off-by: David S. Miller <davem@davemloft.net>