summaryrefslogtreecommitdiff
path: root/net/xfrm
AgeCommit message (Collapse)AuthorFilesLines
2022-12-02xfrm: replay: Fix ESN wrap around for GSOChristian Langrock2-2/+15
[ Upstream commit 4b549ccce941798703f159b227aa28c716aa78fa ] When using GSO it can happen that the wrong seq_hi is used for the last packets before the wrap around. This can lead to double usage of a sequence number. To avoid this, we should serialize this last GSO packet. Fixes: d7dbefc45cf5 ("xfrm: Add xfrm_replay_overflow functions for offloading") Co-developed-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Christian Langrock <christian.langrock@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26xfrm: Update ipcomp_scratches with NULL when freedKhalid Masum1-0/+1
[ Upstream commit 8a04d2fc700f717104bfb95b0f6694e448a4537f ] Currently if ipcomp_alloc_scratches() fails to allocate memory ipcomp_scratches holds obsolete address. So when we try to free the percpu scratches using ipcomp_free_scratches() it tries to vfree non existent vm area. Described below: static void * __percpu *ipcomp_alloc_scratches(void) { ... scratches = alloc_percpu(void *); if (!scratches) return NULL; ipcomp_scratches does not know about this allocation failure. Therefore holding the old obsolete address. ... } So when we free, static void ipcomp_free_scratches(void) { ... scratches = ipcomp_scratches; Assigning obsolete address from ipcomp_scratches if (!scratches) return; for_each_possible_cpu(i) vfree(*per_cpu_ptr(scratches, i)); Trying to free non existent page, causing warning: trying to vfree existent vm area. ... } Fix this breakage by updating ipcomp_scrtches with NULL when scratches is freed Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com Tested-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26xfrm: Reinject transport-mode packets through workqueueLiu Jian1-5/+13
[ Upstream commit 4f4920669d21e1060b7243e5118dc3b71ced1276 ] The following warning is displayed when the tcp6-multi-diffip11 stress test case of the LTP test suite is tested: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ns-tcpserver:48198] CPU: 0 PID: 48198 Comm: ns-tcpserver Kdump: loaded Not tainted 6.0.0-rc6+ #39 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : des3_ede_encrypt+0x27c/0x460 [libdes] lr : 0x3f sp : ffff80000ceaa1b0 x29: ffff80000ceaa1b0 x28: ffff0000df056100 x27: ffff0000e51e5280 x26: ffff80004df75030 x25: ffff0000e51e4600 x24: 000000000000003b x23: 0000000000802080 x22: 000000000000003d x21: 0000000000000038 x20: 0000000080000020 x19: 000000000000000a x18: 0000000000000033 x17: ffff0000e51e4780 x16: ffff80004e2d1448 x15: ffff80004e2d1248 x14: ffff0000e51e4680 x13: ffff80004e2d1348 x12: ffff80004e2d1548 x11: ffff80004e2d1848 x10: ffff80004e2d1648 x9 : ffff80004e2d1748 x8 : ffff80004e2d1948 x7 : 000000000bcaf83d x6 : 000000000000001b x5 : ffff80004e2d1048 x4 : 00000000761bf3bf x3 : 000000007f1dd0a3 x2 : ffff0000e51e4780 x1 : ffff0000e3b9a2f8 x0 : 00000000db44e872 Call trace: des3_ede_encrypt+0x27c/0x460 [libdes] crypto_des3_ede_encrypt+0x1c/0x30 [des_generic] crypto_cbc_encrypt+0x148/0x190 crypto_skcipher_encrypt+0x2c/0x40 crypto_authenc_encrypt+0xc8/0xfc [authenc] crypto_aead_encrypt+0x2c/0x40 echainiv_encrypt+0x144/0x1a0 [echainiv] crypto_aead_encrypt+0x2c/0x40 esp6_output_tail+0x1c8/0x5d0 [esp6] esp6_output+0x120/0x278 [esp6] xfrm_output_one+0x458/0x4ec xfrm_output_resume+0x6c/0x1f0 xfrm_output+0xac/0x4ac __xfrm6_output+0x130/0x270 xfrm6_output+0x60/0xec ip6_xmit+0x2ec/0x5bc inet6_csk_xmit+0xbc/0x10c __tcp_transmit_skb+0x460/0x8c0 tcp_write_xmit+0x348/0x890 __tcp_push_pending_frames+0x44/0x110 tcp_rcv_established+0x3c8/0x720 tcp_v6_do_rcv+0xdc/0x4a0 tcp_v6_rcv+0xc24/0xcb0 ip6_protocol_deliver_rcu+0xf0/0x574 ip6_input_finish+0x48/0x7c ip6_input+0x48/0xc0 ip6_rcv_finish+0x80/0x9c xfrm_trans_reinject+0xb0/0xf4 tasklet_action_common.constprop.0+0xf8/0x134 tasklet_action+0x30/0x3c __do_softirq+0x128/0x368 do_softirq+0xb4/0xc0 __local_bh_enable_ip+0xb0/0xb4 put_cpu_fpsimd_context+0x40/0x70 kernel_neon_end+0x20/0x40 sha1_base_do_update.constprop.0.isra.0+0x11c/0x140 [sha1_ce] sha1_ce_finup+0x94/0x110 [sha1_ce] crypto_shash_finup+0x34/0xc0 hmac_finup+0x48/0xe0 crypto_shash_finup+0x34/0xc0 shash_digest_unaligned+0x74/0x90 crypto_shash_digest+0x4c/0x9c shash_ahash_digest+0xc8/0xf0 shash_async_digest+0x28/0x34 crypto_ahash_digest+0x48/0xcc crypto_authenc_genicv+0x88/0xcc [authenc] crypto_authenc_encrypt+0xd8/0xfc [authenc] crypto_aead_encrypt+0x2c/0x40 echainiv_encrypt+0x144/0x1a0 [echainiv] crypto_aead_encrypt+0x2c/0x40 esp6_output_tail+0x1c8/0x5d0 [esp6] esp6_output+0x120/0x278 [esp6] xfrm_output_one+0x458/0x4ec xfrm_output_resume+0x6c/0x1f0 xfrm_output+0xac/0x4ac __xfrm6_output+0x130/0x270 xfrm6_output+0x60/0xec ip6_xmit+0x2ec/0x5bc inet6_csk_xmit+0xbc/0x10c __tcp_transmit_skb+0x460/0x8c0 tcp_write_xmit+0x348/0x890 __tcp_push_pending_frames+0x44/0x110 tcp_push+0xb4/0x14c tcp_sendmsg_locked+0x71c/0xb64 tcp_sendmsg+0x40/0x6c inet6_sendmsg+0x4c/0x80 sock_sendmsg+0x5c/0x6c __sys_sendto+0x128/0x15c __arm64_sys_sendto+0x30/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x170/0x194 do_el0_svc+0x38/0x4c el0_svc+0x28/0xe0 el0t_64_sync_handler+0xbc/0x13c el0t_64_sync+0x180/0x184 Get softirq info by bcc tool: ./softirqs -NT 10 Tracing soft irq event time... Hit Ctrl-C to end. 15:34:34 SOFTIRQ TOTAL_nsecs block 158990 timer 20030920 sched 46577080 net_rx 676746820 tasklet 9906067650 15:34:45 SOFTIRQ TOTAL_nsecs block 86100 sched 38849790 net_rx 676532470 timer 1163848790 tasklet 9409019620 15:34:55 SOFTIRQ TOTAL_nsecs sched 58078450 net_rx 475156720 timer 533832410 tasklet 9431333300 The tasklet software interrupt takes too much time. Therefore, the xfrm_trans_reinject executor is changed from tasklet to workqueue. Add add spin lock to protect the queue. This reduces the processing flow of the tcp_sendmsg function in this scenario. Fixes: acf568ee859f0 ("xfrm: Reinject transport-mode packets through tasklet") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31net: Fix data-races around netdev_max_backlog.Kuniyuki Iwashima2-2/+2
[ Upstream commit 5dcd08cd19912892586c6082d56718333e2d19db ] While reading netdev_max_backlog, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. While at it, we remove the unnecessary spaces in the doc. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31xfrm: policy: fix metadata dst->dev xmit null pointer dereferenceNikolay Aleksandrov1-1/+1
[ Upstream commit 17ecd4a4db4783392edd4944f5e8268205083f70 ] When we try to transmit an skb with metadata_dst attached (i.e. dst->dev == NULL) through xfrm interface we can hit a null pointer dereference[1] in xfrmi_xmit2() -> xfrm_lookup_with_ifid() due to the check for a loopback skb device when there's no policy which dereferences dst->dev unconditionally. Not having dst->dev can be interepreted as it not being a loopback device, so just add a check for a null dst_orig->dev. With this fix xfrm interface's Tx error counters go up as usual. [1] net-next calltrace captured via netconsole: BUG: kernel NULL pointer dereference, address: 00000000000000c0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 1 PID: 7231 Comm: ping Kdump: loaded Not tainted 5.19.0+ #24 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 RIP: 0010:xfrm_lookup_with_ifid+0x5eb/0xa60 Code: 8d 74 24 38 e8 26 a4 37 00 48 89 c1 e9 12 fc ff ff 49 63 ed 41 83 fd be 0f 85 be 01 00 00 41 be ff ff ff ff 45 31 ed 48 8b 03 <f6> 80 c0 00 00 00 08 75 0f 41 80 bc 24 19 0d 00 00 01 0f 84 1e 02 RSP: 0018:ffffb0db82c679f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffd0db7fcad430 RCX: ffffb0db82c67a10 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb0db82c67a80 RBP: ffffb0db82c67a80 R08: ffffb0db82c67a14 R09: 0000000000000000 R10: 0000000000000000 R11: ffff8fa449667dc8 R12: ffffffff966db880 R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000 FS: 00007ff35c83f000(0000) GS:ffff8fa478480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c0 CR3: 000000001ebb7000 CR4: 0000000000350ee0 Call Trace: <TASK> xfrmi_xmit+0xde/0x460 ? tcf_bpf_act+0x13d/0x2a0 dev_hard_start_xmit+0x72/0x1e0 __dev_queue_xmit+0x251/0xd30 ip_finish_output2+0x140/0x550 ip_push_pending_frames+0x56/0x80 raw_sendmsg+0x663/0x10a0 ? try_charge_memcg+0x3fd/0x7a0 ? __mod_memcg_lruvec_state+0x93/0x110 ? sock_sendmsg+0x30/0x40 sock_sendmsg+0x30/0x40 __sys_sendto+0xeb/0x130 ? handle_mm_fault+0xae/0x280 ? do_user_addr_fault+0x1e7/0x680 ? kvm_read_and_reset_apf_flags+0x3b/0x50 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7ff35cac1366 Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89 RSP: 002b:00007fff738e4028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007fff738e57b0 RCX: 00007ff35cac1366 RDX: 0000000000000040 RSI: 0000557164e4b450 RDI: 0000000000000003 RBP: 0000557164e4b450 R08: 00007fff738e7a2c R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 R13: 00007fff738e5770 R14: 00007fff738e4030 R15: 0000001d00000001 </TASK> Modules linked in: netconsole veth br_netfilter bridge bonding virtio_net [last unloaded: netconsole] CR2: 00000000000000c0 CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Daniel Borkmann <daniel@iogearbox.net> Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31xfrm: clone missing x->lastused in xfrm_do_migrateAntony Antony1-0/+1
[ Upstream commit 6aa811acdb76facca0b705f4e4c1d948ccb6af8b ] x->lastused was not cloned in xfrm_do_migrate. Add it to clone during migrate. Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31xfrm: fix refcount leak in __xfrm_policy_check()Xin Xiong1-0/+1
[ Upstream commit 9c9cb23e00ddf45679b21b4dacc11d1ae7961ebe ] The issue happens on an error path in __xfrm_policy_check(). When the fetching process of the object `pols[1]` fails, the function simply returns 0, forgetting to decrement the reference count of `pols[0]`, which is incremented earlier by either xfrm_sk_policy_lookup() or xfrm_policy_lookup(). This may result in memory leaks. Fix it by decreasing the reference count of `pols[0]` in that path. Fixes: 134b0fc544ba ("IPsec: propagate security module errors up from flow_cache_lookup") Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-29ip: Fix data-races around sysctl_ip_no_pmtu_disc.Kuniyuki Iwashima1-1/+1
[ Upstream commit 0968d2a441bf6afb551fd99e60fa65ed67068963 ] While reading sysctl_ip_no_pmtu_disc, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-29xfrm: xfrm_policy: fix a possible double xfrm_pols_put() in xfrm_bundle_lookup()Hangyu Hua1-1/+4
[ Upstream commit f85daf0e725358be78dfd208dea5fd665d8cb901 ] xfrm_policy_lookup() will call xfrm_pol_hold_rcu() to get a refcount of pols[0]. This refcount can be dropped in xfrm_expand_policies() when xfrm_expand_policies() return error. pols[0]'s refcount is balanced in here. But xfrm_bundle_lookup() will also call xfrm_pols_put() with num_pols == 1 to drop this refcount when xfrm_expand_policies() return error. This patch also fix an illegal address access. pols[0] will save a error point when xfrm_policy_lookup fails. This lead to xfrm_pols_put to resolve an illegal address in xfrm_bundle_lookup's error path. Fix these by setting num_pols = 0 in xfrm_expand_policies()'s error path. Fixes: 80c802f3073e ("xfrm: cache bundles instead of policies for outgoing flows") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25xfrm: rework default policy structureNicolas Dichtel2-28/+25
[ Upstream commit b58b1f563ab78955d37e9e43e02790a85c66ac05 ] This is a follow up of commit f8d858e607b2 ("xfrm: make user policy API complete"). The goal is to align userland API to the internal structures. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08xfrm: fix tunnel model fragmentation behaviorLina Wang1-1/+4
[ Upstream commit 4ff2980b6bd2aa6b4ded3ce3b7c0ccfab29980af ] in tunnel mode, if outer interface(ipv4) is less, it is easily to let inner IPV6 mtu be less than 1280. If so, a Packet Too Big ICMPV6 message is received. When send again, packets are fragmentized with 1280, they are still rejected with ICMPV6(Packet Too Big) by xfrmi_xmit2(). According to RFC4213 Section3.2.2: if (IPv4 path MTU - 20) is less than 1280 if packet is larger than 1280 bytes Send ICMPv6 "packet too big" with MTU=1280 Drop packet else Encapsulate but do not set the Don't Fragment flag in the IPv4 header. The resulting IPv4 packet might be fragmented by the IPv4 layer on the encapsulator or by some router along the IPv4 path. endif else if packet is larger than (IPv4 path MTU - 20) Send ICMPv6 "packet too big" with MTU = (IPv4 path MTU - 20). Drop packet. else Encapsulate and set the Don't Fragment flag in the IPv4 header. endif endif Packets should be fragmentized with ipv4 outer interface, so change it. After it is fragemtized with ipv4, there will be double fragmenation. No.48 & No.51 are ipv6 fragment packets, No.48 is double fragmentized, then tunneled with IPv4(No.49& No.50), which obey spec. And received peer cannot decrypt it rightly. 48 2002::10 2002::11 1296(length) IPv6 fragment (off=0 more=y ident=0xa20da5bc nxt=50) 49 0x0000 (0) 2002::10 2002::11 1304 IPv6 fragment (off=0 more=y ident=0x7448042c nxt=44) 50 0x0000 (0) 2002::10 2002::11 200 ESP (SPI=0x00035000) 51 2002::10 2002::11 180 Echo (ping) request 52 0x56dc 2002::10 2002::11 248 IPv6 fragment (off=1232 more=n ident=0xa20da5bc nxt=50) xfrm6_noneed_fragment has fixed above issues. Finally, it acted like below: 1 0x6206 192.168.1.138 192.168.1.1 1316 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6206) [Reassembled in #2] 2 0x6206 2002::10 2002::11 88 IPv6 fragment (off=0 more=y ident=0x1f440778 nxt=50) 3 0x0000 2002::10 2002::11 248 ICMPv6 Echo (ping) request Signed-off-by: Lina Wang <lina.wang@mediatek.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-19xfrm: Fix xfrm migrate issues when address family changesYan Yan1-3/+5
[ Upstream commit e03c3bba351f99ad932e8f06baa9da1afc418e02 ] xfrm_migrate cannot handle address family change of an xfrm_state. The symptons are the xfrm_state will be migrated to a wrong address, and sending as well as receiving packets wil be broken. This commit fixes it by breaking the original xfrm_state_clone method into two steps so as to update the props.family before running xfrm_init_state. As the result, xfrm_state's inner mode, outer mode, type and IP header length in xfrm_state_migrate can be updated with the new address family. Tested with additions to Android's kernel unit test suite: https://android-review.googlesource.com/c/kernel/tests/+/1885354 Signed-off-by: Yan Yan <evitayan@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-19xfrm: Check if_id in xfrm_migrateYan Yan3-8/+19
[ Upstream commit c1aca3080e382886e2e58e809787441984a2f89b ] This patch enables distinguishing SAs and SPs based on if_id during the xfrm_migrate flow. This ensures support for xfrm interfaces throughout the SA/SP lifecycle. When there are multiple existing SPs with the same direction, the same xfrm_selector and different endpoint addresses, xfrm_migrate might fail with ENODATA. Specifically, the code path for performing xfrm_migrate is: Stage 1: find policy to migrate with xfrm_migrate_policy_find(sel, dir, type, net) Stage 2: find and update state(s) with xfrm_migrate_state_find(mp, net) Stage 3: update endpoint address(es) of template(s) with xfrm_policy_migrate(pol, m, num_migrate) Currently "Stage 1" always returns the first xfrm_policy that matches, and "Stage 3" looks for the xfrm_tmpl that matches the old endpoint address. Thus if there are multiple xfrm_policy with same selector, direction, type and net, "Stage 1" might rertun a wrong xfrm_policy and "Stage 3" will fail with ENODATA because it cannot find a xfrm_tmpl with the matching endpoint address. The fix is to allow userspace to pass an if_id and add if_id to the matching rule in Stage 1 and Stage 2 since if_id is a unique ID for xfrm_policy and xfrm_state. For compatibility, if_id will only be checked if the attribute is set. Tested with additions to Android's kernel unit test suite: https://android-review.googlesource.com/c/kernel/tests/+/1668886 Signed-off-by: Yan Yan <evitayan@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-19Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"Kai Lueke1-18/+3
commit a3d9001b4e287fc043e5539d03d71a32ab114bcb upstream. This reverts commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 because ID 0 was meant to be used for configuring the policy/state without matching for a specific interface (e.g., Cilium is affected, see https://github.com/cilium/cilium/pull/18789 and https://github.com/cilium/cilium/pull/19019). Signed-off-by: Kai Lueke <kailueke@linux.microsoft.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"Jiri Bohac1-12/+2
commit a6d95c5a628a09be129f25d5663a7e9db8261f51 upstream. This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a. Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS calculation in ipsec transport mode, resulting complete stalls of TCP connections. This happens when the (P)MTU is 1280 or slighly larger. The desired formula for the MSS is: MSS = (MTU - ESP_overhead) - IP header - TCP header However, the above commit clamps the (MTU - ESP_overhead) to a minimum of 1280, turning the formula into MSS = max(MTU - ESP overhead, 1280) - IP header - TCP header With the (P)MTU near 1280, the calculated MSS is too large and the resulting TCP packets never make it to the destination because they are over the actual PMTU. The above commit also causes suboptimal double fragmentation in xfrm tunnel mode, as described in https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/ The original problem the above commit was trying to fix is now fixed by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU regression"). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08xfrm: enforce validity of offload input flagsLeon Romanovsky1-1/+5
commit 7c76ecd9c99b6e9a771d813ab1aa7fa428b3ade1 upstream. struct xfrm_user_offload has flags variable that received user input, but kernel didn't check if valid bits were provided. It caused a situation where not sanitized input was forwarded directly to the drivers. For example, XFRM_OFFLOAD_IPV6 define that was exposed, was used by strongswan, but not implemented in the kernel at all. As a solution, check and sanitize input flags to forward XFRM_OFFLOAD_INBOUND to the drivers. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08xfrm: fix the if_id check in changelinkAntony Antony1-1/+1
commit 6d0d95a1c2b07270870e7be16575c513c29af3f1 upstream. if_id will be always 0, because it was not yet initialized. Fixes: 8dce43919566 ("xfrm: interface with if_id 0 should return error") Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27xfrm: Don't accidentally set RTO_ONLINK in decode_session4()Guillaume Nault1-1/+2
commit 23e7b1bfed61e301853b5e35472820d919498278 upstream. Similar to commit 94e2238969e8 ("xfrm4: strip ECN bits from tos field"), clear the ECN bits from iph->tos when setting ->flowi4_tos. This ensures that the last bit of ->flowi4_tos is cleared, so ip_route_output_key_hash() isn't going to restrict the scope of the route lookup. Use ~INET_ECN_MASK instead of IPTOS_RT_MASK, because we have no reason to clear the high order bits. Found by code inspection, compile tested only. Fixes: 4da3089f2b58 ("[IPSEC]: Use TOS when doing tunnel lookups") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27xfrm: fix policy lookup for ipv6 gre packetsGhalem Boudour1-0/+21
commit bcf141b2eb551b3477b24997ebc09c65f117a803 upstream. On egress side, xfrm lookup is called from __gre6_xmit() with the fl6_gre_key field not initialized leading to policies selectors check failure. Consequently, gre packets are sent without encryption. On ingress side, INET6_PROTO_NOPOLICY was set, thus packets were not checked against xfrm policies. Like for egress side, fl6_gre_key should be correctly set, this is now done in decode_session6(). Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Cc: stable@vger.kernel.org Signed-off-by: Ghalem Boudour <ghalem.boudour@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27xfrm: rate limit SA mapping change message to user spaceAntony Antony3-4/+43
[ Upstream commit 4e484b3e969b52effd95c17f7a86f39208b2ccf4 ] Kernel generates mapping change message, XFRM_MSG_MAPPING, when a source port chage is detected on a input state with UDP encapsulation set. Kernel generates a message for each IPsec packet with new source port. For a high speed flow per packet mapping change message can be excessive, and can overload the user space listener. Introduce rate limiting for XFRM_MSG_MAPPING message to the user space. The rate limiting is configurable via netlink, when adding a new SA or updating it. Use the new attribute XFRMA_MTIMER_THRESH in seconds. v1->v2 change: update xfrm_sa_len() v2->v3 changes: use u32 insted unsigned long to reduce size of struct xfrm_state fix xfrm_ompat size Reported-by: kernel test robot <lkp@intel.com> accept XFRM_MSG_MAPPING only when XFRMA_ENCAP is present Co-developed-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27net/xfrm: IPsec tunnel mode fix inner_ipproto setting in sec_pathRaed Salem1-5/+25
[ Upstream commit 45a98ef4922def8c679ca7c454403d1957fe70e7 ] The inner_ipproto saves the inner IP protocol of the plain text packet. This allows vendor's IPsec feature making offload decision at skb's features_check and configuring hardware at ndo_start_xmit, current code implenetation did not handle the case where IPsec is used in tunnel mode. Fix by handling the case when IPsec is used in tunnel mode by reading the protocol of the plain text packet IP protocol. Fixes: fa4535238fb5 ("net/xfrm: Add inner_ipproto into sec_path") Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27xfrm: state and policy should fail if XFRMA_IF_ID 0Antony Antony1-3/+18
[ Upstream commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 ] xfrm ineterface does not allow xfrm if_id = 0 fail to create or update xfrm state and policy. With this commit: ip xfrm policy add src 192.0.2.1 dst 192.0.2.2 dir out if_id 0 RTNETLINK answers: Invalid argument ip xfrm state add src 192.0.2.1 dst 192.0.2.2 proto esp spi 1 \ reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' \ 0x1111111111111111111111111111111111111111 96 if_id 0 RTNETLINK answers: Invalid argument v1->v2 change: - add Fixes: tag Fixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27xfrm: interface with if_id 0 should return errorAntony Antony1-2/+12
[ Upstream commit 8dce43919566f06e865f7e8949f5c10d8c2493f5 ] xfrm interface if_id = 0 would cause xfrm policy lookup errors since Commit 9f8550e4bd9d. Now explicitly fail to create an xfrm interface when if_id = 0 With this commit: ip link add ipsec0 type xfrm dev lo if_id 0 Error: if_id must be non zero. v1->v2 change: - add Fixes: tag Fixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces") Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27xfrm: fix a small bug in xfrm_sa_len()Eric Dumazet1-1/+1
[ Upstream commit 7770a39d7c63faec6c4f33666d49a8cb664d0482 ] copy_user_offload() will actually push a struct struct xfrm_user_offload, which is different than (struct xfrm_state *)->xso (struct xfrm_state_offload) Fixes: d77e38e612a01 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-23xfrm: fix rcu lock in xfrm_notify_userpolicy()Nicolas Dichtel1-1/+6
As stated in the comment above xfrm_nlmsg_multicast(), rcu read lock must be held before calling this function. Reported-by: syzbot+3d9866419b4aa8f985d6@syzkaller.appspotmail.com Fixes: 703b94b93c19 ("xfrm: notify default policy on update") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-09-15xfrm: notify default policy on updateNicolas Dichtel1-0/+31
This configuration knob is very sensible, it should be notified when changing. Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-09-15xfrm: make user policy API completeNicolas Dichtel1-17/+19
>From a userland POV, this API was based on some magic values: - dirmask and action were bitfields but meaning of bits (XFRM_POL_DEFAULT_*) are not exported; - action is confusing, if a bit is set, does it mean drop or accept? Let's try to simplify this uapi by using explicit field and macros. Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-09-09net: xfrm: fix shift-out-of-bounds in xfrm_get_defaultPavel Skripkin1-0/+5
Syzbot hit shift-out-of-bounds in xfrm_get_default. The problem was in missing validation check for user data. up->dirmask comes from user-space, so we need to check if this value is less than XFRM_USERPOLICY_DIRMASK_MAX to avoid shift-out-of-bounds bugs. Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy") Reported-and-tested-by: syzbot+b2be9dd8ca6f6c73ee2d@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-08-27Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/David S. Miller2-0/+73
ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-08-27 1) Remove an unneeded extra variable in esp4 esp_ssg_unref. From Corey Minyard. 2) Add a configuration option to change the default behaviour to block traffic if there is no matching policy. Joint work with Christian Langrock and Antony Antony. 3) Fix a shift-out-of-bounce bug reported from syzbot. From Pavel Skripkin. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-04Merge branch 'master' of ↵David S. Miller4-26/+67
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-08-04 1) Fix a sysbot reported memory leak in xfrm_user_rcv_msg. From Pavel Skripkin. 2) Revert "xfrm: policy: Read seqcount outside of rcu-read side in xfrm_policy_lookup_bytype". This commit tried to fix a lockin bug, but only cured some of the symptoms. A proper fix is applied on top of this revert. 3) Fix a locking bug on xfrm state hash resize. A recent change on sequence counters accidentally repaced a spinlock by a mutex. Fix from Frederic Weisbecker. 4) Fix possible user-memory-access in xfrm_user_rcv_msg_compat(). From Dmitry Safonov. 5) Add initialiation sefltest fot xfrm_spdattr_type_t. From Dmitry Safonov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29net: xfrm: fix shift-out-of-bouncePavel Skripkin1-1/+6
We need to check up->dirmask to avoid shift-out-of-bounce bug, since up->dirmask comes from userspace. Also, added XFRM_USERPOLICY_DIRMASK_MAX constant to uapi to inform user-space that up->dirmask has maximum possible value Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy") Reported-and-tested-by: syzbot+9cd5837a045bbee5b810@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-07-26net: xfrm: Fix end of loop tests for list_for_each_entryHarshvardhan Jha1-1/+1
The list_for_each_entry() iterator, "pos" in this code, can never be NULL so the warning will never be printed. Signed-off-by: Harshvardhan Jha <harshvardhan.jha@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-07-21net/xfrm/compat: Copy xfrm_spdattr_type_t atributesDmitry Safonov1-5/+44
The attribute-translator has to take in mind maxtype, that is xfrm_link::nla_max. When it is set, attributes are not of xfrm_attr_type_t. Currently, they can be only XFRMA_SPD_MAX (message XFRM_MSG_NEWSPDINFO), their UABI is the same for 64/32-bit, so just copy them. Thanks to YueHaibing for reporting this: In xfrm_user_rcv_msg_compat() if maxtype is not zero and less than XFRMA_MAX, nlmsg_parse_deprecated() do not initialize attrs array fully. xfrm_xlate32() will access uninit 'attrs[i]' while iterating all attrs array. KASAN: probably user-memory-access in range [0x0000000041b58ab0-0x0000000041b58ab7] CPU: 0 PID: 15799 Comm: syz-executor.2 Tainted: G W 5.14.0-rc1-syzkaller #0 RIP: 0010:nla_type include/net/netlink.h:1130 [inline] RIP: 0010:xfrm_xlate32_attr net/xfrm/xfrm_compat.c:410 [inline] RIP: 0010:xfrm_xlate32 net/xfrm/xfrm_compat.c:532 [inline] RIP: 0010:xfrm_user_rcv_msg_compat+0x5e5/0x1070 net/xfrm/xfrm_compat.c:577 [...] Call Trace: xfrm_user_rcv_msg+0x556/0x8b0 net/xfrm/xfrm_user.c:2774 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 xfrm_netlink_rcv+0x6b/0x90 net/xfrm/xfrm_user.c:2824 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:702 [inline] Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator") Cc: <stable@kernel.org> Reported-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-07-21xfrm: Add possibility to set the default to block if we have no policySteffen Klassert2-0/+68
As the default we assume the traffic to pass, if we have no matching IPsec policy. With this patch, we have a possibility to change this default from allow to block. It can be configured via netlink. Each direction (input/output/forward) can be configured separately. With the default to block configuered, we need allow policies for all packet flows we accept. We do not use default policy lookup for the loopback device. v1->v2 - fix compiling when XFRM is disabled - Reported-by: kernel test robot <lkp@intel.com> Co-developed-by: Christian Langrock <christian.langrock@secunet.com> Signed-off-by: Christian Langrock <christian.langrock@secunet.com> Co-developed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-07-02xfrm: Fix RCU vs hash_resize_mutex lock inversionFrederic Weisbecker1-9/+8
xfrm_bydst_resize() calls synchronize_rcu() while holding hash_resize_mutex. But then on PREEMPT_RT configurations, xfrm_policy_lookup_bytype() may acquire that mutex while running in an RCU read side critical section. This results in a deadlock. In fact the scope of hash_resize_mutex is way beyond the purpose of xfrm_policy_lookup_bytype() to just fetch a coherent and stable policy for a given destination/direction, along with other details. The lower level net->xfrm.xfrm_policy_lock, which among other things protects per destination/direction references to policy entries, is enough to serialize and benefit from priority inheritance against the write side. As a bonus, it makes it officially a per network namespace synchronization business where a policy table resize on namespace A shouldn't block a policy lookup on namespace B. Fixes: 77cc278f7b20 (xfrm: policy: Use sequence counters with associated lock) Cc: stable@vger.kernel.org Cc: Ahmed S. Darwish <a.darwish@linutronix.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Varad Gautam <varad.gautam@suse.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-07-02Revert "xfrm: policy: Read seqcount outside of rcu-read side in ↵Steffen Klassert1-14/+7
xfrm_policy_lookup_bytype" This reverts commit d7b0408934c749f546b01f2b33d07421a49b6f3e. This commit tried to fix a locking bug introduced by commit 77cc278f7b20 ("xfrm: policy: Use sequence counters with associated lock"). As it turned out, this patch did not really fix the bug. A proper fix for this bug is applied on top of this revert. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-07-01Merge tag 'net-next-5.14' of ↵Linus Torvalds8-119/+329
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - BPF: - add syscall program type and libbpf support for generating instructions and bindings for in-kernel BPF loaders (BPF loaders for BPF), this is a stepping stone for signed BPF programs - infrastructure to migrate TCP child sockets from one listener to another in the same reuseport group/map to improve flexibility of service hand-off/restart - add broadcast support to XDP redirect - allow bypass of the lockless qdisc to improving performance (for pktgen: +23% with one thread, +44% with 2 threads) - add a simpler version of "DO_ONCE()" which does not require jump labels, intended for slow-path usage - virtio/vsock: introduce SOCK_SEQPACKET support - add getsocketopt to retrieve netns cookie - ip: treat lowest address of a IPv4 subnet as ordinary unicast address allowing reclaiming of precious IPv4 addresses - ipv6: use prandom_u32() for ID generation - ip: add support for more flexible field selection for hashing across multi-path routes (w/ offload to mlxsw) - icmp: add support for extended RFC 8335 PROBE (ping) - seg6: add support for SRv6 End.DT46 behavior - mptcp: - DSS checksum support (RFC 8684) to detect middlebox meddling - support Connection-time 'C' flag - time stamping support - sctp: packetization Layer Path MTU Discovery (RFC 8899) - xfrm: speed up state addition with seq set - WiFi: - hidden AP discovery on 6 GHz and other HE 6 GHz improvements - aggregation handling improvements for some drivers - minstrel improvements for no-ack frames - deferred rate control for TXQs to improve reaction times - switch from round robin to virtual time-based airtime scheduler - add trace points: - tcp checksum errors - openvswitch - action execution, upcalls - socket errors via sk_error_report Device APIs: - devlink: add rate API for hierarchical control of max egress rate of virtual devices (VFs, SFs etc.) - don't require RCU read lock to be held around BPF hooks in NAPI context - page_pool: generic buffer recycling New hardware/drivers: - mobile: - iosm: PCIe Driver for Intel M.2 Modem - support for Qualcomm MSM8998 (ipa) - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU) - NXP SJA1110 Automotive Ethernet 10-port switch - Qualcomm QCA8327 switch support (qca8k) - Mikrotik 10/25G NIC (atl1c) Driver changes: - ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP (our first foray into MAC/PHY description via ACPI) - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx - Mellanox/Nvidia NIC (mlx5) - NIC VF offload of L2 bridging - support IRQ distribution to Sub-functions - Marvell (prestera): - add flower and match all - devlink trap - link aggregation - Netronome (nfp): connection tracking offload - Intel 1GE (igc): add AF_XDP support - Marvell DPU (octeontx2): ingress ratelimit offload - Google vNIC (gve): new ring/descriptor format support - Qualcomm mobile (rmnet & ipa): inline checksum offload support - MediaTek WiFi (mt76) - mt7915 MSI support - mt7915 Tx status reporting - mt7915 thermal sensors support - mt7921 decapsulation offload - mt7921 enable runtime pm and deep sleep - Realtek WiFi (rtw88) - beacon filter support - Tx antenna path diversity support - firmware crash information via devcoredump - Qualcomm WiFi (wcn36xx) - Wake-on-WLAN support with magic packets and GTK rekeying - Micrel PHY (ksz886x/ksz8081): add cable test support" * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits) tcp: change ICSK_CA_PRIV_SIZE definition tcp_yeah: check struct yeah size at compile time gve: DQO: Fix off by one in gve_rx_dqo() stmmac: intel: set PCI_D3hot in suspend stmmac: intel: Enable PHY WOL option in EHL net: stmmac: option to enable PHY WOL with PMT enabled net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} net: use netdev_info in ndo_dflt_fdb_{add,del} ptp: Set lookup cookie when creating a PTP PPS source. net: sock: add trace for socket errors net: sock: introduce sk_error_report net: dsa: replay the local bridge FDB entries pointing to the bridge dev too net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev net: dsa: include fdb entries pointing to bridge in the host fdb list net: dsa: include bridge addresses which are local in the host fdb list net: dsa: sync static FDB entries on foreign interfaces to hardware net: dsa: install the host MDB and FDB entries in the master's RX filter net: dsa: reference count the FDB addresses at the cross-chip notifier level net: dsa: introduce a separate cross-chip notifier type for host FDBs net: dsa: reference count the MDB entries at the cross-chip notifier level ...
2021-07-01Merge tag 'selinux-pr-20210629' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux updates from Paul Moore: - The slow_avc_audit() function is now non-blocking so we can remove the AVC_NONBLOCKING tricks; this also includes the 'flags' variant of avc_has_perm(). - Use kmemdup() instead of kcalloc()+copy when copying parts of the SELinux policydb. - The InfiniBand device name is now passed by reference when possible in the SELinux code, removing a strncpy(). - Minor cleanups including: constification of avtab function args, removal of useless LSM/XFRM function args, SELinux kdoc fixes, and removal of redundant assignments. * tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit() selinux: slow_avc_audit has become non-blocking selinux: Fix kernel-doc selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC lsm_audit,selinux: pass IB device name by reference selinux: Remove redundant assignment to rc selinux: Corrected comment to match kernel-doc comment selinux: delete selinux_xfrm_policy_lookup() useless argument selinux: constify some avtab function arguments selinux: simplify duplicate_policydb_cond_list() by using kmemdup()
2021-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-30/+41
Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-29net: xfrm: fix memory leak in xfrm_user_rcv_msgPavel Skripkin1-0/+10
Syzbot reported memory leak in xfrm_user_rcv_msg(). The problem was is non-freed skb's frag_list. In skb_release_all() skb_release_data() will be called only in case of skb->head != NULL, but netlink_skb_destructor() sets head to NULL. So, allocated frag_list skb should be freed manualy, since consume_skb() won't take care of it Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator") Reported-and-tested-by: syzbot+fb347cf82c73a90efcca@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-28Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/gitDavid S. Miller6-88/+248
/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-06-28 1) Remove an unneeded error assignment in esp4_gro_receive(). From Yang Li. 2) Add a new byseq state hashtable to find acquire states faster. From Sabrina Dubroca. 3) Remove some unnecessary variables in pfkey_create(). From zuoqilin. 4) Remove the unused description from xfrm_type struct. From Florian Westphal. 5) Fix a spelling mistake in the comment of xfrm_state_ok(). From gushengxian. 6) Replace hdr_off indirections by a small helper function. From Florian Westphal. 7) Remove xfrm4_output_finish and xfrm6_output_finish declarations, they are not used anymore.From Antony Antony. 8) Remove xfrm replay indirections. From Florian Westphal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23Merge branch 'master' of ↵David S. Miller5-30/+41
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-06-23 1) Don't return a mtu smaller than 1280 on IPv6 pmtu discovery. From Sabrina Dubroca 2) Fix seqcount rcu-read side in xfrm_policy_lookup_bytype for the PREEMPT_RT case. From Varad Gautam. 3) Remove a repeated declaration of xfrm_parse_spi. From Shaokun Zhang. 4) IPv4 beet mode can't handle fragments, but IPv6 does. commit 68dc022d04eb ("xfrm: BEET mode doesn't support fragments for inner packets") handled IPv4 and IPv6 the same way. Relax the check for IPv6 because fragments are possible here. From Xin Long. 5) Memory allocation failures are not reported for XFRMA_ENCAP and XFRMA_COADDR in xfrm_state_construct. Fix this by moving both cases in front of the function. 6) Fix a missing initialization in the xfrm offload fallback fail case for bonding devices. From Ayush Sawal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23net/xfrm: Add inner_ipproto into sec_pathHuy Nguyen1-1/+40
The inner_ipproto saves the inner IP protocol of the plain text packet. This allows vendor's IPsec feature making offload decision at skb's features_check and configuring hardware at ndo_start_xmit. For example, ConnectX6-DX IPsec device needs the plaintext's IP protocol to support partial checksum offload on VXLAN/GENEVE packet over IPsec transport mode tunnel. Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Huy Nguyen <huyn@nvidia.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22xfrm: Fix xfrm offload fallback fail caseAyush Sawal1-0/+1
In case of xfrm offload, if xdo_dev_state_add() of driver returns -EOPNOTSUPP, xfrm offload fallback is failed. In xfrm state_add() both xso->dev and xso->real_dev are initialized to dev and when err(-EOPNOTSUPP) is returned only xso->dev is set to null. So in this scenario the condition in func validate_xmit_xfrm(), if ((x->xso.dev != dev) && (x->xso.real_dev == dev)) return skb; returns true, due to which skb is returned without calling esp_xmit() below which has fallback code. Hence the CRYPTO_FALLBACK is failing. So fixing this with by keeping x->xso.real_dev as NULL when err is returned in func xfrm_dev_state_add(). Fixes: bdfd2d1fa79a ("bonding/xfrm: use real_dev instead of slave_dev") Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: remove last replay indirectionFlorian Westphal2-26/+27
This replaces the overflow indirection with the new xfrm_replay_overflow helper. After this, the 'repl' pointer in xfrm_state is no longer needed and can be removed as well. xfrm_replay_overflow() is added in two incarnations, one is used when the kernel is compiled with xfrm hardware offload support enabled, the other when its disabled. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: avoid replay indirectionFlorian Westphal2-10/+19
Add and use xfrm_replay_check helper instead of indirection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: remove recheck indirectionFlorian Westphal2-7/+17
Adds new xfrm_replay_recheck() helper and calls it from xfrm input path instead of the indirection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: remove advance indirectionFlorian Westphal2-10/+16
Similar to other patches: add a new helper to avoid an indirection. v2: fix 'net/xfrm/xfrm_replay.c:519:13: warning: 'seq' may be used uninitialized in this function' warning. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: avoid xfrm replay notify indirectionFlorian Westphal2-18/+29
replay protection is implemented using a callback structure and then called via x->repl->notify(), x->repl->recheck(), and so on. all the differect functions are always built-in, so this could be direct calls instead. This first patch prepares for removal of the x->repl structure. Add an enum with the three available replay modes to the xfrm_state structure and then replace all x->repl->notify() calls by the new xfrm_replay_notify() helper. The helper checks the enum internally to adapt behaviour as needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-16xfrm: avoid compiler warning when ipv6 is disabledFlorian Westphal1-0/+2
with CONFIG_IPV6=n: xfrm_output.c:140:12: warning: 'xfrm6_hdr_offset' defined but not used Fixes: 9acf4d3b9ec1 ("xfrm: ipv6: add xfrm6_hdr_offset helper") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>