path: root/net
Age | Commit message | Author | Files | Lines
2024-01-08 | svcrdma: Add back svc_rdma_recv_ctxt::rc_pages | Chuck Lever | 2 | -1/+8
Having an nfsd thread waiting for an RDMA Read completion is problematic if the Read responder (the client) stops responding. We need to go back to handling RDMA Reads by allowing the nfsd thread to return to the svc scheduler, then waking a second thread to finish the RPC message once the Read completion fires. To start with, restore the rc_pages field so that RDMA Read pages can be managed across calls to svc_rdma_recvfrom(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Clean up comment in svc_rdma_accept() | Chuck Lever | 1 | -7/+10
The comment that starts "Qualify ..." applies to only some of the following code paragraph. Re-arrange the lines so the comment makes more sense. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Remove queue-shortening warnings | Chuck Lever | 1 | -6/+1
These won't have much diagnostic value for site administrators. Since they can't be disabled, they become noise. What's more, the subsequent rdma_create_qp() call adjusts the Send Queue size (possibly downward) without warning, making the size reported by these pr_warns inaccurate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Remove pointer addresses shown in dprintk() | Chuck Lever | 1 | -3/+1
There are a couple of dprintk() call sites in svc_rdma_accept() that show pointer addresses. These days, displayed pointer addresses are hashed and thus have little or no diagnostic value, especially for site administrators. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Optimize svc_rdma_cc_init() | Chuck Lever | 3 | -6/+7
The atomic_inc_return() in svc_rdma_send_cid_init() is expensive. Some svc_rdma_chunk_ctxt's now reside in long-lived container structures. They don't need a fresh completion ID for every I/O operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: De-duplicate completion ID initialization helpers | Chuck Lever | 3 | -22/+1
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Move the svc_rdma_cc_init() call | Chuck Lever | 2 | -3/+9
Now that the chunk_ctxt for Reads is no longer dynamically allocated it can be initialized once for the life of the object that contains it (struct svc_rdma_recv_ctxt). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Remove struct svc_rdma_read_info | Chuck Lever | 1 | -29/+0
The remaining fields of struct svc_rdma_read_info are no longer referenced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Update the synopsis of svc_rdma_read_special() | Chuck Lever | 1 | -10/+9
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_read_special() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Update the synopsis of svc_rdma_read_call_chunk() | Chuck Lever | 1 | -13/+11
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_read_call_chunk() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Update synopsis of svc_rdma_read_multiple_chunks() | Chuck Lever | 1 | -10/+9
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_read_multiple_chunks() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Update synopsis of svc_rdma_copy_inline_range() | Chuck Lever | 1 | -8/+9
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_copy_inline_range() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Update the synopsis of svc_rdma_read_data_item() | Chuck Lever | 1 | -9/+8
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_data_item() can use that recv_ctxt to derive that information rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Update synopsis of svc_rdma_read_chunk_range() | Chuck Lever | 1 | -12/+12
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_chunk_range() can use that recv_ctxt to derive that information rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Update synopsis of svc_rdma_build_read_chunk() | Chuck Lever | 1 | -11/+10
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_chunk() can use that recv_ctxt to derive that information rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Update synopsis of svc_rdma_build_read_segment() | Chuck Lever | 1 | -8/+9
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_segment() can use the recv_ctxt to derive that information rather than the other way around. This removes one usage of the ri_readctxt field, enabling its removal in a subsequent patch. At the same time, the use of ri_rqst can similarly be replaced with a passed-in function parameter. Start with build_read_segment() because it is a common utility function at the bottom of the Read chunk path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Move read_info::ri_pageoff into struct svc_rdma_recv_ctxt | Chuck Lever | 1 | -16/+15
Further clean up: move the starting byte offset field into svc_rdma_recv_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt | Chuck Lever | 1 | -12/+9
Further clean up: move the page index field into svc_rdma_recv_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Start moving fields out of struct svc_rdma_read_info | Chuck Lever | 1 | -31/+26
Since the request's svc_rdma_recv_ctxt will stay around for the duration of the RDMA Read operation, the contents of struct svc_rdma_read_info can reside in the request's svc_rdma_recv_ctxt rather than being allocated separately. This will eventually save a call to kmalloc() in a hot path. Start this clean-up by moving the Read chunk's svc_rdma_chunk_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Move struct svc_rdma_chunk_ctxt to svc_rdma.h | Chuck Lever | 1 | -18/+0
Prepare for nestling these into the send and recv ctxts so they no longer have to be allocated dynamically. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Remove the svc_rdma_chunk_ctxt::cc_rdma field | Chuck Lever | 1 | -2/+0
In every instance, the pointer address in that field is now available by other means. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Pass a pointer to the transport to svc_rdma_cc_release() | Chuck Lever | 1 | -6/+7
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Explicitly pass the transport to svc_rdma_post_chunk_ctxt() | Chuck Lever | 1 | -5/+5
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Explicitly pass the transport into Read chunk I/O paths | Chuck Lever | 1 | -22/+36
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Explicitly pass the transport into Write chunk I/O paths | Chuck Lever | 1 | -1/+4
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Acquire the svcxprt_rdma pointer from the CQ context | Chuck Lever | 1 | -2/+3
Enable the removal of the svc_rdma_chunk_ctxt::cc_rdma field in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Reduce size of struct svc_rdma_rw_ctxt | Chuck Lever | 1 | -4/+8
SG_CHUNK_SIZE is 128, making struct svc_rdma_rw_ctxt + the first SGL array more than 4200 bytes in length, pushing the memory allocation well into order 1. Even so, the RDMA rw core doesn't seem to use more than max_send_sge entries in that array (typically 32 or less), so that is all wasted space. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
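The size arithmetic behind this commit can be sketched in userspace C. This is only an illustration, not the kernel's code: the 32-byte scatterlist entry, the 4096-byte page, and the 120-byte header used in the note below are assumptions for a typical 64-bit build, and `alloc_order()` and `ctxt_bytes()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for a typical 64-bit build; not taken from the commit. */
#define ASSUMED_SG_ENTRY_BYTES 32u

/* Hypothetical helper: smallest order with (page_size << order) >= bytes. */
static unsigned int alloc_order(size_t bytes, size_t page_size)
{
    unsigned int order = 0;

    assert(page_size > 0);
    while ((page_size << order) < bytes)
        order++;
    return order;
}

/* Hypothetical helper: bytes for a header plus an inline SG entry array. */
static size_t ctxt_bytes(size_t header_bytes, size_t nr_sg_entries)
{
    return header_bytes + nr_sg_entries * ASSUMED_SG_ENTRY_BYTES;
}
```

Under these assumptions, `ctxt_bytes(120, 128)` is 4216 bytes, which `alloc_order(..., 4096)` places in order 1, while the same header with 32 entries is 1144 bytes and stays in order 0.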
2024-01-08 | svcrdma: Update some svcrdma DMA-related tracepoints | Chuck Lever | 1 | -5/+5
A send/recv_ctxt already records transport-related information in the cq.id, thus there is no need to record the IP addresses of the transport endpoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: DMA error tracepoints should report completion IDs | Chuck Lever | 1 | -4/+5
Update the DMA error flow tracepoints to report the completion ID of the failing context. This ties the wait/failure to a particular operation or request, which is more useful than knowing only the failing transport. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: SQ error tracepoints should report completion IDs | Chuck Lever | 2 | -6/+6
Update the Send Queue's error flow tracepoints to report the completion ID of the waiting or failing context. This ties the wait/failure to a particular operation or request, which is a little more useful than knowing only the transport that is about to close. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | rpcrdma: Introduce a simple cid tracepoint class | Chuck Lever | 4 | -4/+4
De-duplicate some code, making it easier to add new tracepoints that report only a completion ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Add lockdep class keys for transport locks | Chuck Lever | 1 | -0/+6
Two svcrdma-related transport locks can become quite contended. Collate their use and make them easy to find in /proc/lock_stat for better observability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Clean up locking | Chuck Lever | 1 | -2/+2
There's no need to protect llist_entry() with a spin lock. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Add an async version of svc_rdma_write_info_free() | Chuck Lever | 1 | -1/+11
DMA unmapping can take quite some time, so it should not be handled in a single-threaded completion handler. Defer releasing write_info structs to the recently-added workqueue. With this patch, DMA unmapping can be handled in parallel, and it does not cause head-of-queue blocking of Write completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Add an async version of svc_rdma_send_ctxt_put() | Chuck Lever | 1 | -9/+25
DMA unmapping can take quite some time, so it should not be handled in a single-threaded completion handler. Defer releasing send_ctxts to the recently-added workqueue. With this patch, DMA unmapping can be handled in parallel, and it does not cause head-of-queue blocking of Send completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Add a utility workqueue to svcrdma | Chuck Lever | 2 | -8/+25
To handle work in the background, set up an UNBOUND workqueue for svcrdma. Subsequent patches will make use of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Pre-allocate svc_rdma_recv_ctxt objects | Chuck Lever | 1 | -11/+21
The original reason for allocating svc_rdma_recv_ctxt objects during Receive completion was to ensure the objects were allocated on the NUMA node closest to the underlying IB device. Since commit c5d68d25bd6b ("svcrdma: Clean up allocation of svc_rdma_recv_ctxt"), however, the device's favored node is explicitly passed to the memory allocator. To enable switching Receive completion to soft IRQ context, move memory allocation out of completion handling, since it can be costly, and it can sleep. A limited number of objects is now allocated at "accept" time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | svcrdma: Eliminate allocation of recv_ctxt objects in backchannel | Chuck Lever | 2 | -21/+21
The svc_rdma_recv_ctxt free list uses a lockless list to avoid the need for a spin lock in the fast path. llist_del_first(), which is used by svc_rdma_recv_ctxt_get(), requires serialization, however, when there are multiple list producers that are unserialized. I mistakenly thought there was only one caller of svc_rdma_recv_ctxt_get() (svc_rdma_refresh_recvs()), thus explicit serialization would not be necessary. But there is another caller: svc_rdma_bc_sendto(), and these two are not serialized against each other. I haven't seen ill effects that I could directly ascribe to a lack of serialization. It's just an observation based on code audit. When DMA-mapping before sending a Reply, the passed-in struct svc_rdma_recv_ctxt is used only for its write and reply PCLs. These are currently always empty in the backchannel case. So, instead of passing a full svc_rdma_recv_ctxt object to svc_rdma_map_reply_msg(), let's pass in just the Write and Reply PCLs. This change makes it unnecessary for the backchannel to acquire a dummy svc_rdma_recv_ctxt object when sending an RPC Call. The need for svc_rdma_recv_ctxt free list serialization is now completely avoided. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | SUNRPC: Remove RQ_SPLICE_OK | Chuck Lever | 2 | -12/+0
This flag is no longer used. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08 | SUNRPC: Add a server-side API for retrieving an RPC's pseudoflavor | Chuck Lever | 2 | -0/+22
NFSD will use this new API to determine whether nfsd_splice_read is safe to use. This avoids the need to add a dependency to NFSD for CONFIG_SUNRPC_GSS. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-05 | Merge tag 'net-6.7-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds | 16 | -37/+100
Pull networking fixes from Jakub Kicinski:

"Including fixes from wireless and netfilter.

We haven't accumulated much over the break. If it wasn't for the uninterrupted stream of fixes for Intel drivers this PR would be very slim. There was a handful of user reports, however, either they stood out because of the lower traffic or users have had more time to test over the break. The ones which are v6.7-relevant should be wrapped up.

Current release - regressions:

 - Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum required", it caused issues on networks where routers send prefixes with preferred_lft=0

 - wifi:
    - iwlwifi: pcie: don't synchronize IRQs from IRQ, prevent deadlock
    - mac80211: fix re-adding debugfs entries during reconfiguration

Current release - new code bugs:

 - tcp: print AO/MD5 messages only if there are any keys

Previous releases - regressions:

 - virtio_net: fix missing dma unmap for resize, prevent OOM

Previous releases - always broken:

 - mptcp: prevent tcp diag from closing listener subflows

 - nf_tables:
    - set transport header offset for egress hook, fix IPv4 mangling
    - skip set commit for deleted/destroyed sets, avoid double deactivation

 - nat: make sure action is set for all ct states, fix openvswitch matching on ICMP packets in related state

 - eth: mlxbf_gige: fix receive hang under heavy traffic

 - eth: r8169: fix PCI error on system resume for RTL8168FP

 - net: add missing getsockopt(SO_TIMESTAMPING_NEW) and cmsg handling"

* tag 'net-6.7-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  net/tcp: Only produce AO/MD5 logs if there are any keys
  net: Implement missing SO_TIMESTAMPING_NEW cmsg support
  bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
  net: ravb: Wait for operating mode to be applied
  asix: Add check for usbnet_get_endpoints
  octeontx2-af: Re-enable MAC TX in otx2_stop processing
  octeontx2-af: Always configure NIX TX link credits based on max frame size
  net/smc: fix invalid link access in dumping SMC-R connections
  net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
  virtio_net: fix missing dma unmap for resize
  igc: Fix hicredit calculation
  ice: fix Get link status data length
  i40e: Restore VF MSI-X state during PCI reset
  i40e: fix use-after-free in i40e_aqc_add_filters()
  net: Save and restore msg_namelen in sock_sendmsg
  netfilter: nft_immediate: drop chain reference counter on error
  netfilter: nf_nat: fix action not being set for all ct states
  net: bcmgenet: Fix FCS generation for fragmented skbuffs
  mptcp: prevent tcp diag from closing listener subflows
  MAINTAINERS: add Geliang as reviewer for MPTCP
  ...
2024-01-04 | net: Implement missing SO_TIMESTAMPING_NEW cmsg support | Thomas Lange | 1 | -0/+1
Commit 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") added the new socket option SO_TIMESTAMPING_NEW. However, it was never implemented in __sock_cmsg_send thus breaking SO_TIMESTAMPING cmsg for platforms using SO_TIMESTAMPING_NEW. Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") Link: https://lore.kernel.org/netdev/6a7281bf-bc4a-4f75-bb88-7011908ae471@app.fastmail.com/ Signed-off-by: Thomas Lange <thomas@corelatus.se> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240104085744.49164-1-thomas@corelatus.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 | Merge tag 'nf-24-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf | Jakub Kicinski | 2 | -2/+3
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix nat packets in the related state in OVS, from Brad Cowie.

2) Drop chain reference counter on error path in case chain binding fails.

* tag 'nf-24-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_immediate: drop chain reference counter on error
  netfilter: nf_nat: fix action not being set for all ct states
====================

Link: https://lore.kernel.org/r/20240103113001.137936-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 | net/smc: fix invalid link access in dumping SMC-R connections | Wen Gu | 1 | -2/+1
A crash was found when dumping SMC-R connections. It can be reproduced by following steps:

- environment: two RNICs on both sides.
- run SMC-R between two sides, now a SMC_LGR_SYMMETRIC type link group will be created.
- set the first RNIC down on either side and link group will turn to SMC_LGR_ASYMMETRIC_LOCAL then.
- run 'smcss -R' and the crash will be triggered.

BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8000000101fdd067 P4D 8000000101fdd067 PUD 10ce46067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 1810 Comm: smcss Kdump: loaded Tainted: G W E 6.7.0-rc6+ #51
RIP: 0010:__smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag]
Call Trace:
 <TASK>
 ? __die+0x24/0x70
 ? page_fault_oops+0x66/0x150
 ? exc_page_fault+0x69/0x140
 ? asm_exc_page_fault+0x26/0x30
 ? __smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag]
 smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
 smc_diag_dump+0x26/0x60 [smc_diag]
 netlink_dump+0x19f/0x320
 __netlink_dump_start+0x1dc/0x300
 smc_diag_handler_dump+0x6a/0x80 [smc_diag]
 ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
 sock_diag_rcv_msg+0x121/0x140
 ? __pfx_sock_diag_rcv_msg+0x10/0x10
 netlink_rcv_skb+0x5a/0x110
 sock_diag_rcv+0x28/0x40
 netlink_unicast+0x22a/0x330
 netlink_sendmsg+0x240/0x4a0
 __sock_sendmsg+0xb0/0xc0
 ____sys_sendmsg+0x24e/0x300
 ? copy_msghdr_from_user+0x62/0x80
 ___sys_sendmsg+0x7c/0xd0
 ? __do_fault+0x34/0x1a0
 ? do_read_fault+0x5f/0x100
 ? do_fault+0xb0/0x110
 __sys_sendmsg+0x4d/0x80
 do_syscall_64+0x45/0xf0
 entry_SYSCALL_64_after_hwframe+0x6e/0x76

When the first RNIC is set down, the lgr->lnk[0] will be cleared and an asymmetric link will be allocated in lgr->link[SMC_LINKS_PER_LGR_MAX - 1] by smc_llc_alloc_alt_link(). Then when we try to dump SMC-R connections in __smc_diag_dump(), the invalid lgr->lnk[0] will be accessed, resulting in this issue. So fix it by accessing the right link.

Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets")
Reported-by: henaumars <henaumars@sina.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1703662835-53416-1-git-send-email-guwen@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 | net: Save and restore msg_namelen in sock_sendmsg | Marc Dionne | 1 | -0/+2
Commit 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()") made sock_sendmsg save the incoming msg_name pointer and restore it before returning, to insulate the caller against msg_name being changed by the called code. If the address length was also changed however, we may return with an inconsistent structure where the length doesn't match the address, and attempts to reuse it may lead to lost packets. For example, a kernel that doesn't have commit 1c5950fc6fe9 ("udp6: fix potential access to stale information") will replace a v4 mapped address with its ipv4 equivalent, and shorten namelen accordingly from 28 to 16. If the caller attempts to reuse the resulting msg structure, it will have the original ipv6 (v4 mapped) address but an incorrect v4 length. Fixes: 86a7e0b69bd5 ("net: prevent rewrite of msg_name in sock_sendmsg()") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
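The save-and-restore pattern described above can be sketched in plain C. This is a userspace model under stated assumptions, not the kernel's sock_sendmsg(): `struct demo_msghdr`, `demo_proto_sendmsg()`, and `demo_sock_sendmsg()` are hypothetical stand-ins, and the 28 and 16 byte lengths mirror the figures quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct msghdr (illustration only). */
struct demo_msghdr {
    void *msg_name;
    int msg_namelen;
};

/* Hypothetical protocol handler that rewrites both address fields, as a
 * send path converting a v4-mapped address might. */
static int demo_proto_sendmsg(struct demo_msghdr *msg)
{
    static char v4_addr[16];

    msg->msg_name = v4_addr;
    msg->msg_namelen = (int)sizeof(v4_addr);
    return 0;
}

/* Save both fields before the call and restore both afterwards, so the
 * caller never sees a pointer and a length that disagree. */
static int demo_sock_sendmsg(struct demo_msghdr *msg)
{
    void *save_name;
    int save_len;
    int ret;

    assert(msg != NULL);
    save_name = msg->msg_name;
    save_len = msg->msg_namelen;

    ret = demo_proto_sendmsg(msg);

    msg->msg_name = save_name;
    msg->msg_namelen = save_len;
    return ret;
}
```

Restoring only the pointer would leave the caller holding its original 28-byte address paired with a length of 16, which is the inconsistency the commit describes.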
2024-01-03 | netfilter: nft_immediate: drop chain reference counter on error | Pablo Neira Ayuso | 1 | -1/+1
In the init path, nft_data_init() bumps the chain reference counter. Decrement it on error by following the error path, which calls nft_data_release() to restore it. Fixes: 4bedf9eee016 ("netfilter: nf_tables: fix chain binding transaction logic") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-03 | netfilter: nf_nat: fix action not being set for all ct states | Brad Cowie | 1 | -1/+2
This fixes openvswitch's handling of nat packets in the related state. In nf_ct_nat_execute(), which is called from nf_ct_nat(), ICMP/ICMPv6 packets in the IP_CT_RELATED or IP_CT_RELATED_REPLY state, which have not been dropped, will follow the goto, however the placement of the goto label means that updating the action bit field will be bypassed. This causes ovs_nat_update_key() to not be called from ovs_ct_nat() which means the openvswitch match key for the ICMP/ICMPv6 packet is not updated and the pre-nat value will be retained for the key, which will result in the wrong openflow rule being matched for that packet. Move the goto label above where the action bit field is being set so that it is updated in all cases where the packet is accepted. Fixes: ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc") Signed-off-by: Brad Cowie <brad@faucet.nz> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
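The effect of the label placement can be shown with a small model. It is a sketch under stated assumptions rather than the code of nf_ct_nat_execute(): the `demo_` names and the `DEMO_NAT_SRC` bit are hypothetical, and the NAT processing itself is elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NAT_SRC 0x1 /* hypothetical action bit */

enum demo_verdict { DEMO_DROP, DEMO_ACCEPT };

/* Label below the update: packets that take the goto skip the action bit. */
static enum demo_verdict demo_nat_buggy(bool related_icmp, int *action)
{
    enum demo_verdict verdict = DEMO_ACCEPT;

    assert(action != NULL);
    if (related_icmp)
        goto out; /* jumps past the update below */
    /* ... normal NAT processing elided ... */
    *action |= DEMO_NAT_SRC;
out:
    return verdict;
}

/* Label above the update: every accepted packet records the action bit. */
static enum demo_verdict demo_nat_fixed(bool related_icmp, int *action)
{
    enum demo_verdict verdict = DEMO_ACCEPT;

    assert(action != NULL);
    if (related_icmp)
        goto out;
    /* ... normal NAT processing elided ... */
out:
    if (verdict == DEMO_ACCEPT)
        *action |= DEMO_NAT_SRC;
    return verdict;
}
```

In this model a related ICMP packet leaves `*action` untouched in the first variant and sets the bit in the second, which corresponds to the caller being able to update its match key.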
2024-01-03 | mptcp: prevent tcp diag from closing listener subflows | Paolo Abeni | 1 | -0/+13
The MPTCP protocol does not expect that any other entity could change the first subflow status when such a socket is listening. Unfortunately, the TCP diag interface allows aborting any TCP socket, including MPTCP listener subflows. As reported by syzbot, that triggers a WARN() and could lead to bigger trouble later. The MPTCP protocol needs to do some MPTCP-level cleanup actions to properly shut down the listener. To keep the fix simple, prevent the diag interface entirely from stopping such listeners. We could refine the diag callback in a later, larger patch targeting net-next. Fixes: 57fc0f1ceaa4 ("mptcp: ensure listener is unhashed before updating the sk status") Cc: stable@vger.kernel.org Reported-by: <syzbot+5a01c3a666e726bc8752@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/netdev/0000000000004f4579060c68431b@google.com/ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Link: https://lore.kernel.org/r/20231226-upstream-net-20231226-mptcp-prevent-warn-v1-2-1404dcc431ea@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 | Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum required" | Alex Henrie | 1 | -13/+5
The commit had a bug and might not have been the right approach anyway. Fixes: 629df6701c8a ("net: ipv6/addrconf: clamp preferred_lft to the minimum required") Fixes: ec575f885e3e ("Documentation: networking: explain what happens if temp_prefered_lft is too small or too large") Reported-by: Dan Moulding <dan@danm.net> Closes: https://lore.kernel.org/netdev/20231221231115.12402-1-dan@danm.net/ Link: https://lore.kernel.org/netdev/CAMMLpeTdYhd=7hhPi2Y7pwdPCgnnW5JYh-bu3hSc7im39uxnEA@mail.gmail.com/ Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231230043252.10530-1-alexhenrie24@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02 | net: Implement missing getsockopt(SO_TIMESTAMPING_NEW) | Jörn-Thorben Hinz | 1 | -2/+9
Commit 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") added the new socket option SO_TIMESTAMPING_NEW. Setting the option is handled in sk_setsockopt(), querying it was not handled in sk_getsockopt(), though. Following remarks on an earlier submission of this patch, keep the old behavior of getsockopt(SO_TIMESTAMPING_OLD) which returns the active flags even if they actually have been set through SO_TIMESTAMPING_NEW. The new getsockopt(SO_TIMESTAMPING_NEW) is stricter, returning flags only if they have been set through the same option. Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berlin.de/ Link: https://lore.kernel.org/netdev/0d7cddc9-03fa-43db-a579-14f3e822615b@app.fastmail.com/ Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
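The difference between the two query behaviours described above can be modelled as follows. This is a userspace sketch, not the code of sk_getsockopt(): the `demo_` types and functions are hypothetical, and the single `set_via_new` flag is an assumed simplification of how the socket remembers which option was used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_opt { DEMO_TIMESTAMPING_OLD, DEMO_TIMESTAMPING_NEW };

/* Hypothetical socket state: the active flags and which option set them. */
struct demo_sock {
    unsigned int tsflags;
    bool set_via_new;
};

static void demo_setsockopt(struct demo_sock *sk, enum demo_opt opt,
                            unsigned int flags)
{
    assert(sk != NULL);
    sk->tsflags = flags;
    sk->set_via_new = (opt == DEMO_TIMESTAMPING_NEW);
}

/* OLD always reports the active flags; NEW reports them only when they
 * were set through NEW, and reports 0 otherwise. */
static unsigned int demo_getsockopt(const struct demo_sock *sk,
                                    enum demo_opt opt)
{
    assert(sk != NULL);
    if (opt == DEMO_TIMESTAMPING_OLD || sk->set_via_new)
        return sk->tsflags;
    return 0;
}
```

In this model, flags set through the NEW option are visible to both queries, while flags set through the OLD option are visible only to the OLD query.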