kernel/linux.git/include/net/ip_vs.h, branch v6.1.176

ipvs: clear the svc scheduler ptr early on edit

2026-06-19T11:37:08+00:00

[ Upstream commit 193989cc6d80dd8e0460fb3992e69fa03bf0ff9b ] ip_vs_edit_service() while unbinding the old scheduler clears the svc->scheduler ptr after the scheduler module initiates RCU callbacks. This can cause packets to use the old scheduler at the time when svc->sched_data is already freed after RCU grace period. Fix it by clearing the ptr early in ip_vs_unbind_scheduler(), before the done_service method schedules any RCU callbacks. Also, if the new scheduler fails to initialize when replacing the old scheduler, try to restore the old scheduler while still returning the error code. Link: https://sashiko.dev/#/patchset/20260519015506.634185-1-rosenp%40gmail.com Fixes: 05f00505a89a ("ipvs: fix crash if scheduler is changed") Signed-off-by: Julian Anastasov Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin

ipvs: Update width of source for ip_vs_sync_conn_options

2023-05-24T16:32:39+00:00

[ Upstream commit e3478c68f6704638d08f437cbc552ca5970c151a ] In ip_vs_sync_conn_v0() copy is made to struct ip_vs_sync_conn_options. That structure looks like this: struct ip_vs_sync_conn_options { struct ip_vs_seq in_seq; struct ip_vs_seq out_seq; }; The source of the copy is the in_seq field of struct ip_vs_conn. Whose type is struct ip_vs_seq. Thus we can see that the source - is not as wide as the amount of data copied, which is the width of struct ip_vs_sync_conn_option. The copy is safe because the next field in is another struct ip_vs_seq. Make use of struct_group() to annotate this. Flagged by gcc-13 as: In file included from ./include/linux/string.h:254, from ./include/linux/bitmap.h:11, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/paravirt.h:17, from ./arch/x86/include/asm/cpuid.h:62, from ./arch/x86/include/asm/processor.h:19, from ./arch/x86/include/asm/timex.h:5, from ./include/linux/timex.h:67, from ./include/linux/time32.h:13, from ./include/linux/time.h:60, from ./include/linux/stat.h:19, from ./include/linux/module.h:13, from net/netfilter/ipvs/ip_vs_sync.c:38: In function 'fortify_memcpy_chk', inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3: ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 529 | __read_overflow2_field(q_size_field, size); | Compile tested only. Signed-off-by: Simon Horman Reviewed-by: Horatiu Vultur Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin

ipvs: use u64_stats_t for the per-cpu counters

2022-12-31T12:32:26+00:00

[ Upstream commit 1dbd8d9a82e3f26b9d063292d47ece673f48fce2 ] Use the provided u64_stats_t type to avoid load/store tearing. Fixes: 316580b69d0a ("u64_stats: provide u64_stats_t type") Signed-off-by: Julian Anastasov Cc: yunhong-cgl jiang Cc: "dust.li" Reviewed-by: Jiri Wiesner Tested-by: Jiri Wiesner Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin

ipvs: add sysctl_run_estimation to support disable estimation

2021-10-07T17:52:58+00:00

estimation_timer will iterate the est_list to do estimation for each ipvs stats. When there are lots of services, the list can be very large. We found that estimation_timer() run for more then 200ms on a machine with 104 CPU and 50K services. yunhong-cgl jiang report the same phenomenon before: https://www.spinics.net/lists/lvs-devel/msg05426.html In some cases(for example a large K8S cluster with many ipvs services), ipvs estimation may not be needed. So adding a sysctl blob to allow users to disable this completely. Default is: 1 (enable) Cc: yunhong-cgl jiang Signed-off-by: Dust Li Acked-by: Julian Anastasov Acked-by: Simon Horman Signed-off-by: Pablo Neira Ayuso

netfilter: move handlers to net/ip_vs.h

2021-02-05T02:37:57+00:00

Fix the following compilation warnings: net/netfilter/ipvs/ip_vs_proto_tcp.c:147:1: warning: no previous prototype for 'tcp_snat_handler' [-Wmissing-prototypes] 147 | tcp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, | ^~~~~~~~~~~~~~~~ net/netfilter/ipvs/ip_vs_proto_udp.c:136:1: warning: no previous prototype for 'udp_snat_handler' [-Wmissing-prototypes] 136 | udp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, | ^~~~~~~~~~~~~~~~ Signed-off-by: Leon Romanovsky Signed-off-by: Jakub Kicinski

ipvs: remove dependency on ip6_tables

2020-08-31T21:06:51+00:00

This dependency was added because ipv6_find_hdr was in iptables specific code but is no longer required Fixes: f8f626754ebe ("ipv6: Move ipv6_find_hdr() out of Netfilter code.") Fixes: 63dca2c0b0e7 ("ipvs: Fix faulty IPv6 extension header handling in IPVS") Signed-off-by: Yaroslav Bolyukin Acked-by: Julian Anastasov Signed-off-by: Pablo Neira Ayuso

ipvs: queue delayed work to expire no destination connections if expire_nodest_conn=1

2020-07-21T23:17:59+00:00

When expire_nodest_conn=1 and a destination is deleted, IPVS does not expire the existing connections until the next matching incoming packet. If there are many connection entries from a single client to a single destination, many packets may get dropped before all the connections are expired (more likely with lots of UDP traffic). An optimization can be made where upon deletion of a destination, IPVS queues up delayed work to immediately expire any connections with a deleted destination. This ensures any reused source ports from a client (within the IPVS timeouts) are scheduled to new real servers instead of silently dropped. Signed-off-by: Andrew Sy Kim Signed-off-by: Julian Anastasov Signed-off-by: Pablo Neira Ayuso

ipvs: allow connection reuse for unconfirmed conntrack

2020-07-03T23:18:37+00:00

YangYuxi is reporting that connection reuse is causing one-second delay when SYN hits existing connection in TIME_WAIT state. Such delay was added to give time to expire both the IPVS connection and the corresponding conntrack. This was considered a rare case at that time but it is causing problem for some environments such as Kubernetes. As nf_conntrack_tcp_packet() can decide to release the conntrack in TIME_WAIT state and to replace it with a fresh NEW conntrack, we can use this to allow rescheduling just by tuning our check: if the conntrack is confirmed we can not schedule it to different real server and the one-second delay still applies but if new conntrack was created, we are free to select new real server without any delays. YangYuxi lists some of the problem reports: - One second connection delay in masquerading mode: https://marc.info/?t=151683118100004&r=1&w=2 - IPVS low throughput #70747 https://github.com/kubernetes/kubernetes/issues/70747 - Apache Bench can fill up ipvs service proxy in seconds #544 https://github.com/cloudnativelabs/kube-router/issues/544 - Additional 1s latency in `host -> service IP -> pod` https://github.com/kubernetes/kubernetes/issues/90854 Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack") Co-developed-by: YangYuxi Signed-off-by: YangYuxi Signed-off-by: Julian Anastasov Reviewed-by: Simon Horman Signed-off-by: Pablo Neira Ayuso

ipvs: register hooks only with services

2020-06-30T16:37:39+00:00

Keep the IPVS hooks registered in Netfilter only while there are configured virtual services. This saves CPU cycles while IPVS is loaded but not used. Signed-off-by: Julian Anastasov Reviewed-by: Simon Horman Signed-off-by: Pablo Neira Ayuso

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2019-11-02T20:54:56+00:00

The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller