summaryrefslogtreecommitdiff
path: root/net/openvswitch
AgeCommit message (Collapse)AuthorFilesLines
2023-02-22net: openvswitch: fix possible memory leak in ovs_meter_cmd_set()Hangyu Hua1-1/+3
commit 2fa28f5c6fcbfc794340684f36d2581b4f2d20b5 upstream. old_meter needs to be free after it is detached regardless of whether the new meter is successfully attached. Fixes: c7c4c44c9a95 ("net: openvswitch: expand the meters supported number") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-15net: openvswitch: fix flow memory leak in ovs_flow_cmd_newFedor Pchelkin1-6/+6
[ Upstream commit 0c598aed445eb45b0ee7ba405f7ece99ee349c30 ] Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is not freed when an allocation of a key fails. BUG: memory leak unreferenced object 0xffff888116668000 (size 632): comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline] [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77 [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957 [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739 [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800 [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515 [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339 [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934 [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline] [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671 [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356 [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410 [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439 [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 To fix this the patch rearranges the goto labels to reflect the order of object allocations and adds appropriate goto statements on the error paths. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-14openvswitch: Fix flow lookup to use unmasked keyEelco Chaudron1-9/+16
[ Upstream commit 68bb10101e6b0a6bb44e9c908ef795fc4af99eae ] The commit mentioned below causes the ovs_flow_tbl_lookup() function to be called with the masked key. However, it's supposed to be called with the unmasked key. This due to the fact that the datapath supports installing wider flows, and OVS relies on this behavior. For example if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/ 128.0.0.0) is allowed to be added. However, if we try to add a wildcard rule, the installation fails: $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2 $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2 ovs-vswitchd: updating flow table (File exists) The reason is that the key used to determine if the flow is already present in the system uses the original key ANDed with the mask. This results in the IP address not being part of the (miniflow) key, i.e., being substituted with an all-zero value. When doing the actual lookup, this results in the key wrongfully matching the first flow, and therefore the flow does not get installed. This change reverses the commit below, but rather than having the key on the stack, it's allocated. Fixes: 190aa3e77880 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-02netfilter: conntrack: Fix data-races around ct markDaniel Xu1-4/+4
[ Upstream commit 52d1aa8b8249ff477aaa38b6f74a8ced780d079c ] nf_conn:mark can be read from and written to in parallel. Use READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted compiler optimizations. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-03openvswitch: switch from WARN to pr_warnAaron Conole1-1/+2
[ Upstream commit fd954cc1919e35cb92f78671cab6e42d661945a3 ] As noted by Paolo Abeni, pr_warn doesn't generate any splat and can still preserve the warning to the user that feature downgrade occurred. We likely cannot introduce other kinds of checks / enforcement here because syzbot can generate different genl versions to the datapath. Reported-by: syzbot+31cde0bef4bbf8ba2d86@syzkaller.appspotmail.com Fixes: 44da5ae5fbea ("openvswitch: Drop user features if old user space attempted to create datapath") Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26openvswitch: Fix overreporting of drops in dropwatchMike Pattrick1-2/+3
[ Upstream commit c21ab2afa2c64896a7f0e3cbc6845ec63dcfad2e ] Currently queue_userspace_packet will call kfree_skb for all frames, whether or not an error occurred. This can result in a single dropped frame being reported as multiple drops in dropwatch. This functions caller may also call kfree_skb in case of an error. This patch will consume the skbs instead and allow caller's to use kfree_skb. Signed-off-by: Mike Pattrick <mkp@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109957 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26openvswitch: Fix double reporting of drops in dropwatchMike Pattrick1-3/+10
[ Upstream commit 1100248a5c5ccd57059eb8d02ec077e839a23826 ] Frames sent to userspace can be reported as dropped in ovs_dp_process_packet, however, if they are dropped in the netlink code then netlink_attachskb will report the same frame as dropped. This patch checks for error codes which indicate that the frame has already been freed. Signed-off-by: Mike Pattrick <mkp@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109946 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-29net: openvswitch: fix parsing of nw_proto for IPv6 fragmentsRosemarie O'Riorden1-1/+1
commit 12378a5a75e33f34f8586706eb61cca9e6d4690c upstream. When a packet enters the OVS datapath and does not match any existing flows installed in the kernel flow cache, the packet will be sent to userspace to be parsed, and a new flow will be created. The kernel and OVS rely on each other to parse packet fields in the same way so that packets will be handled properly. As per the design document linked below, OVS expects all later IPv6 fragments to have nw_proto=44 in the flow key, so they can be correctly matched on OpenFlow rules. OpenFlow controllers create pipelines based on this design. This behavior was changed by the commit in the Fixes tag so that nw_proto equals the next_header field of the last extension header. However, there is no counterpart for this change in OVS userspace, meaning that this field is parsed differently between OVS and the kernel. This is a problem because OVS creates actions based on what is parsed in userspace, but the kernel-provided flow key is used as a match criteria, as described in Documentation/networking/openvswitch.rst. This leads to issues such as packets incorrectly matching on a flow and thus the wrong list of actions being applied to the packet. Such changes in packet parsing cannot be implemented without breaking the userspace. The offending commit is partially reverted to restore the expected behavior. The change technically made sense and there is a good reason that it was implemented, but it does not comply with the original design of OVS. If in the future someone wants to implement such a change, then it must be user-configurable and disabled by default to preserve backwards compatibility with existing OVS versions. Cc: stable@vger.kernel.org Fixes: fa642f08839b ("openvswitch: Derive IP protocol number for IPv6 later frags") Link: https://docs.openvswitch.org/en/latest/topics/design/#fragments Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20220621204845.9721-1-roriorden@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-22net: openvswitch: fix misuse of the cached connection on tuple changesIlya Maximets2-1/+8
commit 2061ecfdf2350994e5b61c43e50e98a7a70e95ee upstream. If packet headers changed, the cached nfct is no longer relevant for the packet and attempt to re-use it leads to the incorrect packet classification. This issue is causing broken connectivity in OpenStack deployments with OVS/OVN due to hairpin traffic being unexpectedly dropped. The setup has datapath flows with several conntrack actions and tuple changes between them: actions:ct(commit,zone=8,mark=0/0x1,nat(src)), set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)), set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)), ct(zone=8),recirc(0x4) After the first ct() action the packet headers are almost fully re-written. The next ct() tries to re-use the existing nfct entry and marks the packet as invalid, so it gets dropped later in the pipeline. Clearing the cached conntrack entry whenever packet tuple is changed to avoid the issue. The flow key should not be cleared though, because we should still be able to match on the ct_state if the recirculation happens after the tuple change but before the next ct() action. Cc: stable@vger.kernel.org Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Frode Nordahl <frode.nordahl@canonical.com> Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [Backport to 5.10: minor rebase in ovs_ct_clear function. This version also applicable to and tested on 5.4 and 4.19.] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27openvswitch: fix OOB access in reserve_sfa_size()Paolo Valerio1-1/+1
commit cefa91b2332d7009bc0be5d951d6cbbf349f90f8 upstream. Given a sufficiently large number of actions, while copying and reserving memory for a new action of a new flow, if next_offset is greater than MAX_ACTIONS_BUFSIZE, the function reserve_sfa_size() does not return -EMSGSIZE as expected, but it allocates MAX_ACTIONS_BUFSIZE bytes increasing actions_len by req_size. This can then lead to an OOB write access, especially when further actions need to be copied. Fix it by rearranging the flow action size check. KASAN splat below: ================================================================== BUG: KASAN: slab-out-of-bounds in reserve_sfa_size+0x1ba/0x380 [openvswitch] Write of size 65360 at addr ffff888147e4001c by task handler15/836 CPU: 1 PID: 836 Comm: handler15 Not tainted 5.18.0-rc1+ #27 ... Call Trace: <TASK> dump_stack_lvl+0x45/0x5a print_report.cold+0x5e/0x5db ? __lock_text_start+0x8/0x8 ? reserve_sfa_size+0x1ba/0x380 [openvswitch] kasan_report+0xb5/0x130 ? reserve_sfa_size+0x1ba/0x380 [openvswitch] kasan_check_range+0xf5/0x1d0 memcpy+0x39/0x60 reserve_sfa_size+0x1ba/0x380 [openvswitch] __add_action+0x24/0x120 [openvswitch] ovs_nla_add_action+0xe/0x20 [openvswitch] ovs_ct_copy_action+0x29d/0x1130 [openvswitch] ? __kernel_text_address+0xe/0x30 ? unwind_get_return_address+0x56/0xa0 ? create_prof_cpu_mask+0x20/0x20 ? ovs_ct_verify+0xf0/0xf0 [openvswitch] ? prep_compound_page+0x198/0x2a0 ? __kasan_check_byte+0x10/0x40 ? kasan_unpoison+0x40/0x70 ? ksize+0x44/0x60 ? reserve_sfa_size+0x75/0x380 [openvswitch] __ovs_nla_copy_actions+0xc26/0x2070 [openvswitch] ? __zone_watermark_ok+0x420/0x420 ? validate_set.constprop.0+0xc90/0xc90 [openvswitch] ? __alloc_pages+0x1a9/0x3e0 ? __alloc_pages_slowpath.constprop.0+0x1da0/0x1da0 ? unwind_next_frame+0x991/0x1e40 ? __mod_node_page_state+0x99/0x120 ? __mod_lruvec_page_state+0x2e3/0x470 ? __kasan_kmalloc_large+0x90/0xe0 ovs_nla_copy_actions+0x1b4/0x2c0 [openvswitch] ovs_flow_cmd_new+0x3cd/0xb10 [openvswitch] ... Cc: stable@vger.kernel.org Fixes: f28cd2af22a0 ("openvswitch: fix flow actions reallocation") Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13net: openvswitch: fix leak of nested actionsIlya Maximets1-5/+90
[ Upstream commit 1f30fb9166d4f15a1aa19449b9da871fe0ed4796 ] While parsing user-provided actions, openvswitch module may dynamically allocate memory and store pointers in the internal copy of the actions. So this memory has to be freed while destroying the actions. Currently there are only two such actions: ct() and set(). However, there are many actions that can hold nested lists of actions and ovs_nla_free_flow_actions() just jumps over them leaking the memory. For example, removal of the flow with the following actions will lead to a leak of the memory allocated by nf_ct_tmpl_alloc(): actions:clone(ct(commit),0) Non-freed set() action may also leak the 'dst' structure for the tunnel info including device references. Under certain conditions with a high rate of flow rotation that may cause significant memory leak problem (2MB per second in reporter's case). The problem is also hard to mitigate, because the user doesn't have direct control over the datapath flows generated by OVS. Fix that by iterating over all the nested actions and freeing everything that needs to be freed recursively. New build time assertion should protect us from this problem if new actions will be added in the future. Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all attributes has to be explicitly checked. sample() and clone() actions are mixing extra attributes into the user-provided action list. That prevents some code generalization too. Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst") Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html Reported-by: Stéphane Graber <stgraber@ubuntu.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13net: openvswitch: don't send internal clone attribute to the userspace.Ilya Maximets2-2/+4
[ Upstream commit 3f2a3050b4a3e7f32fc0ea3c9b0183090ae00522 ] 'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for performance optimization inside the kernel. It's added by the kernel while parsing user-provided actions and should not be sent during the flow dump as it's not part of the uAPI. The issue doesn't cause any significant problems to the ovs-vswitchd process, because reported actions are not really used in the application lifecycle and only supposed to be shown to a human via ovs-dpctl flow dump. However, the action list is still incorrect and causes the following error if the user wants to look at the datapath flows: # ovs-dpctl add-dp system@ovs-system # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)" # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(bad length 4, expected -1 for: action0(01 00 00 00), ct(commit),0) With the fix: # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(ct(commit),0) Additionally fixed an incorrect attribute name in the comment. Fixes: b233504033db ("openvswitch: kernel datapath clone action") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08openvswitch: Fixed nd target mask field in the flow dump.Martin Varghese1-2/+2
commit f19c44452b58a84d95e209b847f5495d91c9983a upstream. IPv6 nd target mask was not getting populated in flow dump. In the function __ovs_nla_put_key the icmp code mask field was checked instead of icmp code key field to classify the flow as neighbour discovery. ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0), skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0), ct_zone(0/0),ct_mark(0/0),ct_label(0/0), eth(src=00:00:00:00:00:00/00:00:00:00:00:00, dst=00:00:00:00:00:00/00:00:00:00:00:00), eth_type(0x86dd), ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no), icmpv6(type=135,code=0), nd(target=2001::2/::, sll=00:00:00:00:00:00/00:00:00:00:00:00, tll=00:00:00:00:00:00/00:00:00:00:00:00), packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2 Fixes: e64457191a25 (openvswitch: Restructure datapath.c and flow.c) Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Link: https://lore.kernel.org/r/20220328054148.3057-1-martinvarghesenokia@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08openvswitch: always update flow key after natAaron Conole1-59/+59
[ Upstream commit 60b44ca6bd7518dd38fa2719bc9240378b6172c3 ] During NAT, a tuple collision may occur. When this happens, openvswitch will make a second pass through NAT which will perform additional packet modification. This will update the skb data, but not the flow key that OVS uses. This means that future flow lookups, and packet matches will have incorrect data. This has been supported since 5d50aa83e2c8 ("openvswitch: support asymmetric conntrack"). That commit failed to properly update the sw_flow_key attributes, since it only called the ovs_ct_nat_update_key once, rather than each time ovs_ct_nat_execute was called. As these two operations are linked, the ovs_ct_nat_execute() function should always make sure that the sw_flow_key is updated after a successful call through NAT infrastructure. Fixes: 5d50aa83e2c8 ("openvswitch: support asymmetric conntrack") Cc: Dumitru Ceara <dceara@redhat.com> Cc: Numan Siddique <nusiddiq@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20220318124319.3056455-1-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-02openvswitch: Fix setting ipv6 fields causing hw csum failurePaul Blakey1-8/+38
commit d9b5ae5c1b241b91480aa30408be12fe91af834a upstream. Ipv6 ttl, label and tos fields are modified without first pulling/pushing the ipv6 header, which would have updated the hw csum (if available). This might cause csum validation when sending the packet to the stack, as can be seen in the trace below. Fix this by updating skb->csum if available. Trace resulted by ipv6 ttl dec and then sending packet to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]: [295241.900063] s_pf0vf2: hw csum failure [295241.923191] Call Trace: [295241.925728] <IRQ> [295241.927836] dump_stack+0x5c/0x80 [295241.931240] __skb_checksum_complete+0xac/0xc0 [295241.935778] nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack] [295241.953030] nf_conntrack_in+0x498/0x5e0 [nf_conntrack] [295241.958344] __ovs_ct_lookup+0xac/0x860 [openvswitch] [295241.968532] ovs_ct_execute+0x4a7/0x7c0 [openvswitch] [295241.979167] do_execute_actions+0x54a/0xaa0 [openvswitch] [295242.001482] ovs_execute_actions+0x48/0x100 [openvswitch] [295242.006966] ovs_dp_process_packet+0x96/0x1d0 [openvswitch] [295242.012626] ovs_vport_receive+0x6c/0xc0 [openvswitch] [295242.028763] netdev_frame_hook+0xc0/0x180 [openvswitch] [295242.034074] __netif_receive_skb_core+0x2ca/0xcb0 [295242.047498] netif_receive_skb_internal+0x3e/0xc0 [295242.052291] napi_gro_receive+0xba/0xe0 [295242.056231] mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core] [295242.062513] mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core] [295242.067669] mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core] [295242.077958] net_rx_action+0x149/0x3b0 [295242.086762] __do_softirq+0xd7/0x2d6 [295242.090427] irq_exit+0xf7/0x100 [295242.093748] do_IRQ+0x7f/0xd0 [295242.096806] common_interrupt+0xf/0xf [295242.100559] </IRQ> [295242.102750] RIP: 0033:0x7f9022e88cbd [295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda [295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000 [295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30 [295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340 [295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000 [295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40 Fixes: 3fdbd1ce11e5 ("openvswitch: add ipv6 'set' action") Signed-off-by: Paul Blakey <paulb@nvidia.com> Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-26ovs: clear skb->tstamp in forwarding pathkaixi.fan1-0/+1
[ Upstream commit 01634047bf0d5c2d9b7d8095bb4de1663dbeedeb ] fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs doesn't clear skb->tstamp. We encountered a problem with linux version 5.4.56 and ovs version 2.14.1, and packets failed to dequeue from qdisc when fq qdisc was attached to ovs port. Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: kaixi.fan <fankaixi.li@bytedance.com> Signed-off-by: xiexiaohui <xiexiaohui.xxh@bytedance.com> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03openvswitch: meter: fix race when getting now_ms.Tao Liu1-0/+8
[ Upstream commit e4df1b0c24350a0f00229ff895a91f1072bd850d ] We have observed meters working unexpected if traffic is 3+Gbit/s with multiple connections. now_ms is not pretected by meter->lock, we may get a negative long_delta_ms when another cpu updated meter->used, then: delta_ms = (u32)long_delta_ms; which will be a large value. band->bucket += delta_ms * band->rate; then we get a wrong band->bucket. OpenVswitch userspace datapath has fixed the same issue[1] some time ago, and we port the implementation to kernel datapath. [1] https://patchwork.ozlabs.org/project/openvswitch/patch/20191025114436.9746-1-i.maximets@ovn.org/ Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure") Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Suggested-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11openvswitch: fix stack OOB read while fragmenting IPv4 packetsDavide Caratti1-4/+4
commit 7c0ea5930c1c211931819d83cfb157bff1539a4c upstream. running openvswitch on kernels built with KASAN, it's possible to see the following splat while testing fragmentation of IPv4 packets: BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60 Read of size 1 at addr ffff888112fc713c by task handler2/1367 CPU: 0 PID: 1367 Comm: handler2 Not tainted 5.12.0-rc6+ #418 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: dump_stack+0x92/0xc1 print_address_description.constprop.7+0x1a/0x150 kasan_report.cold.13+0x7f/0x111 ip_do_fragment+0x1b03/0x1f60 ovs_fragment+0x5bf/0x840 [openvswitch] do_execute_actions+0x1bd5/0x2400 [openvswitch] ovs_execute_actions+0xc8/0x3d0 [openvswitch] ovs_packet_cmd_execute+0xa39/0x1150 [openvswitch] genl_family_rcv_msg_doit.isra.15+0x227/0x2d0 genl_rcv_msg+0x287/0x490 netlink_rcv_skb+0x120/0x380 genl_rcv+0x24/0x40 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f957079db07 Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 eb ec ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 24 ed ff ff 48 RSP: 002b:00007f956ce35a50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f957079db07 RDX: 0000000000000000 RSI: 00007f956ce35ae0 RDI: 0000000000000019 RBP: 00007f956ce35ae0 R08: 0000000000000000 R09: 00007f9558006730 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 00007f956ce37308 R14: 00007f956ce35f80 R15: 00007f956ce35ae0 The buggy address belongs to the page: page:00000000af2a1d93 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x112fc7 flags: 0x17ffffc0000000() raw: 0017ffffc0000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected addr ffff888112fc713c is located in stack of task handler2/1367 at offset 180 in frame: ovs_fragment+0x0/0x840 [openvswitch] this frame has 2 objects: [32, 144) 'ovs_dst' [192, 424) 'ovs_rt' Memory state around the buggy address: ffff888112fc7000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888112fc7080: 00 f1 f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00 >ffff888112fc7100: 00 00 00 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 ^ ffff888112fc7180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888112fc7200: 00 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 for IPv4 packets, ovs_fragment() uses a temporary struct dst_entry. Then, in the following call graph: ip_do_fragment() ip_skb_dst_mtu() ip_dst_mtu_maybe_forward() ip_mtu_locked() the pointer to struct dst_entry is used as pointer to struct rtable: this turns the access to struct members like rt_mtu_locked into an OOB read in the stack. Fix this changing the temporary variable used for IPv4 packets in ovs_fragment(), similarly to what is done for IPv6 few lines below. Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmt") Cc: <stable@vger.kernel.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-14openvswitch: fix send of uninitialized stack memory in ct limit replyIlya Maximets1-4/+4
[ Upstream commit 4d51419d49930be2701c2633ae271b350397c3ca ] 'struct ovs_zone_limit' has more members than initialized in ovs_ct_limit_get_default_limit(). The rest of the memory is a random kernel stack content that ends up being sent to userspace. Fix that by using designated initializer that will clear all non-specified fields. Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14net: openvswitch: conntrack: simplify the return expression of ↵Zheng Yongjun1-5/+1
ovs_ct_limit_get_default_limit() [ Upstream commit 5e359044c107ecbdc2e9b3fd5ce296006e6de4bc ] Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-23net: openvswitch: fix TTL decrement exception action executionEelco Chaudron1-9/+6
[ Upstream commit 09d6217254c004f6237cc2c2bfe604af58e9a8c5 ] Currently, the exception actions are not processed correctly as the wrong dataset is passed. This change fixes this, including the misleading comment. In addition, a check was added to make sure we work on an IPv4 packet, and not just assume if it's not IPv6 it's IPv4. This was all tested using OVS with patch, https://patchwork.ozlabs.org/project/openvswitch/list/?series=21639, applied and sending packets with a TTL of 1 (and 0), both with IPv4 and IPv6. Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink message format") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160733569860.3007.12938188180387116741.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-05openvswitch: fix error return code in validate_and_copy_dec_ttl()Wang Hai1-1/+1
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Changing 'return start' to 'return action_start' can fix this bug. Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink message format") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20201204114314.1596-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03net: openvswitch: ensure LSE is pullable before reading itDavide Caratti1-0/+3
when openvswitch is configured to mangle the LSE, the current value is read from the packet dereferencing 4 bytes at mpls_hdr(): ensure that the label is contained in the skb "linear" area. Found by code inspection. Fixes: d27cf5c59a12 ("net: core: add MPLS update core helper and use in OvS") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/aa099f245d93218b84b5c056b67b6058ccf81a66.1606987185.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27net: openvswitch: fix TTL decrement action netlink message formatEelco Chaudron2-23/+58
Currently, the openvswitch module is not accepting the correctly formated netlink message for the TTL decrement action. For both setting and getting the dec_ttl action, the actions should be nested in the OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi. When the original patch was sent, it was tested with a private OVS userspace implementation. This implementation was unfortunately not upstreamed and reviewed, hence an erroneous version of this patch was sent out. Leaving the patch as-is would cause problems as the kernel module could interpret additional attributes as actions and vice-versa, due to the actions not being encapsulated/nested within the actual attribute, but being concatinated after it. Fixes: 744676e77720 ("openvswitch: add TTL decrement action") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-04net: openvswitch: silence suspicious RCU usage warningEelco Chaudron2-8/+8
Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize() by replacing rcu_dereference() with rcu_dereference_ovsl(). In addition, when creating a new datapath, make sure it's configured under the ovs_lock. Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable") Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuild Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-18net: openvswitch: fix to make sure flow_lookup() is not preemptedEelco Chaudron2-25/+41
The flow_lookup() function uses per CPU variables, which must be called with BH disabled. However, this is fine in the general NAPI use case where the local BH is disabled. But, it's also called from the netlink context. The below patch makes sure that even in the netlink path, the BH is disabled. In addition, u64_stats_update_begin() requires a lock to ensure one writer which is not ensured here. Making it per-CPU and disabling NAPI (softirq) ensures that there is always only one writer. Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage") Reported-by: Juri Lelli <jlelli@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160295903253.7789.826736662555102345.stgit@ebuild Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-14net: openvswitch: use new function dev_fetch_sw_netstatsHeiner Kallweit1-19/+1
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/5e52dc91-97b1-82b0-214b-65d404e4a2ec@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-9/+13
Small conflict around locking in rxrpc_process_event() - channel_lock moved to bundle in next, while state lock needs _bh() from net. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-08openvswitch: handle DNAT tuple collisionDumitru Ceara1-9/+13
With multiple DNAT rules it's possible that after destination translation the resulting tuples collide. For example, two openvswitch flows: nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20)) nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20)) Assuming two TCP clients initiating the following connections: 10.0.0.10:5000->10.0.0.10:10 10.0.0.10:5000->10.0.0.20:10 Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing nf_conntrack_confirm() to fail because of tuple collision. Netfilter handles this case by allocating a null binding for SNAT at egress by default. Perform the same operation in openvswitch for DNAT if no explicit SNAT is requested by the user and allocate a null binding for SNAT for packets in the "original" direction. Reported-at: https://bugzilla.redhat.com/1877128 Suggested-by: Florian Westphal <fw@strlen.de> Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-06net: openvswitch: use dev_sw_netstats_rx_add()Fabian Frederick1-7/+1
use new helper for netstats settings Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05net: openvswitch: Constify static struct genl_small_opsRikard Falkeborn2-2/+2
The only usage of these is to assign their address to the small_ops field in the genl_family struct, which is a const pointer, and applying ARRAY_SIZE() on them. Make them const to allow the compiler to put them in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04net/sched: act_vlan: Add {POP,PUSH}_ETH actionsGuillaume Nault1-18/+10
Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to respectively pop and push a base Ethernet header at the beginning of a frame. POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any, must be stripped before calling POP_ETH. PUSH_ETH is restricted to skbs with no mac_header, and only the MAC addresses can be configured. The Ethertype is automatically set from skb->protocol. These restrictions ensure that all skb's fields remain consistent, so that this action can't confuse other part of the networking stack (like GSO). Since openvswitch already had these actions, consolidate the code in skbuff.c (like for vlan and mpls push/pop). Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03genetlink: move to smaller ops wherever possibleJakub Kicinski3-18/+18
Bulk of the genetlink users can use smaller ops, move them. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-19net: openswitch: reuse the helper variable to improve the code readablityZeng Tao1-2/+2
In the function ovs_ct_limit_exit, there is already a helper vaibale which could be reused to improve the readability, so i fix it in this patch. Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-3/+3
We got slightly different patches removing a double word in a comment in net/ipv4/raw.c - picked the version from net. Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached values instead of VNIC login response buffer (following what commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login response buffer") did). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-01net: openvswitch: fixes crash if nf_conncount_init() failsEelco Chaudron1-1/+7
If nf_conncount_init fails currently the dispatched work is not canceled, causing problems when the timer fires. This change fixes this by not scheduling the work until all initialization is successful. Fixes: a65878d6f00b ("net: openvswitch: fixes potential deadlock in dp cleanup code") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01net: openvswitch: remove unused keep_flowsTonghao Zhang2-7/+0
keep_flows was introduced by [1], which used as flag to delete flows or not. When rehashing or expanding the table instance, we will not flush the flows. Now don't use it anymore, remove it. [1] - https://github.com/openvswitch/ovs/commit/acd051f1761569205827dc9b037e15568a8d59f8 Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01net: openvswitch: refactor flow free functionTonghao Zhang1-11/+11
Decrease table->count and ufid_count unconditionally, because we only don't use count or ufid_count to count when flushing the flows. To simplify the codes, we remove the "count" argument of table_instance_flow_free. To avoid a bug when deleting flows in the future, add WARN_ON in flush flows function. Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01net: openvswitch: improve the coding styleTonghao Zhang4-41/+55
Not change the logic, just improve the coding style. Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31openvswitch: using ip6_fragment in ipv6_stubwenxu1-6/+1
Using ipv6_stub->ipv6_fragment to avoid the netfilter dependency Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva2-3/+3
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-14net: openvswitch: introduce common code for flushing flowsTonghao Zhang3-21/+27
To avoid some issues, for example RCU usage warning and double free, we should flush the flows under ovs_lock. This patch refactors table_instance_destroy and introduces table_instance_flow_flush which can be invoked by __dp_destroy or ovs_flow_tbl_flush. Fixes: 50b0e61b32ee ("net: openvswitch: fix possible memleak on destroy flow-table") Reported-by: Johan Knöös <jknoos@google.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05net: openvswitch: silence suspicious RCU usage warningTonghao Zhang1-3/+3
ovs_flow_tbl_destroy always is called from RCU callback or error path. It is no need to check if rcu_read_lock or lockdep_ovsl_is_held was held. ovs_dp_cmd_fill_info always is called with ovs_mutex, So use the rcu_dereference_ovsl instead of rcu_dereference in ovs_flow_tbl_masks_cache_size. Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable") Cc: Eelco Chaudron <echaudro@redhat.com> Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04net: openvswitch: make masks cache size configurableEelco Chaudron3-14/+114
This patch makes the masks cache size configurable, or with a size of 0, disable it. Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04net: openvswitch: add masks cache hit counterEelco Chaudron4-7/+23
Add a counter that counts the number of masks cache hits, and export it through the megaflow netlink statistics. Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04openvswitch: Prevent kernel-infoleak in ovs_ct_put_key()Peilin Ye1-18/+20
ovs_ct_put_key() is potentially copying uninitialized kernel stack memory into socket buffers, since the compiler may leave a 3-byte hole at the end of `struct ovs_key_ct_tuple_ipv4` and `struct ovs_key_ct_tuple_ipv6`. Fix it by initializing `orig` with memset(). Fixes: 9dd7f8907c37 ("openvswitch: Add original direction conntrack tuple to sw_flow_key.") Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04net/sched: act_ct: fix miss set mru for ovs after defrag in act_ctwenxu1-0/+1
When openvswitch conntrack offload with act_ct action. Fragment packets defrag in the ingress tc act_ct action and miss the next chain. Then the packet pass to the openvswitch datapath without the mru. The over mtu packet will be dropped in output action in openvswitch for over mtu. "kernel: net2: dropped over-mtu packet: 1528 > 1500" This patch add mru in the tc_skb_ext for adefrag and miss next chain situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru to the qdisc_skb_cb when the packet defrag. And When the chain miss, The mru is set to tc_skb_ext which can be got by ovs datapath. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25net: openvswitch: fixes potential deadlock in dp cleanup codeEelco Chaudron2-14/+13
The previous patch introduced a deadlock, this patch fixes it by making sure the work is canceled without holding the global ovs lock. This is done by moving the reorder processing one layer up to the netns level. Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage") Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com Reviewed-by: Paolo <pabeni@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: openvswitch: reorder masks array based on usageEelco Chaudron4-7/+207
This patch reorders the masks array every 4 seconds based on their usage count. This greatly reduces the masks per packet hit, and hence the overall performance. Especially in the OVS/OVN case for OpenShift. Here are some results from the OVS/OVN OpenShift test, which use 8 pods, each pod having 512 uperf connections, each connection sends a 64-byte request and gets a 1024-byte response (TCP). All uperf clients are on 1 worker node while all uperf servers are on the other worker node. Kernel without this patch : 7.71 Gbps Kernel with this patch applied: 14.52 Gbps We also run some tests to verify the rebalance activity does not lower the flow insertion rate, which does not. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Andrew Theurer <atheurer@redhat.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14net: openvswitch: kerneldoc fixesAndrew Lunn2-4/+5
Simple fixes which require no deep knowledge of the code. Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>