<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/openvswitch/datapath.h, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-07-07T21:30:39+00:00</updated>
<entry>
<title>net: openvswitch: allow providing upcall pid for the 'execute' command</title>
<updated>2025-07-07T21:30:39+00:00</updated>
<author>
<name>Ilya Maximets</name>
<email>i.maximets@ovn.org</email>
</author>
<published>2025-07-02T15:50:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=59f44c9ccc3bb68aa3b062b8e57ce0e1ee2fca75'/>
<id>urn:sha1:59f44c9ccc3bb68aa3b062b8e57ce0e1ee2fca75</id>
<content type='text'>
When a packet enters OVS datapath and there is no flow to handle it,
packet goes to userspace through a MISS upcall.  With per-CPU upcall
dispatch mechanism, we're using the current CPU id to select the
Netlink PID on which to send this packet.  This allows us to send
packets from the same traffic flow through the same handler.

The handler will process the packet, install required flow into the
kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE.

While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a
recirculation action that will pass the (likely modified) packet
through the flow lookup again.  And if the flow is not found, the
packet will be sent to userspace again through another MISS upcall.

However, the handler thread in userspace is likely running on a
different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled
in the syscall context of that thread.  So, when the time comes to
send the packet through another upcall, the per-CPU dispatch will
choose a different Netlink PID, and this packet will end up processed
by a different handler thread on a different CPU.

The process continues as long as there are new recirculations, each
time the packet goes to a different handler thread before it is sent
out of the OVS datapath to the destination port.  In real setups the
number of recirculations can go up to 4 or 5, sometimes more.

There is always a chance to re-order packets while processing upcalls,
because userspace will first install the flow and then re-inject the
original packet.  So, there is a race window when the flow is already
installed and the second packet can match it and be forwarded to the
destination before the first packet is re-injected.  But the fact that
packets are going through multiple upcalls handled by different
userspace threads makes the reordering noticeably more likely, because
we not only have a race between the kernel and a userspace handler
(which is hard to avoid), but also between multiple userspace handlers.

For example, let's assume that 10 packets got enqueued through a MISS
upcall for handler-1, it will start processing them, will install the
flow into the kernel and start re-injecting packets back, from where
they will go through another MISS to handler-2.  Handler-2 will install
the flow into the kernel and start re-injecting the packets, while
handler-1 continues to re-inject the last of the 10 packets, they will
hit the flow installed by handler-2 and be forwarded without going to
the handler-2, while handler-2 still re-injects the first of these 10
packets.  Given multiple recirculations and misses, these 10 packets
may end up completely mixed up on the output from the datapath.

Let's allow userspace to specify on which Netlink PID the packets
should be upcalled while processing OVS_PACKET_CMD_EXECUTE.
This makes it possible to ensure that all the packets are processed
by the same handler thread in the userspace even with them being
upcalled multiple times in the process.  Packets will remain in order
since they will be enqueued to the same socket and re-injected in the
same order.  This doesn't eliminate re-ordering as stated above, since
we still have a race between kernel and the userspace thread, but it
allows to eliminate races between multiple userspace threads.

Userspace knows the PID of the socket on which the original upcall is
received, so there is no need to send it up from the kernel.

Solution requires storing the value somewhere for the duration of the
packet processing.  There are two potential places for this: our skb
extension or the per-CPU storage.  It's not clear which is better,
so just following currently used scheme of storing this kind of things
along the skb.  We still have a decent amount of space in the cb.

Signed-off-by: Ilya Maximets &lt;i.maximets@ovn.org&gt;
Acked-by: Flavio Leitner &lt;fbl@sysclose.org&gt;
Acked-by: Eelco Chaudron &lt;echaudro@redhat.com&gt;
Acked-by: Aaron Conole &lt;aconole@redhat.com&gt;
Link: https://patch.msgid.link/20250702155043.2331772-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>openvswitch: Allocate struct ovs_pcpu_storage dynamically</title>
<updated>2025-06-17T12:47:46+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-06-13T12:36:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7b4ac12cc929e281cf7edc22203e0533790ebc2b'/>
<id>urn:sha1:7b4ac12cc929e281cf7edc22203e0533790ebc2b</id>
<content type='text'>
PERCPU_MODULE_RESERVE defines the maximum size that can by used for the
per-CPU data size used by modules. This is 8KiB.

Commit 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into
one") restructured the per-CPU memory allocation for the module and
moved the separate alloc_percpu() invocations at module init time to a
static per-CPU variable which is allocated by the module loader.

The size of the per-CPU data section for openvswitch is 6488 bytes which
is ~80% of the available per-CPU memory. Together with a few other
modules it is easy to exhaust the available 8KiB of memory.

Allocate ovs_pcpu_storage dynamically at module init time.

Reported-by: Gal Pressman &lt;gal@nvidia.com&gt;
Closes: https://lore.kernel.org/all/c401e017-f8db-4f57-a1cd-89beb979a277@nvidia.com
Fixes: 035fcdc4d240c ("openvswitch: Merge three per-CPU structures into one")
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Aaron Conole &lt;aconole@redhat.com&gt;
Link: https://patch.msgid.link/20250613123629.-XSoQTCu@linutronix.de
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>openvswitch: Move ovs_frag_data_storage into the struct ovs_pcpu_storage</title>
<updated>2025-05-15T13:23:31+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-05-12T09:27:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3af4cdd67f32529c177b885d4ca491710e961928'/>
<id>urn:sha1:3af4cdd67f32529c177b885d4ca491710e961928</id>
<content type='text'>
ovs_frag_data_storage is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Move ovs_frag_data_storage into the struct ovs_pcpu_storage which already
provides locking for the structure.

Cc: Aaron Conole &lt;aconole@redhat.com&gt;
Cc: Eelco Chaudron &lt;echaudro@redhat.com&gt;
Cc: Ilya Maximets &lt;i.maximets@ovn.org&gt;
Cc: dev@openvswitch.org
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Aaron Conole &lt;aconole@redhat.com&gt;
Link: https://patch.msgid.link/20250512092736.229935-10-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>openvswitch: Use nested-BH locking for ovs_pcpu_storage</title>
<updated>2025-05-15T13:23:31+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-05-12T09:27:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=672318331b44753ab7bd8545558939c38b4c1132'/>
<id>urn:sha1:672318331b44753ab7bd8545558939c38b4c1132</id>
<content type='text'>
ovs_pcpu_storage is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.
The data structure can be referenced recursive and there is a recursion
counter to avoid too many recursions.

Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. Add an owner of the struct which is
the current task and acquire the lock only if the structure is not owned
by the current task.

Cc: Aaron Conole &lt;aconole@redhat.com&gt;
Cc: Eelco Chaudron &lt;echaudro@redhat.com&gt;
Cc: Ilya Maximets &lt;i.maximets@ovn.org&gt;
Cc: dev@openvswitch.org
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Aaron Conole &lt;aconole@redhat.com&gt;
Link: https://patch.msgid.link/20250512092736.229935-9-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>openvswitch: Merge three per-CPU structures into one</title>
<updated>2025-05-15T13:23:31+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-05-12T09:27:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=035fcdc4d240c873c89b76b752dd9921bc88c1ba'/>
<id>urn:sha1:035fcdc4d240c873c89b76b752dd9921bc88c1ba</id>
<content type='text'>
exec_actions_level is a per-CPU integer allocated at compile time.
action_fifos and flow_keys are per-CPU pointer and have their data
allocated at module init time.
There is no gain in splitting it, once the module is allocated, the
structures are allocated.

Merge the three per-CPU variables into ovs_pcpu_storage, adapt callers.

Cc: Aaron Conole &lt;aconole@redhat.com&gt;
Cc: Eelco Chaudron &lt;echaudro@redhat.com&gt;
Cc: Ilya Maximets &lt;i.maximets@ovn.org&gt;
Cc: dev@openvswitch.org
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Aaron Conole &lt;aconole@redhat.com&gt;
Link: https://patch.msgid.link/20250512092736.229935-8-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>net: openvswitch: fix kernel-doc warnings in internal headers</title>
<updated>2025-03-24T16:30:21+00:00</updated>
<author>
<name>Ilya Maximets</name>
<email>i.maximets@ovn.org</email>
</author>
<published>2025-03-20T22:42:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6bb0dcb3d321c14be7ca33b71a149034d6a2cde8'/>
<id>urn:sha1:6bb0dcb3d321c14be7ca33b71a149034d6a2cde8</id>
<content type='text'>
Some field descriptions were missing, some were not very accurate.
Not touching the uAPI header or .c files for now.

Formatting of those comments isn't great in general, but at least
they are not missing anything now.

Before:
  $ ./scripts/kernel-doc -none -Wall net/openvswitch/*.h 2&gt;&amp;1 | wc -l
  16

After:
  $ ./scripts/kernel-doc -none -Wall net/openvswitch/*.h 2&gt;&amp;1 | wc -l
  0

Signed-off-by: Ilya Maximets &lt;i.maximets@ovn.org&gt;
Acked-by: Eelco Chaudron &lt;echaudro@redhat.com&gt;
Reviewed-by: Aaron Conole &lt;aconole@redhat.com&gt;
Link: https://patch.msgid.link/20250320224431.252489-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "openvswitch: switch to per-action label counting in conntrack"</title>
<updated>2025-03-13T09:29:34+00:00</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2025-03-08T18:05:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1063ae07383c0ddc5bcce170260c143825846b03'/>
<id>urn:sha1:1063ae07383c0ddc5bcce170260c143825846b03</id>
<content type='text'>
Currently, ovs_ct_set_labels() is only called for confirmed conntrack
entries (ct) within ovs_ct_commit(). However, if the conntrack entry
does not have the labels_ext extension, attempting to allocate it in
ovs_ct_get_conn_labels() for a confirmed entry triggers a warning in
nf_ct_ext_add():

  WARN_ON(nf_ct_is_confirmed(ct));

This happens when the conntrack entry is created externally before OVS
increments net-&gt;ct.labels_used. The issue has become more likely since
commit fcb1aa5163b1 ("openvswitch: switch to per-action label counting
in conntrack"), which changed to use per-action label counting and
increment net-&gt;ct.labels_used when a flow with ct action is added.

Since there’s no straightforward way to fully resolve this issue at the
moment, this reverts the commit to avoid breaking existing use cases.

Fixes: fcb1aa5163b1 ("openvswitch: switch to per-action label counting in conntrack")
Reported-by: Jianbo Liu &lt;jianbol@nvidia.com&gt;
Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Acked-by: Aaron Conole &lt;aconole@redhat.com&gt;
Link: https://patch.msgid.link/1bdeb2f3a812bca016a225d3de714427b2cd4772.1741457143.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>openvswitch: switch to per-action label counting in conntrack</title>
<updated>2024-08-16T02:05:19+00:00</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2024-08-12T17:17:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fcb1aa5163b1ae4cf2864b688b08927aac51f51e'/>
<id>urn:sha1:fcb1aa5163b1ae4cf2864b688b08927aac51f51e</id>
<content type='text'>
Similar to commit 70f06c115bcc ("sched: act_ct: switch to per-action
label counting"), we should also switch to per-action label counting
in openvswitch conntrack, as Florian suggested.

The difference is that nf_connlabels_get() is called unconditionally
when creating an ct action in ovs_ct_copy_action(). As with these
flows:

  table=0,ip,actions=ct(commit,table=1)
  table=1,ip,actions=ct(commit,exec(set_field:0xac-&gt;ct_label),table=2)

it needs to make sure the label ext is created in the 1st flow before
the ct is committed in ovs_ct_commit(). Otherwise, the warning in
nf_ct_ext_add() when creating the label ext in the 2nd flow will
be triggered:

   WARN_ON(nf_ct_is_confirmed(ct));

Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Reviewed-by: Aaron Conole &lt;aconole@redhat.com&gt;
Acked-by: Florian Westphal &lt;fw@strlen.de&gt;
Link: https://patch.msgid.link/6b9347d5c1a0b364e88d900b29a616c3f8e5b1ca.1723483073.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: openvswitch: store sampling probability in cb.</title>
<updated>2024-07-06T00:45:47+00:00</updated>
<author>
<name>Adrian Moreno</name>
<email>amorenoz@redhat.com</email>
</author>
<published>2024-07-04T08:56:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=71763d8a8203c28178d7be7f18af73d4dddb36ba'/>
<id>urn:sha1:71763d8a8203c28178d7be7f18af73d4dddb36ba</id>
<content type='text'>
When a packet sample is observed, the sampling rate that was used is
important to estimate the real frequency of such event.

Store the probability of the parent sample action in the skb's cb area
and use it in psample action to pass it down to psample module.

Reviewed-by: Aaron Conole &lt;aconole@redhat.com&gt;
Acked-by: Eelco Chaudron &lt;echaudro@redhat.com&gt;
Reviewed-by: Ilya Maximets &lt;i.maximets@ovn.org&gt;
Signed-off-by: Adrian Moreno &lt;amorenoz@redhat.com&gt;
Link: https://patch.msgid.link/20240704085710.353845-7-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/sched: Enable tc skb ext allocation on chain miss only when needed</title>
<updated>2022-02-05T10:12:53+00:00</updated>
<author>
<name>Paul Blakey</name>
<email>paulb@nvidia.com</email>
</author>
<published>2022-02-03T08:44:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=35d39fecbc242150af5587506e58ec1f8541fb68'/>
<id>urn:sha1:35d39fecbc242150af5587506e58ec1f8541fb68</id>
<content type='text'>
Currently tc skb extension is used to send miss info from
tc to ovs datapath module, and driver to tc. For the tc to ovs
miss it is currently always allocated even if it will not
be used by ovs datapath (as it depends on a requested feature).

Export the static key which is used by openvswitch module to
guard this code path as well, so it will be skipped if ovs
datapath doesn't need it. Enable this code path once
ovs datapath needs it.

Signed-off-by: Paul Blakey &lt;paulb@nvidia.com&gt;
Reviewed-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
