<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/tipc/msg.h, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-01-27T13:53:27+00:00</updated>
<entry>
<title>net: tipc: remove unused static inlines</title>
<updated>2022-01-27T13:53:27+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-01-26T19:11:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5e4eca5d929a168085140fe93b1e8cef841008d5'/>
<id>urn:sha1:5e4eca5d929a168085140fe93b1e8cef841008d5</id>
<content type='text'>
IIUC the TIPC msg helpers are not meant to provide
and exhaustive API, so remove the unused ones.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: tipc: fix FB_MTU eat two pages</title>
<updated>2021-06-28T20:31:57+00:00</updated>
<author>
<name>Menglong Dong</name>
<email>dong.menglong@zte.com.cn</email>
</author>
<published>2021-06-28T06:37:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0c6de0c943dbb42831bf7502eb5c007f71e752d2'/>
<id>urn:sha1:0c6de0c943dbb42831bf7502eb5c007f71e752d2</id>
<content type='text'>
FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory
allocation fails, which can avoid unnecessary sending failures.

The value of FB_MTU now is 3744, and the data size will be:

  (3744 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \
    SKB_DATA_ALIGN(BUF_HEADROOM + BUF_TAILROOM + 3))

which is larger than one page(4096), and two pages will be allocated.

To avoid it, replace '3744' with a calculation:

  (PAGE_SIZE - SKB_DATA_ALIGN(BUF_OVERHEAD) - \
    SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

What's more, alloc_skb_fclone() will call SKB_DATA_ALIGN for data size,
and it's not necessary to make alignment for buf_size in
tipc_buf_acquire(). So, just remove it.

Fixes: 4c94cc2d3d57 ("tipc: fall back to smaller MTU if allocation of local send skb fails")
Signed-off-by: Menglong Dong &lt;dong.menglong@zte.com.cn&gt;
Acked-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: add automatic session key exchange</title>
<updated>2020-09-18T20:58:37+00:00</updated>
<author>
<name>Tuong Lien</name>
<email>tuong.t.lien@dektech.com.au</email>
</author>
<published>2020-09-18T01:17:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1ef6f7c9390ff5308c940ff8d0a53533a4673ad9'/>
<id>urn:sha1:1ef6f7c9390ff5308c940ff8d0a53533a4673ad9</id>
<content type='text'>
With support from the master key option in the previous commit, it
becomes easy to make frequent updates/exchanges of session keys between
authenticated cluster nodes.
Basically, there are two situations where the key exchange will take in
place:

- When a new node joins the cluster (with the master key), it will need
  to get its peer's TX key, so that be able to decrypt further messages
  from that peer.

- When a new session key is generated (by either user manual setting or
  later automatic rekeying feature), the key will be distributed to all
  peer nodes in the cluster.

A key to be exchanged is encapsulated in the data part of a 'MSG_CRYPTO
/KEY_DISTR_MSG' TIPC v2 message, then xmit-ed as usual and encrypted by
using the master key before sending out. Upon receipt of the message it
will be decrypted in the same way as regular messages, then attached as
the sender's RX key in the receiver node.

In this way, the key exchange is reliable by the link layer, as well as
security, integrity and authenticity by the crypto layer.

Also, the forward security will be easily achieved by user changing the
master key actively but this should not be required very frequently.

The key exchange feature is independent on the presence of a master key
Note however that the master key still is needed for new nodes to be
able to join the cluster. It is also optional, and can be turned off/on
via the sysfs: 'net/tipc/key_exchange_enabled' [default 1: enabled].

Backward compatibility is guaranteed because for nodes that do not have
master key support, key exchange using master key ie. tx_key = 0 if any
will be shortly discarded at the message validation step. In other
words, the key exchange feature will be automatically disabled to those
nodes.

v2: fix the "implicit declaration of function 'tipc_crypto_key_flush'"
error in node.c. The function only exists when built with the TIPC
"CONFIG_TIPC_CRYPTO" option.

v3: use 'info-&gt;extack' for a message emitted due to netlink operations
instead (- David's comment).

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Acked-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: Tuong Lien &lt;tuong.t.lien@dektech.com.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: introduce encryption master key</title>
<updated>2020-09-18T20:58:37+00:00</updated>
<author>
<name>Tuong Lien</name>
<email>tuong.t.lien@dektech.com.au</email>
</author>
<published>2020-09-18T01:17:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=daef1ee3798b25e8464b8eb618eaa74b8f423ac7'/>
<id>urn:sha1:daef1ee3798b25e8464b8eb618eaa74b8f423ac7</id>
<content type='text'>
In addition to the supported cluster &amp; per-node encryption keys for the
en/decryption of TIPC messages, we now introduce one option for user to
set a cluster key as 'master key', which is simply a symmetric key like
the former but has a longer life cycle. It has two purposes:

- Authentication of new member nodes in the cluster. New nodes, having
  no knowledge of current session keys in the cluster will still be
  able to join the cluster as long as they know the master key. This is
  because all neighbor discovery (LINK_CONFIG) messages must be
  encrypted with this key.

- Encryption of session encryption keys during automatic exchange and
  update of those.This is a feature we will introduce in a later commit
  in this series.

We insert the new key into the currently unused slot 0 in the key array
and start using it immediately once the user has set it.
After joining, a node only knowing the master key should be fully
communicable to existing nodes in the cluster, although those nodes may
have their own session keys activated (i.e. not the master one). To
support this, we define a 'grace period', starting from the time a node
itself reports having no RX keys, so the existing nodes will use the
master key for encryption instead. The grace period can be extended but
will automatically stop after e.g. 5 seconds without a new report. This
is also the basis for later key exchanging feature as the new node will
be impossible to decrypt anything without the support from master key.

For user to set a master key, we define a new netlink flag -
'TIPC_NLA_NODE_KEY_MASTER', so it can be added to the current 'set key'
netlink command to specify the setting key to be a master key.

Above all, the traditional cluster/per-node key mechanism is guaranteed
to work when user comes not to use this master key option. This is also
compatible to legacy nodes without the feature supported.

Even this master key can be updated without any interruption of cluster
connectivity but is so is needed, this has to be coordinated and set by
the user.

Acked-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: Tuong Lien &lt;tuong.t.lien@dektech.com.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: Use struct_size() helper</title>
<updated>2020-06-20T03:15:25+00:00</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavoars@kernel.org</email>
</author>
<published>2020-06-18T13:35:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e034c6d23bc43266af1fa983212218f4aa38f995'/>
<id>urn:sha1:e034c6d23bc43266af1fa983212218f4aa38f995</id>
<content type='text'>
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.

This code was detected with the help of Coccinelle and, audited and
fixed manually.

Signed-off-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: update a binding service via broadcast</title>
<updated>2020-06-17T15:53:34+00:00</updated>
<author>
<name>Hoang Huu Le</name>
<email>hoang.h.le@dektech.com.au</email>
</author>
<published>2020-06-17T06:56:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cad2929dc4321b1f237767e9bd271b61a2eaa752'/>
<id>urn:sha1:cad2929dc4321b1f237767e9bd271b61a2eaa752</id>
<content type='text'>
Currently, updating binding table (add service binding to
name table/withdraw a service binding) is being sent over replicast.
However, if we are scaling up clusters to &gt; 100 nodes/containers this
method is less affection because of looping through nodes in a cluster one
by one.

It is worth to use broadcast to update a binding service. This way, the
binding table can be updated on all peer nodes in one shot.

Broadcast is used when all peer nodes, as indicated by a new capability
flag TIPC_NAMED_BCAST, support reception of this message type.

Four problems need to be considered when introducing this feature.
1) When establishing a link to a new peer node we still update this by a
unicast 'bulk' update. This may lead to race conditions, where a later
broadcast publication/withdrawal bypass the 'bulk', resulting in
disordered publications, or even that a withdrawal may arrive before the
corresponding publication. We solve this by adding an 'is_last_bulk' bit
in the last bulk messages so that it can be distinguished from all other
messages. Only when this message has arrived do we open up for reception
of broadcast publications/withdrawals.

2) When a first legacy node is added to the cluster all distribution
will switch over to use the legacy 'replicast' method, while the
opposite happens when the last legacy node leaves the cluster. This
entails another risk of message disordering that has to be handled. We
solve this by adding a sequence number to the broadcast/replicast
messages, so that disordering can be discovered and corrected. Note
however that we don't need to consider potential message loss or
duplication at this protocol level.

3) Bulk messages don't contain any sequence numbers, and will always
arrive in order. Hence we must exempt those from the sequence number
control and deliver them unconditionally. We solve this by adding a new
'is_bulk' bit in those messages so that they can be recognized.

4) Legacy messages, which don't contain any new bits or sequence
numbers, but neither can arrive out of order, also need to be exempt
from the initial synchronization and sequence number check, and
delivered unconditionally. Therefore, we add another 'is_not_legacy' bit
to all new messages so that those can be distinguished from legacy
messages and the latter delivered directly.

v1-&gt;v2:
 - fix warning issue reported by kbuild test robot &lt;lkp@intel.com&gt;
 - add santiy check to drop the publication message with a sequence
number that is lower than the agreed synch point

Signed-off-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Hoang Huu Le &lt;hoang.h.le@dektech.com.au&gt;
Acked-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: add test for Nagle algorithm effectiveness</title>
<updated>2020-05-26T22:16:52+00:00</updated>
<author>
<name>Tuong Lien</name>
<email>tuong.t.lien@dektech.com.au</email>
</author>
<published>2020-05-26T09:38:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0a3e060f340dbe232ffa290c40f879b7f7db595b'/>
<id>urn:sha1:0a3e060f340dbe232ffa290c40f879b7f7db595b</id>
<content type='text'>
When streaming in Nagle mode, we try to bundle small messages from user
as many as possible if there is one outstanding buffer, i.e. not ACK-ed
by the receiving side, which helps boost up the overall throughput. So,
the algorithm's effectiveness really depends on when Nagle ACK comes or
what the specific network latency (RTT) is, compared to the user's
message sending rate.

In a bad case, the user's sending rate is low or the network latency is
small, there will not be many bundles, so making a Nagle ACK or waiting
for it is not meaningful.
For example: a user sends its messages every 100ms and the RTT is 50ms,
then for each messages, we require one Nagle ACK but then there is only
one user message sent without any bundles.

In a better case, even if we have a few bundles (e.g. the RTT = 300ms),
but now the user sends messages in medium size, then there will not be
any difference at all, that says 3 x 1000-byte data messages if bundled
will still result in 3 bundles with MTU = 1500.

When Nagle is ineffective, the delay in user message sending is clearly
wasted instead of sending directly.

Besides, adding Nagle ACKs will consume some processor load on both the
sending and receiving sides.

This commit adds a test on the effectiveness of the Nagle algorithm for
an individual connection in the network on which it actually runs.
Particularly, upon receipt of a Nagle ACK we will compare the number of
bundles in the backlog queue to the number of user messages which would
be sent directly without Nagle. If the ratio is good (e.g. &gt;= 2), Nagle
mode will be kept for further message sending. Otherwise, we will leave
Nagle and put a 'penalty' on the connection, so it will have to spend
more 'one-way' messages before being able to re-enter Nagle.

In addition, the 'ack-required' bit is only set when really needed that
the number of Nagle ACKs will be reduced during Nagle mode.

Testing with benchmark showed that with the patch, there was not much
difference in throughput for small messages since the tool continuously
sends messages without a break, so Nagle would still take in effect.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Acked-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: Tuong Lien &lt;tuong.t.lien@dektech.com.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: add support for broadcast rcv stats dumping</title>
<updated>2020-05-26T22:16:52+00:00</updated>
<author>
<name>Tuong Lien</name>
<email>tuong.t.lien@dektech.com.au</email>
</author>
<published>2020-05-26T09:38:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=03b6fefd9bb4844c75faeb10df8496794e2fd5da'/>
<id>urn:sha1:03b6fefd9bb4844c75faeb10df8496794e2fd5da</id>
<content type='text'>
This commit enables dumping the statistics of a broadcast-receiver link
like the traditional 'broadcast-link' one (which is for broadcast-
sender). The link dumping can be triggered via netlink (e.g. the
iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
indicator.

The name of a broadcast-receiver link of a specific peer will be in the
format: 'broadcast-link:&lt;peer-id&gt;'.

For example:

Link &lt;broadcast-link:1001002&gt;
  Window:50 packets
  RX packets:7841 fragments:2408/440 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:124 dups:0
  TX naks:21 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

In addition, the broadcast-receiver link statistics can be reset in the
usual way via netlink by specifying that link name in command.

Note: the 'tipc_link_name_ext()' is removed because the link name can
now be retrieved simply via the 'l-&gt;name'.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Acked-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: Tuong Lien &lt;tuong.t.lien@dektech.com.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: introduce Gap ACK blocks for broadcast link</title>
<updated>2020-05-26T22:16:52+00:00</updated>
<author>
<name>Tuong Lien</name>
<email>tuong.t.lien@dektech.com.au</email>
</author>
<published>2020-05-26T09:38:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d7626b5acff9227e2a65da636a53e09bdafdc0aa'/>
<id>urn:sha1:d7626b5acff9227e2a65da636a53e09bdafdc0aa</id>
<content type='text'>
As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput
by Gap ACK blocks"), we apply the same mechanism for the broadcast link
as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
consist of two parts built for both the broadcast and unicast types:

 31                       16 15                        0
+-------------+-------------+-------------+-------------+
|  bgack_cnt  |  ugack_cnt  |            len            |
+-------------+-------------+-------------+-------------+  -
|            gap            |            ack            |   |
+-------------+-------------+-------------+-------------+    &gt; bc gacks
:                           :                           :   |
+-------------+-------------+-------------+-------------+  -
|            gap            |            ack            |   |
+-------------+-------------+-------------+-------------+    &gt; uc gacks
:                           :                           :   |
+-------------+-------------+-------------+-------------+  -

which is "automatically" backward-compatible.

We also increase the max number of Gap ACK blocks to 128, allowing upto
64 blocks per type (total buffer size = 516 bytes).

Besides, the 'tipc_link_advance_transmq()' function is refactored which
is applicable for both the unicast and broadcast cases now, so some old
functions can be removed and the code is optimized.

With the patch, TIPC broadcast is more robust regardless of packet loss
or disorder, latency, ... in the underlying network. Its performance is
boost up significantly.
For example, experiment with a 5% packet loss rate results:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real    0m 42.46s
user    0m 1.16s
sys     0m 17.67s

Without the patch:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real    8m 27.94s
user    0m 0.55s
sys     0m 2.38s

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Acked-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: Tuong Lien &lt;tuong.t.lien@dektech.com.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: Add a missing case of TIPC_DIRECT_MSG type</title>
<updated>2020-03-26T18:21:02+00:00</updated>
<author>
<name>Hoang Le</name>
<email>hoang.h.le@dektech.com.au</email>
</author>
<published>2020-03-26T02:50:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8b1e5b0a99f04bda2d6c85ecfe5e68a356c10914'/>
<id>urn:sha1:8b1e5b0a99f04bda2d6c85ecfe5e68a356c10914</id>
<content type='text'>
In the commit f73b12812a3d
("tipc: improve throughput between nodes in netns"), we're missing a check
to handle TIPC_DIRECT_MSG type, it's still using old sending mechanism for
this message type. So, throughput improvement is not significant as
expected.

Besides that, when sending a large message with that type, we're also
handle wrong receiving queue, it should be enqueued in socket receiving
instead of multicast messages.

Fix this by adding the missing case for TIPC_DIRECT_MSG.

Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns")
Reported-by: Tuong Lien &lt;tuong.t.lien@dektech.com.au&gt;
Signed-off-by: Hoang Le &lt;hoang.h.le@dektech.com.au&gt;
Acked-by: Jon Maloy &lt;jmaloy@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
