<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/trace/events/sock.h, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-03-17T08:56:37+00:00</updated>
<entry>
<title>inet: preserve const qualifier in inet_sk()</title>
<updated>2023-03-17T08:56:37+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-03-16T15:31:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=abc17a11ed29b0471e428d86189acca8d1a213c6'/>
<id>urn:sha1:abc17a11ed29b0471e428d86189acca8d1a213c6</id>
<content type='text'>
We can change inet_sk() to propagate the const qualifier of its argument.

This should avoid some potential errors caused by accidental
(const -&gt; not_const) promotion.

Other helpers like tcp_sk(), udp_sk(), raw_sk() will be handled
in a separate patch series.

v2: use container_of_const() as advised by Jakub and Linus
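
As a simplified sketch of the change (illustrative, not the exact
diff), the cast-based helper becomes a container_of_const() based
macro, so a const argument yields a const result:

  -static inline struct inet_sock *inet_sk(const struct sock *sk)
  -{
  -	return (struct inet_sock *)sk;	/* silently drops const */
  -}
  +#define inet_sk(ptr)	container_of_const(ptr, struct inet_sock, sk)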

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/netdev/20230315142841.3a2ac99a@kernel.org/
Link: https://lore.kernel.org/netdev/CAHk-=wiOf12nrYEF2vJMcucKjWPN-Ns_SW9fA7LwST_2Dzp7rw@mail.gmail.com/
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/sock: Introduce trace_sk_data_ready()</title>
<updated>2023-01-23T11:26:50+00:00</updated>
<author>
<name>Peilin Ye</name>
<email>peilin.ye@bytedance.com</email>
</author>
<published>2023-01-20T00:45:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=40e0b09081420853542571c38875b48b60404ebb'/>
<id>urn:sha1:40e0b09081420853542571c38875b48b60404ebb</id>
<content type='text'>
As suggested by Cong, introduce a tracepoint for all -&gt;sk_data_ready()
callback implementations.  For example:

&lt;...&gt;
  iperf-609  [002] .....  70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
  iperf-609  [002] .....  70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
&lt;...&gt;
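
Each -&gt;sk_data_ready() implementation then calls the tracepoint on
entry. A minimal sketch of one call site (simplified, not the full
function body):

  static void sock_def_readable(struct sock *sk)
  {
  	trace_sk_data_ready(sk);

  	/* ... wake up waiting readers as before ... */
  }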

Suggested-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Peilin Ye &lt;peilin.ye@bytedance.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sock: add tracepoint for send recv length</title>
<updated>2023-01-13T10:25:10+00:00</updated>
<author>
<name>Yunhui Cui</name>
<email>cuiyunhui@bytedance.com</email>
</author>
<published>2023-01-11T06:59:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6e6eda44b939c0931533d6681d9f2ed41b44cde9'/>
<id>urn:sha1:6e6eda44b939c0931533d6681d9f2ed41b44cde9</id>
<content type='text'>
Add two tracepoints to monitor the tcp/udp traffic
per process and per cgroup.

Regarding monitoring the tcp/udp traffic of each process, two
solutions already exist: the first is netatop
(https://www.atoptool.nl/netatop.php), and the second is via
kprobe/kretprobe.

The netatop solution is implemented by registering hook functions at
the hook points provided by the netfilter framework.

These hook functions may run in soft-interrupt context and cannot
directly obtain the pid, so some data structures are added to bind
packets to processes, for example struct taskinfobucket, struct
taskinfo ...

Every time a process sends or receives packets, multiple hashmap
lookups are needed, resulting in low performance, and the tcp/udp
traffic statistics can be inaccurate (for example, when multiple
threads share a socket).

We can also obtain the information with kretprobe, but as we know, a
kprobe gets its result by trapping into an exception, which costs
performance compared to a tracepoint.

We compared the performance of tracepoints with the above two methods, and
the results are as follows:

ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html
without trace:
Time per request: 39.660 [ms] (mean)
Time per request: 0.040 [ms] (mean, across all concurrent requests)

netatop:
Time per request: 50.717 [ms] (mean)
Time per request: 0.051 [ms] (mean, across all concurrent requests)

kretprobe:
Time per request: 43.168 [ms] (mean)
Time per request: 0.043 [ms] (mean, across all concurrent requests)

tracepoint:
Time per request: 41.004 [ms] (mean)
Time per request: 0.041 [ms] (mean, across all concurrent requests)

It can be seen that tracepoint has better performance.
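
For reference, a hypothetical sketch of what one of the two
tracepoints could look like (field set and format simplified, not
the exact definition):

  TRACE_EVENT(sock_send_length,
  	TP_PROTO(struct sock *sk, int ret, int flags),
  	TP_ARGS(sk, ret, flags),
  	TP_STRUCT__entry(
  		__field(__u16, family)
  		__field(__u16, protocol)
  		__field(int, ret)
  	),
  	TP_fast_assign(
  		__entry-&gt;family   = sk-&gt;sk_family;
  		__entry-&gt;protocol = sk-&gt;sk_protocol;
  		__entry-&gt;ret      = ret;
  	),
  	TP_printk("family=%u protocol=%u length=%d",
  		  __entry-&gt;family, __entry-&gt;protocol, __entry-&gt;ret)
  );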

Signed-off-by: Yunhui Cui &lt;cuiyunhui@bytedance.com&gt;
Signed-off-by: Xiongchun Duan &lt;duanxiongchun@bytedance.com&gt;
Reviewed-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer</title>
<updated>2022-07-08T11:06:17+00:00</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2022-07-06T14:50:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=820b8963adaea34a87abbecb906d1f54c0aabfb7'/>
<id>urn:sha1:820b8963adaea34a87abbecb906d1f54c0aabfb7</id>
<content type='text'>
The trace event sock_exceed_buf_limit saves the prot-&gt;sysctl_mem pointer
and then dereferences it in the TP_printk() portion. This is unsafe, as
the TP_printk() portion is executed at the time the buffer is read. That
is, it can be seconds, minutes, days, months, even years later. If the
proto has been freed by then, this dereference can lead to a kernel
crash.

Instead, save the sysctl_mem array into the ring buffer and have the
TP_printk() reference that instead. This is the proper and safe way to
read pointers in trace events.
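
A hypothetical sketch of the shape of the fix (simplified, not the
exact diff):

  	TP_STRUCT__entry(
  -		__field(long *, sysctl_mem)
  +		__array(long, sysctl_mem, 3)
  		...
  	),
  	TP_fast_assign(
  -		__entry-&gt;sysctl_mem = prot-&gt;sysctl_mem;
  +		memcpy(__entry-&gt;sysctl_mem, prot-&gt;sysctl_mem,
  +		       3 * sizeof(long));
  		...
  	),

TP_printk() then formats __entry-&gt;sysctl_mem[0..2] straight from the
ring buffer, so no stale pointer is ever followed.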

Link: https://lore.kernel.org/all/20220706052130.16368-12-kuniyu@amazon.com/

Cc: stable@vger.kernel.org
Fixes: 3847ce32aea9f ("core: add tracepoints for queueing skb to rcvbuf")
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Acked-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sock: add trace for socket errors</title>
<updated>2021-06-29T18:28:21+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2021-06-27T22:48:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e6a3e4434000de5c36d606e5b5da5f7ba49444bd'/>
<id>urn:sha1:e6a3e4434000de5c36d606e5b5da5f7ba49444bd</id>
<content type='text'>
This patch adds tracers to trace inet socket errors only. A user
space monitoring application can track connection errors independent of
socket lifetime and do additional handling. For example, a cluster
manager can fence a node if errors occur according to a specific
heuristic.
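
A minimal sketch of how such a tracer can hook the inet error path
(illustrative; simplified from the actual change):

  void sk_error_report(struct sock *sk)
  {
  	sk-&gt;sk_error_report(sk);

  	switch (sk-&gt;sk_family) {
  	case AF_INET:
  	case AF_INET6:
  		trace_inet_sk_error_report(sk);
  		break;
  	}
  }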

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Define IPPROTO_MPTCP</title>
<updated>2020-01-10T02:41:41+00:00</updated>
<author>
<name>Mat Martineau</name>
<email>mathew.j.martineau@linux.intel.com</email>
</author>
<published>2020-01-09T15:59:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=faf391c3826cd29feae02078ca2022d2f912f7cc'/>
<id>urn:sha1:faf391c3826cd29feae02078ca2022d2f912f7cc</id>
<content type='text'>
To open an MPTCP socket with socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP),
IPPROTO_MPTCP needs a value that differs from IPPROTO_TCP. The existing
IPPROTO numbers mostly map directly to IANA-specified protocol numbers.
MPTCP does not have a protocol number allocated because MPTCP packets
use the TCP protocol number. Use a private number that is not used on
the wire.
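
From userspace, opening an MPTCP socket then looks like this (the
fallback define is only needed with pre-MPTCP uapi headers; 262 is
the value used by the mainline definition):

  #include &lt;sys/socket.h&gt;
  #include &lt;netinet/in.h&gt;

  #ifndef IPPROTO_MPTCP
  #define IPPROTO_MPTCP 262	/* private value, never sent on the wire */
  #endif

  int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);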

Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Mat Martineau &lt;mathew.j.martineau@linux.intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sock: Make sk_protocol a 16-bit value</title>
<updated>2020-01-10T02:41:41+00:00</updated>
<author>
<name>Mat Martineau</name>
<email>mathew.j.martineau@linux.intel.com</email>
</author>
<published>2020-01-09T15:59:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf9765145b856fa2e238a5b8a54453795ba30ad6'/>
<id>urn:sha1:bf9765145b856fa2e238a5b8a54453795ba30ad6</id>
<content type='text'>
Match the 16-bit width of skbuff-&gt;protocol. The wider field fills an
8-bit hole, so sizeof(struct sock) does not change.

Also take care of BPF field access for sk_type/sk_protocol. Both of them
are now outside the bitfield, so we can use load instructions without
further shifting/masking.
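
A simplified sketch of the layout change (surrounding fields
omitted; not the exact diff):

  struct sock {
  	...
  -	unsigned int	sk_protocol : 8,
  -			sk_type     : 16;
  +	u16		sk_type;	/* kept 2-byte aligned */
  +	u16		sk_protocol;	/* matches skbuff-&gt;protocol width */
  	...
  };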

v5 -&gt; v6:
 - update eBPF accessors, too (Intel's kbuild test robot)
v2 -&gt; v3:
 - keep 'sk_type' 2 bytes aligned (Eric)
v1 -&gt; v2:
 - preserve sk_pacing_shift as bit field (Eric)

Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: bpf@vger.kernel.org
Co-developed-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Co-developed-by: Matthieu Baerts &lt;matthieu.baerts@tessares.net&gt;
Signed-off-by: Matthieu Baerts &lt;matthieu.baerts@tessares.net&gt;
Signed-off-by: Mat Martineau &lt;mathew.j.martineau@linux.intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: annotate sk-&gt;sk_wmem_queued lockless reads</title>
<updated>2019-10-13T17:13:08+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-10-11T03:17:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ab4e846a82d0ae00176de19f2db3c5c64f8eb5f2'/>
<id>urn:sha1:ab4e846a82d0ae00176de19f2db3c5c64f8eb5f2</id>
<content type='text'>
For the sake of tcp_poll(), there are a few places where we fetch
sk-&gt;sk_wmem_queued while this field can change from IRQ context or
another CPU.

We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.

An sk_wmem_queued_add() helper is added so that we can convert to
ADD_ONCE() or an equivalent in the future, if/when available.
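
A minimal sketch of the helper, matching the WRITE_ONCE() pattern
described above:

  static inline void sk_wmem_queued_add(struct sock *sk, int val)
  {
  	WRITE_ONCE(sk-&gt;sk_wmem_queued, sk-&gt;sk_wmem_queued + val);
  }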

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: annotate sk-&gt;sk_rcvbuf lockless reads</title>
<updated>2019-10-13T17:13:08+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-10-11T03:17:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ebb3b78db7bf842270a46fd4fe7cc45c78fa5ed6'/>
<id>urn:sha1:ebb3b78db7bf842270a46fd4fe7cc45c78fa5ed6</id>
<content type='text'>
For the sake of tcp_poll(), there are a few places where we fetch
sk-&gt;sk_rcvbuf while this field can change from IRQ context or
another CPU.

We need to add READ_ONCE() annotations, and also make sure write
sides use corresponding WRITE_ONCE() to avoid store-tearing.

Note that other transports probably need similar fixes.
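
On the read side the pattern looks like this (illustrative call
site, not an exact quote of the diff):

  /* e.g. when checking whether the receive queue is over its limit */
  if (atomic_read(&amp;sk-&gt;sk_rmem_alloc) &gt; READ_ONCE(sk-&gt;sk_rcvbuf))
  	return true;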

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: expose sk wmem in sock_exceed_buf_limit tracepoint</title>
<updated>2018-07-02T13:40:56+00:00</updated>
<author>
<name>Yafang Shao</name>
<email>laoar.shao@gmail.com</email>
</author>
<published>2018-07-01T15:31:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d6f19938eb031ee2158272757db33258153ae59c'/>
<id>urn:sha1:d6f19938eb031ee2158272757db33258153ae59c</id>
<content type='text'>
Currently trace_sock_exceed_buf_limit() only shows rmem info,
but the wmem limit may also be hit.
So expose wmem info in this tracepoint as well.
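
A hypothetical sketch of the shape of the change (field names
illustrative, not the exact diff):

  	TP_STRUCT__entry(
  		...
  +		__field(long, wmem_alloc)	/* allocated write memory */
  +		__field(long, wmem_queued)	/* bytes queued for transmit */
  	),
  	TP_fast_assign(
  		...
  +		__entry-&gt;wmem_alloc  = refcount_read(&amp;sk-&gt;sk_wmem_alloc);
  +		__entry-&gt;wmem_queued = sk-&gt;sk_wmem_queued;
  	),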

Regarding memcg, I think it is better to introduce a new tracepoint (if
one is needed), i.e. trace_memcg_limit_hit, rather than showing memcg
info in trace_sock_exceed_buf_limit.

Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
