<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/btf.h, branch linux-6.5.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.5.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.5.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-05-20T05:44:14+00:00</updated>
<entry>
<title>bpf: Add kfunc filter function to 'struct btf_kfunc_id_set'</title>
<updated>2023-05-20T05:44:14+00:00</updated>
<author>
<name>Aditi Ghag</name>
<email>aditi.ghag@isovalent.com</email>
</author>
<published>2023-05-19T22:51:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e924e80ee6a39bc28d2ef8f51e19d336a98e3be0'/>
<id>urn:sha1:e924e80ee6a39bc28d2ef8f51e19d336a98e3be0</id>
<content type='text'>
This commit adds the ability to filter kfuncs to certain BPF program
types. This is required to limit bpf_sock_destroy kfunc implemented in
follow-up commits to programs with attach type 'BPF_TRACE_ITER'.

The commit adds a callback filter to 'struct btf_kfunc_id_set'.  The
filter has access to the `bpf_prog` construct including its properties
such as `expected_attached_type`.

Signed-off-by: Aditi Ghag &lt;aditi.ghag@isovalent.com&gt;
Link: https://lore.kernel.org/r/20230519225157.760788-7-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Remove KF_KPTR_GET kfunc flag</title>
<updated>2023-04-16T15:51:24+00:00</updated>
<author>
<name>David Vernet</name>
<email>void@manifault.com</email>
</author>
<published>2023-04-16T08:49:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7b4ddf3920d247c2949073b9c274301c8131332a'/>
<id>urn:sha1:7b4ddf3920d247c2949073b9c274301c8131332a</id>
<content type='text'>
We've managed to improve the UX for kptrs significantly over the last 9
months. All of the existing use cases which previously had KF_KPTR_GET
kfuncs (struct bpf_cpumask *, struct task_struct *, and struct cgroup *)
have all been updated to be synchronized using RCU. In other words,
their KF_KPTR_GET kfuncs have been removed in favor of KF_RCU |
KF_ACQUIRE kfuncs, with the pointers themselves also being readable from
maps in an RCU read region thanks to the types being RCU safe.

While KF_KPTR_GET was a logical starting point for kptrs, it's become
clear that they're not the correct abstraction. KF_KPTR_GET is a flag
that essentially does nothing other than enforcing that the argument to
a function is a pointer to a referenced kptr map value. At first glance,
that's a useful thing to guarantee to a kfunc. It gives kfuncs the
ability to try and acquire a reference on that kptr without requiring
the BPF prog to do something like this:

struct kptr_type *in_map, *new = NULL;

in_map = bpf_kptr_xchg(&amp;map-&gt;value, NULL);
if (in_map) {
        new = bpf_kptr_type_acquire(in_map);
        in_map = bpf_kptr_xchg(&amp;map-&gt;value, in_map);
        if (in_map)
                bpf_kptr_type_release(in_map);
}

That's clearly a pretty ugly (and racy) UX, and if using KF_KPTR_GET is
the only alternative, it's better than nothing. However, the problem
with any KF_KPTR_GET kfunc lies in the fact that it always requires some
kind of synchronization in order to safely do an opportunistic acquire
of the kptr in the map. This is because a BPF program running on another
CPU could do a bpf_kptr_xchg() on that map value, and free the kptr
after it's been read by the KF_KPTR_GET kfunc. For example, the
now-removed bpf_task_kptr_get() kfunc did the following:

struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
{
            struct task_struct *p;

        rcu_read_lock();
        p = READ_ONCE(*pp);
        /* If p is non-NULL, it could still be freed by another CPU,
         * so we have to do an opportunistic refcount_inc_not_zero()
         * and return NULL if the task will be freed after the
         * current RCU read region.
         */
        |f (p &amp;&amp; !refcount_inc_not_zero(&amp;p-&gt;rcu_users))
                p = NULL;
        rcu_read_unlock();

        return p;
}

In other words, the kfunc uses RCU to ensure that the task remains valid
after it's been peeked from the map. However, this is completely
redundant with just defining a KF_RCU kfunc that itself does a
refcount_inc_not_zero(), which is exactly what bpf_task_acquire() now
does.

So, the question of whether KF_KPTR_GET is useful is actually, "Are
there any synchronization mechanisms / safety flags that are required by
certain kptrs, but which are not provided by the verifier to kfuncs?"
The answer to that question today is "No", because every kptr we
currently care about is RCU protected.

Even if the answer ever became "yes", the proper way to support that
referenced kptr type would be to add support for whatever
synchronization mechanism it requires in the verifier, rather than
giving kfuncs a flag that says, "Here's a pointer to a referenced kptr
in a map, do whatever you need to do."

With all that said -- so as to allow us to consolidate the kfunc API,
and simplify the verifier a bit, this patch removes KF_KPTR_GET, and all
relevant logic from the verifier.

Signed-off-by: David Vernet &lt;void@manifault.com&gt;
Link: https://lore.kernel.org/r/20230416084928.326135-3-void@manifault.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Remove btf_field_offs, use btf_record's fields instead</title>
<updated>2023-04-16T00:36:49+00:00</updated>
<author>
<name>Dave Marchevsky</name>
<email>davemarchevsky@fb.com</email>
</author>
<published>2023-04-15T20:18:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cd2a8079014aced27da9b2e669784f31680f1351'/>
<id>urn:sha1:cd2a8079014aced27da9b2e669784f31680f1351</id>
<content type='text'>
The btf_field_offs struct contains (offset, size) for btf_record fields,
sorted by offset. btf_field_offs is always used in conjunction with
btf_record, which has btf_field 'fields' array with (offset, type), the
latter of which btf_field_offs' size is derived from via
btf_field_type_size.

This patch adds a size field to struct btf_field and sorts btf_record's
fields by offset, making it possible to get rid of btf_field_offs. Less
data duplication and less code complexity results.

Since btf_field_offs' lifetime closely followed the btf_record used to
populate it, most complexity wins are from removal of initialization
code like:

  if (btf_record_successfully_initialized) {
    foffs = btf_parse_field_offs(rec);
    if (IS_ERR_OR_NULL(foffs))
      // free the btf_record and return err
  }

Other changes in this patch are pretty mechanical:

  * foffs-&gt;field_off[i] -&gt; rec-&gt;fields[i].offset
  * foffs-&gt;field_sz[i] -&gt; rec-&gt;fields[i].size
  * Sort rec-&gt;fields in btf_parse_fields before returning
    * It's possible that this is necessary independently of other
      changes in this patch. btf_record_find in syscall.c expects
      btf_record's fields to be sorted by offset, yet there's no
      explicit sorting of them before this patch, record's fields are
      populated in the order they're read from BTF struct definition.
      BTF docs don't say anything about the sortedness of struct fields.
  * All functions taking struct btf_field_offs * input now instead take
    struct btf_record *. All callsites of these functions already have
    access to the correct btf_record.

Signed-off-by: Dave Marchevsky &lt;davemarchevsky@fb.com&gt;
Link: https://lore.kernel.org/r/20230415201811.343116-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Add log_true_size output field to return necessary log buffer size</title>
<updated>2023-04-11T16:05:43+00:00</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2023-04-06T23:41:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=47a71c1f9af0a334c9dfa97633c41de4feda4287'/>
<id>urn:sha1:47a71c1f9af0a334c9dfa97633c41de4feda4287</id>
<content type='text'>
Add output-only log_true_size and btf_log_true_size field to
BPF_PROG_LOAD and BPF_BTF_LOAD commands, respectively. It will return
the size of log buffer necessary to fit in all the log contents at
specified log_level. This is very useful for BPF loader libraries like
libbpf to be able to size log buffer correctly, but could be used by
users directly, if necessary, as well.

This patch plumbs all this through the code, taking into account actual
bpf_attr size provided by user to determine if these new fields are
expected by users. And if they are, set them from kernel on return.

We refactory btf_parse() function to accommodate this, moving attr and
uattr handling inside it. The rest is very straightforward code, which
is split from the logging accounting changes in the previous patch to
make it simpler to review logic vs UAPI changes.

Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Lorenz Bauer &lt;lmb@isovalent.com&gt;
Link: https://lore.kernel.org/bpf/20230406234205.323208-13-andrii@kernel.org
</content>
</entry>
<entry>
<title>bpf: Support __kptr to local kptrs</title>
<updated>2023-03-11T00:38:05+00:00</updated>
<author>
<name>Dave Marchevsky</name>
<email>davemarchevsky@fb.com</email>
</author>
<published>2023-03-10T23:07:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c8e18754091479fac3f5b6c053c6bc4be0b7fb11'/>
<id>urn:sha1:c8e18754091479fac3f5b6c053c6bc4be0b7fb11</id>
<content type='text'>
If a PTR_TO_BTF_ID type comes from program BTF - not vmlinux or module
BTF - it must have been allocated by bpf_obj_new and therefore must be
free'd with bpf_obj_drop. Such a PTR_TO_BTF_ID is considered a "local
kptr" and is tagged with MEM_ALLOC type tag by bpf_obj_new.

This patch adds support for treating __kptr-tagged pointers to "local
kptrs" as having an implicit bpf_obj_drop destructor for referenced kptr
acquire / release semantics. Consider the following example:

  struct node_data {
          long key;
          long data;
          struct bpf_rb_node node;
  };

  struct map_value {
          struct node_data __kptr *node;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __type(key, int);
          __type(value, struct map_value);
          __uint(max_entries, 1);
  } some_nodes SEC(".maps");

If struct node_data had a matching definition in kernel BTF, the verifier would
expect a destructor for the type to be registered. Since struct node_data does
not match any type in kernel BTF, the verifier knows that there is no kfunc
that provides a PTR_TO_BTF_ID to this type, and that such a PTR_TO_BTF_ID can
only come from bpf_obj_new. So instead of searching for a registered dtor,
a bpf_obj_drop dtor can be assumed.

This allows the runtime to properly destruct such kptrs in
bpf_obj_free_fields, which enables maps to clean up map_vals w/ such
kptrs when going away.

Implementation notes:
  * "kernel_btf" variable is renamed to "kptr_btf" in btf_parse_kptr.
    Before this patch, the variable would only ever point to vmlinux or
    module BTFs, but now it can point to some program BTF for local kptr
    type. It's later used to populate the (btf, btf_id) pair in kptr btf
    field.
  * It's necessary to btf_get the program BTF when populating btf_field
    for local kptr. btf_record_free later does a btf_put.
  * Behavior for non-local referenced kptrs is not modified, as
    bpf_find_btf_id helper only searches vmlinux and module BTFs for
    matching BTF type. If such a type is found, btf_field_kptr's btf will
    pass btf_is_kernel check, and the associated release function is
    some one-argument dtor. If btf_is_kernel check fails, associated
    release function is two-arg bpf_obj_drop_impl. Before this patch
    only btf_field_kptr's w/ kernel or module BTFs were created.

Signed-off-by: Dave Marchevsky &lt;davemarchevsky@fb.com&gt;
Link: https://lore.kernel.org/r/20230310230743.2320707-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: add iterator kfuncs registration and validation logic</title>
<updated>2023-03-09T00:19:50+00:00</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2023-03-08T18:41:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=215bf4962f6c9605710012fad222a5fec001b3ad'/>
<id>urn:sha1:215bf4962f6c9605710012fad222a5fec001b3ad</id>
<content type='text'>
Add ability to register kfuncs that implement BPF open-coded iterator
contract and enforce naming and function proto convention. Enforcement
happens at the time of kfunc registration and significantly simplifies
the rest of iterators logic in the verifier.

More details follow in subsequent patches, but we enforce the following
conditions.

All kfuncs (constructor, next, destructor) have to be named consistenly
as bpf_iter_&lt;type&gt;_{new,next,destroy}(), respectively. &lt;type&gt; represents
iterator type, and iterator state should be represented as a matching
`struct bpf_iter_&lt;type&gt;` state type. Also, all iter kfuncs should have
a pointer to this `struct bpf_iter_&lt;type&gt;` as the very first argument.

Additionally:
  - Constructor, i.e., bpf_iter_&lt;type&gt;_new(), can have arbitrary extra
  number of arguments. Return type is not enforced either.
  - Next method, i.e., bpf_iter_&lt;type&gt;_next(), has to return a pointer
  type and should have exactly one argument: `struct bpf_iter_&lt;type&gt; *`
  (const/volatile/restrict and typedefs are ignored).
  - Destructor, i.e., bpf_iter_&lt;type&gt;_destroy(), should return void and
  should have exactly one argument, similar to the next method.
  - struct bpf_iter_&lt;type&gt; size is enforced to be positive and
  a multiple of 8 bytes (to fit stack slots correctly).

Such strictness and consistency allows to build generic helpers
abstracting important, but boilerplate, details to be able to use
open-coded iterators effectively and ergonomically (see bpf_for_each()
in subsequent patches). It also simplifies the verifier logic in some
places. At the same time, this doesn't hurt generality of possible
iterator implementations. Win-win.

Constructor kfunc is marked with a new KF_ITER_NEW flags, next method is
marked with KF_ITER_NEXT (and should also have KF_RET_NULL, of course),
while destructor kfunc is marked as KF_ITER_DESTROY.

Additionally, we add a trivial kfunc name validation: it should be
a valid non-NULL and non-empty string.

Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/r/20230308184121.1165081-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Introduce kptr_rcu.</title>
<updated>2023-03-03T16:42:20+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2023-03-03T04:14:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=20c09d92faeefb8536f705d3a4629e0dc314c8a1'/>
<id>urn:sha1:20c09d92faeefb8536f705d3a4629e0dc314c8a1</id>
<content type='text'>
The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
Derefrence of other kptr-s returns PTR_UNTRUSTED.

For example:
struct map_value {
   struct cgroup __kptr *cgrp;
};

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
{
  struct cgroup *cg, *cg2;

  cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id &gt; 0
  bpf_kptr_xchg(&amp;v-&gt;cgrp, cg);

  cg2 = v-&gt;cgrp; // This is new feature introduced by this patch.
  // cg2 is PTR_MAYBE_NULL | MEM_RCU.
  // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero

  if (cg2)
    bpf_cgroup_ancestor(cg2, level); // safe to do.
}

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: David Vernet &lt;void@manifault.com&gt;
Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Add __bpf_kfunc tag for marking kernel functions as kfuncs</title>
<updated>2023-02-01T23:25:13+00:00</updated>
<author>
<name>David Vernet</name>
<email>void@manifault.com</email>
</author>
<published>2023-02-01T17:30:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=57e7c169cd6afa093d858b8edfb9bceaf2e1c93b'/>
<id>urn:sha1:57e7c169cd6afa093d858b8edfb9bceaf2e1c93b</id>
<content type='text'>
kfuncs are functions defined in the kernel, which may be invoked by BPF
programs. They may or may not also be used as regular kernel functions,
implying that they may be static (in which case the compiler could e.g.
inline it away, or elide one or more arguments), or it could have
external linkage, but potentially be elided in an LTO build if a
function is observed to never be used, and is stripped from the final
kernel binary.

This has already resulted in some issues, such as those discussed in [0]
wherein changes in DWARF that identify when a parameter has been
optimized out can break BTF encodings (and in general break the kfunc).

[0]: https://lore.kernel.org/all/1675088985-20300-2-git-send-email-alan.maguire@oracle.com/

We therefore require some convenience macro that kfunc developers can
use just add to their kfuncs, and which will prevent all of the above
issues from happening. This is in contrast with what we have today,
where some kfunc definitions have "noinline", some have "__used", and
others are static and have neither.

Note that longer term, this mechanism may be replaced by a macro that
more closely resembles EXPORT_SYMBOL_GPL(), as described in [1]. For
now, we're going with this shorter-term approach to fix existing issues
in kfuncs.

[1]: https://lore.kernel.org/lkml/Y9AFT4pTydKh+PD3@maniforge.lan/

Note as well that checkpatch complains about this patch with the
following:

ERROR: Macros with complex values should be enclosed in parentheses
+#define __bpf_kfunc __used noinline

There seems to be a precedent for using this pattern in other places
such as compiler_types.h (see e.g. __randomize_layout and noinstr), so
it seems appropriate.

Signed-off-by: David Vernet &lt;void@manifault.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/bpf/20230201173016.342758-2-void@manifault.com
</content>
</entry>
<entry>
<title>bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag</title>
<updated>2023-01-28T20:45:15+00:00</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2023-01-28T00:06:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=49f67f393ff264e8d83f6fcec0728a6aa8eed102'/>
<id>urn:sha1:49f67f393ff264e8d83f6fcec0728a6aa8eed102</id>
<content type='text'>
s390x eBPF JIT needs to know whether a function return value is signed
and which function arguments are signed, in order to generate code
compliant with the s390x ABI.

Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20230128000650.1516334-26-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge "do not rely on ALLOW_ERROR_INJECTION for fmod_ret" into bpf-next</title>
<updated>2022-12-07T21:49:21+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2022-12-07T21:49:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0a6ea1ce8260f08079b0940350a21e4ad95c2378'/>
<id>urn:sha1:0a6ea1ce8260f08079b0940350a21e4ad95c2378</id>
<content type='text'>
Merge commit 5b481acab4ce ("bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret")
from hid tree into bpf-next.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
