<feed xmlns='http://www.w3.org/2005/Atom'>
<title>BMC/Intel-BMC/linux.git/kernel/bpf, branch dev-5.14-intel</title>
<subtitle>Intel OpenBMC Linux kernel source tree (mirror)</subtitle>
<id>https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-5.14-intel</id>
<link rel='self' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-5.14-intel'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/'/>
<updated>2021-10-07T05:53:13+00:00</updated>
<entry>
<title>bpf: Exempt CAP_BPF from checks against bpf_jit_limit</title>
<updated>2021-10-07T05:53:13+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2021-09-22T11:11:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=b822ce7334d5de4741f388cbf4580fd3ab040458'/>
<id>urn:sha1:b822ce7334d5de4741f388cbf4580fd3ab040458</id>
<content type='text'>
[ Upstream commit 8a98ae12fbefdb583a7696de719a1d57e5e940a2 ]

When introducing CAP_BPF, bpf_jit_charge_modmem() was not changed to treat
programs with CAP_BPF as privileged for the purpose of JIT memory allocation.
This means that a program without CAP_BPF can block a program with CAP_BPF
from loading a program.

Fix this by checking bpf_capable() in bpf_jit_charge_modmem().

Fixes: 2c78ee898d8f ("bpf: Implement CAP_BPF")
Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20210922111153.19843-1-lmb@cloudflare.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Handle return value of BPF_PROG_TYPE_STRUCT_OPS prog</title>
<updated>2021-10-07T05:53:09+00:00</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2021-09-14T02:33:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=67b07e7b490ffce43f73f07dffe2fc1c79821a43'/>
<id>urn:sha1:67b07e7b490ffce43f73f07dffe2fc1c79821a43</id>
<content type='text'>
[ Upstream commit 356ed64991c6847a0c4f2e8fa3b1133f7a14f1fc ]

Currently if a function ptr in struct_ops has a return value, its
caller will get a random return value from it, because the return
value of related BPF_PROG_TYPE_STRUCT_OPS prog is just dropped.

So adding a new flag BPF_TRAMP_F_RET_FENTRY_RET to tell bpf trampoline
to save and return the return value of struct_ops prog if ret_size of
the function ptr is greater than 0. Also restricting the flag to be
used alone.

Fixes: 85d33df357b6 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS")
Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/20210914023351.3664499-1-houtao1@huawei.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Add oversize check before call kvcalloc()</title>
<updated>2021-09-30T08:13:04+00:00</updated>
<author>
<name>Bixuan Cui</name>
<email>cuibixuan@huawei.com</email>
</author>
<published>2021-09-11T00:55:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=b5fe7cdfee5901ce5513c30e554d51536e003bde'/>
<id>urn:sha1:b5fe7cdfee5901ce5513c30e554d51536e003bde</id>
<content type='text'>
[ Upstream commit 0e6491b559704da720f6da09dd0a52c4df44c514 ]

Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") add the
oversize check. When the allocation is larger than what kmalloc() supports,
the following warning triggered:

WARNING: CPU: 0 PID: 8408 at mm/util.c:597 kvmalloc_node+0x108/0x110 mm/util.c:597
Modules linked in:
CPU: 0 PID: 8408 Comm: syz-executor221 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:kvmalloc_node+0x108/0x110 mm/util.c:597
Call Trace:
 kvmalloc include/linux/mm.h:806 [inline]
 kvmalloc_array include/linux/mm.h:824 [inline]
 kvcalloc include/linux/mm.h:829 [inline]
 check_btf_line kernel/bpf/verifier.c:9925 [inline]
 check_btf_info kernel/bpf/verifier.c:10049 [inline]
 bpf_check+0xd634/0x150d0 kernel/bpf/verifier.c:13759
 bpf_prog_load kernel/bpf/syscall.c:2301 [inline]
 __sys_bpf+0x11181/0x126e0 kernel/bpf/syscall.c:4587
 __do_sys_bpf kernel/bpf/syscall.c:4691 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:4689 [inline]
 __x64_sys_bpf+0x78/0x90 kernel/bpf/syscall.c:4689
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: syzbot+f3e749d4c662818ae439@syzkaller.appspotmail.com
Signed-off-by: Bixuan Cui &lt;cuibixuan@huawei.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20210911005557.45518-1-cuibixuan@huawei.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix possible out of bound write in narrow load handling</title>
<updated>2021-09-15T08:02:27+00:00</updated>
<author>
<name>Andrey Ignatov</name>
<email>rdna@fb.com</email>
</author>
<published>2021-08-20T16:39:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=b0d7d8029720ceaea02e272353afc55e301b25bf'/>
<id>urn:sha1:b0d7d8029720ceaea02e272353afc55e301b25bf</id>
<content type='text'>
[ Upstream commit d7af7e497f0308bc97809cc48b58e8e0f13887e1 ]

Fix a verifier bug found by smatch static checker in [0].

This problem has never been seen in prod to my best knowledge. Fixing it
still seems to be a good idea since it's hard to say for sure whether
it's possible or not to have a scenario where a combination of
convert_ctx_access() and a narrow load would lead to an out of bound
write.

When narrow load is handled, one or two new instructions are added to
insn_buf array, but before it was only checked that

	cnt &gt;= ARRAY_SIZE(insn_buf)

And it's safe to add a new instruction to insn_buf[cnt++] only once. The
second try will lead to out of bound write. And this is what can happen
if `shift` is set.

Fix it by making sure that if the BPF_RSH instruction has to be added in
addition to BPF_AND then there is enough space for two more instructions
in insn_buf.

The full report [0] is below:

kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array

kernel/bpf/verifier.c
    12282
    12283 			insn-&gt;off = off &amp; ~(size_default - 1);
    12284 			insn-&gt;code = BPF_LDX | BPF_MEM | size_code;
    12285 		}
    12286
    12287 		target_size = 0;
    12288 		cnt = convert_ctx_access(type, insn, insn_buf, env-&gt;prog,
    12289 					 &amp;target_size);
    12290 		if (cnt == 0 || cnt &gt;= ARRAY_SIZE(insn_buf) ||
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bounds check.

    12291 		    (ctx_field_size &amp;&amp; !target_size)) {
    12292 			verbose(env, "bpf verifier is misconfigured\n");
    12293 			return -EINVAL;
    12294 		}
    12295
    12296 		if (is_narrower_load &amp;&amp; size &lt; target_size) {
    12297 			u8 shift = bpf_ctx_narrow_access_offset(
    12298 				off, size, size_default) * 8;
    12299 			if (ctx_field_size &lt;= 4) {
    12300 				if (shift)
    12301 					insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
                                                         ^^^^^
increment beyond end of array

    12302 									insn-&gt;dst_reg,
    12303 									shift);
--&gt; 12304 				insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn-&gt;dst_reg,
                                                 ^^^^^
out of bounds write

    12305 								(1 &lt;&lt; size * 8) - 1);
    12306 			} else {
    12307 				if (shift)
    12308 					insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
    12309 									insn-&gt;dst_reg,
    12310 									shift);
    12311 				insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn-&gt;dst_reg,
                                        ^^^^^^^^^^^^^^^
Same.

    12312 								(1ULL &lt;&lt; size * 8) - 1);
    12313 			}
    12314 		}
    12315
    12316 		new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
    12317 		if (!new_prog)
    12318 			return -ENOMEM;
    12319
    12320 		delta += cnt - 1;
    12321
    12322 		/* keep walking new program and skip insns we just inserted */
    12323 		env-&gt;prog = new_prog;
    12324 		insn      = new_prog-&gt;insnsi + i + delta;
    12325 	}
    12326
    12327 	return 0;
    12328 }

[0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/

v1-&gt;v2:
- clarify that problem was only seen by static checker but not in prod;

Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset &gt; 0")
Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Andrey Ignatov &lt;rdna@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20210820163935.1902398-1-rdna@fb.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix potential memleak and UAF in the verifier.</title>
<updated>2021-09-15T08:02:11+00:00</updated>
<author>
<name>He Fengqing</name>
<email>hefengqing@huawei.com</email>
</author>
<published>2021-07-14T10:18:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=746f0ad44bc965e4064201eb42fc7b0b3d21c770'/>
<id>urn:sha1:746f0ad44bc965e4064201eb42fc7b0b3d21c770</id>
<content type='text'>
[ Upstream commit 75f0fc7b48ad45a2e5736bcf8de26c8872fe8695 ]

In bpf_patch_insn_data(), we first use the bpf_patch_insn_single() to
insert new instructions, then use adjust_insn_aux_data() to adjust
insn_aux_data. If the old env-&gt;prog have no enough room for new inserted
instructions, we use bpf_prog_realloc to construct new_prog and free the
old env-&gt;prog.

There have two errors here. First, if adjust_insn_aux_data() return
ENOMEM, we should free the new_prog. Second, if adjust_insn_aux_data()
return ENOMEM, bpf_patch_insn_data() will return NULL, and env-&gt;prog has
been freed in bpf_prog_realloc, but we will use it in bpf_check().

So in this patch, we make the adjust_insn_aux_data() never fails. In
bpf_patch_insn_data(), we first pre-malloc memory for the new
insn_aux_data, then call bpf_patch_insn_single() to insert new
instructions, at last call adjust_insn_aux_data() to adjust
insn_aux_data.

Fixes: 8041902dae52 ("bpf: adjust insn_aux_data when patching insns")
Signed-off-by: He Fengqing &lt;hefengqing@huawei.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20210714101815.164322-1-hefengqing@huawei.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix ringbuf helper function compatibility</title>
<updated>2021-08-23T21:09:10+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2021-08-23T19:02:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=5b029a32cfe4600f5e10e36b41778506b90fd4de'/>
<id>urn:sha1:5b029a32cfe4600f5e10e36b41778506b90fd4de</id>
<content type='text'>
Commit 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support
for it") extended check_map_func_compatibility() by enforcing map -&gt; helper
function match, but not helper -&gt; map type match.

Due to this all of the bpf_ringbuf_*() helper functions could be used with
a wrong map type such as array or hash map, leading to invalid access due
to type confusion.

Also, both BPF_FUNC_ringbuf_{submit,discard} have ARG_PTR_TO_ALLOC_MEM as
argument and not a BPF map. Therefore, their check_map_func_compatibility()
presence is incorrect since it's only for map type checking.

Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Ryota Shiga (Flatt Security)
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Clear zext_dst of dead insns</title>
<updated>2021-08-13T15:43:43+00:00</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2021-08-12T15:18:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=45c709f8c71b525b51988e782febe84ce933e7e0'/>
<id>urn:sha1:45c709f8c71b525b51988e782febe84ce933e7e0</id>
<content type='text'>
"access skb fields ok" verifier test fails on s390 with the "verifier
bug. zext_dst is set, but no reg is defined" message. The first insns
of the test prog are ...

   0:	61 01 00 00 00 00 00 00 	ldxw %r0,[%r1+0]
   8:	35 00 00 01 00 00 00 00 	jge %r0,0,1
  10:	61 01 00 08 00 00 00 00 	ldxw %r0,[%r1+8]

... and the 3rd one is dead (this does not look intentional to me, but
this is a separate topic).

sanitize_dead_code() converts dead insns into "ja -1", but keeps
zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such
an insn, it sees this discrepancy and bails. This problem can be seen
only with JITs whose bpf_jit_needs_zext() returns true.

Fix by clearning dead insns' zext_dst.

The commits that contributed to this problem are:

1. 5aa5bd14c5f8 ("bpf: add initial suite for selftests"), which
   introduced the test with the dead code.
2. 5327ed3d44b7 ("bpf: verifier: mark verified-insn with
   sub-register zext flag"), which introduced the zext_dst flag.
3. 83a2881903f3 ("bpf: Account for BPF_FETCH in
   insn_has_def32()"), which introduced the sanity check.
4. 9183671af6db ("bpf: Fix leakage under speculation on
   mispredicted branches"), which bisect points to.

It's best to fix this on stable branches that contain the second one,
since that's the point where the inconsistency was introduced.

Fixes: 5327ed3d44b7 ("bpf: verifier: mark verified-insn with sub-register zext flag")
Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com
</content>
</entry>
<entry>
<title>bpf: Add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers</title>
<updated>2021-08-11T18:45:43+00:00</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2021-08-10T23:05:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=2d3a1e3615c5449a4583010f41a6f824a4ffa03e'/>
<id>urn:sha1:2d3a1e3615c5449a4583010f41a6f824a4ffa03e</id>
<content type='text'>
Currently, if bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called with sleepable programs e.g., sleepable
fentry/fmod_ret/fexit/lsm programs, a rcu warning
may appear. For example, if I added the following
hack to test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&amp;buf, sizeof(buf), ptr);
          if (len == -2 &amp;&amp; ret == 0 &amp;&amp; buf == 1234)
                  copy_test++;

I will hit the following rcu warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
    stack backtrace:
    CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Call Trace:
      dump_stack_lvl+0x56/0x7b
      bpf_get_current_cgroup_id+0x9c/0xb1
      bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
      bpf_trampoline_6442469132_0+0x2d/0x1000
      __x64_sys_setdomainname+0x5/0x110
      do_syscall_64+0x3a/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for syscall program. Helper
bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
has the following callchain:
   task_dfl_cgroup
     task_css_set
       task_css_set_check
and we have
   #define task_css_set_check(task, __c)                                   \
           rcu_dereference_check((task)-&gt;cgroups,                          \
                   lockdep_is_held(&amp;cgroup_mutex) ||                       \
                   lockdep_is_held(&amp;css_set_lock) ||                       \
                   ((task)-&gt;flags &amp; PF_EXITING) || (__c))
Since cgroup_mutex/css_set_lock is not held and the task
is not existing and rcu read_lock is not held, a warning
will be issued. Note that bpf sleepable program is protected by
rcu_read_lock_trace().

The above sleepable bpf programs are already protected
by migrate_disable(). Adding rcu_read_lock() in these
two helpers will silence the above warning.
I marked the patch fixing 95b861a7935b
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting 5.14 is probably good enough as sleepable
progrems are not widely used.

This patch should fix [1] as well since syscall program is a sleepable
program protected with migrate_disable().

 [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/

Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20210810230537.2864668-1-yhs@fb.com
</content>
</entry>
<entry>
<title>bpf, core: Fix kernel-doc notation</title>
<updated>2021-08-10T11:09:28+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2021-08-09T21:52:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=019d0454c61707879cf9853c894e0a191f6b9774'/>
<id>urn:sha1:019d0454c61707879cf9853c894e0a191f6b9774</id>
<content type='text'>
Fix kernel-doc warnings in kernel/bpf/core.c (found by scripts/kernel-doc
and W=1 builds). That is, correct a function name in a comment and add
return descriptions for 2 functions.

Fixes these kernel-doc warnings:

  kernel/bpf/core.c:1372: warning: expecting prototype for __bpf_prog_run(). Prototype was for ___bpf_prog_run() instead
  kernel/bpf/core.c:1372: warning: No description found for return value of '___bpf_prog_run'
  kernel/bpf/core.c:1883: warning: No description found for return value of 'bpf_prog_select_runtime'

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20210809215229.7556-1-rdunlap@infradead.org
</content>
</entry>
<entry>
<title>bpf: Fix potentially incorrect results with bpf_get_local_storage()</title>
<updated>2021-08-10T08:27:16+00:00</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2021-08-10T01:04:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=a2baf4e8bb0f306fbed7b5e6197c02896a638ab5'/>
<id>urn:sha1:a2baf4e8bb0f306fbed7b5e6197c02896a638ab5</id>
<content type='text'>
Commit b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
helper") fixed a bug for bpf_get_local_storage() helper so different tasks
won't mess up with each other's percpu local storage.

The percpu data contains 8 slots so it can hold up to 8 contexts (same or
different tasks), for 8 different program runs, at the same time. This in
general is sufficient. But our internal testing showed the following warning
multiple times:

  [...]
  warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193
     __cgroup_bpf_run_filter_sock_ops+0x13e/0x180
  RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
  &lt;IRQ&gt;
   tcp_call_bpf.constprop.99+0x93/0xc0
   tcp_conn_request+0x41e/0xa50
   ? tcp_rcv_state_process+0x203/0xe00
   tcp_rcv_state_process+0x203/0xe00
   ? sk_filter_trim_cap+0xbc/0x210
   ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160
   tcp_v6_do_rcv+0x181/0x3e0
   tcp_v6_rcv+0xc65/0xcb0
   ip6_protocol_deliver_rcu+0xbd/0x450
   ip6_input_finish+0x11/0x20
   ip6_input+0xb5/0xc0
   ip6_sublist_rcv_finish+0x37/0x50
   ip6_sublist_rcv+0x1dc/0x270
   ipv6_list_rcv+0x113/0x140
   __netif_receive_skb_list_core+0x1a0/0x210
   netif_receive_skb_list_internal+0x186/0x2a0
   gro_normal_list.part.170+0x19/0x40
   napi_complete_done+0x65/0x150
   mlx5e_napi_poll+0x1ae/0x680
   __napi_poll+0x25/0x120
   net_rx_action+0x11e/0x280
   __do_softirq+0xbb/0x271
   irq_exit_rcu+0x97/0xa0
   common_interrupt+0x7f/0xa0
   &lt;/IRQ&gt;
   asm_common_interrupt+0x1e/0x40
  RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac
   ? __cgroup_bpf_run_filter_skb+0x378/0x4e0
   ? do_softirq+0x34/0x70
   ? ip6_finish_output2+0x266/0x590
   ? ip6_finish_output+0x66/0xa0
   ? ip6_output+0x6c/0x130
   ? ip6_xmit+0x279/0x550
   ? ip6_dst_check+0x61/0xd0
  [...]

Using drgn [0] to dump the percpu buffer contents showed that on this CPU
slot 0 is still available, but slots 1-7 are occupied and those tasks in
slots 1-7 mostly don't exist any more. So we might have issues in
bpf_cgroup_storage_unset().

Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset().
Currently, it tries to unset "current" slot with searching from the start.
So the following sequence is possible:

  1. A task is running and claims slot 0
  2. Running BPF program is done, and it checked slot 0 has the "task"
     and ready to reset it to NULL (not yet).
  3. An interrupt happens, another BPF program runs and it claims slot 1
     with the *same* task.
  4. The unset() in interrupt context releases slot 0 since it matches "task".
  5. Interrupt is done, the task in process context reset slot 0.

At the end, slot 1 is not reset and the same process can continue to occupy
slots 2-7 and finally, when the above step 1-5 is repeated again, step 3 BPF
program won't be able to claim an empty slot and a warning will be issued.

To fix the issue, for unset() function, we should traverse from the last slot
to the first. This way, the above issue can be avoided.

The same reverse traversal should also be done in bpf_get_local_storage() helper
itself. Otherwise, incorrect local storage may be returned to BPF program.

  [0] https://github.com/osandov/drgn

Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
</content>
</entry>
</feed>
