<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/core/sock_map.c, branch linux-6.0.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-12-31T12:26:09+00:00</updated>
<entry>
<title>bpf, sockmap: fix race in sock_map_free()</title>
<updated>2022-12-31T12:26:09+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-12-02T11:16:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e8b2b392a646bf5cb9413c1cc7a39d99c1b65a62'/>
<id>urn:sha1:e8b2b392a646bf5cb9413c1cc7a39d99c1b65a62</id>
<content type='text'>
[ Upstream commit 0a182f8d607464911756b4dbef5d6cad8de22469 ]

sock_map_free() calls release_sock(sk) without owning a reference
on the socket. This can cause use-after-free as syzbot found [1]

Jakub Sitnicki already took care of a similar issue
in sock_hash_free() in commit 75e68e5bf2c7 ("bpf, sockhash:
Synchronize delete from bucket list on map free")

[1]
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 3785 at lib/refcount.c:31 refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 3785 Comm: kworker/u4:6 Not tainted 6.1.0-rc7-syzkaller-00103-gef4d3ea40565 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: events_unbound bpf_map_free_deferred
RIP: 0010:refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Code: 68 8b 31 c0 e8 75 71 15 fd 0f 0b e9 64 ff ff ff e8 d9 6e 4e fd c6 05 62 9c 3d 0a 01 48 c7 c7 80 bb 68 8b 31 c0 e8 54 71 15 fd &lt;0f&gt; 0b e9 43 ff ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a2 fe ff
RSP: 0018:ffffc9000456fb60 EFLAGS: 00010246
RAX: eae59bab72dcd700 RBX: 0000000000000004 RCX: ffff8880207057c0
RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
RBP: 0000000000000004 R08: ffffffff816fdabd R09: fffff520008adee5
R10: fffff520008adee5 R11: 1ffff920008adee4 R12: 0000000000000004
R13: dffffc0000000000 R14: ffff88807b1c6c00 R15: 1ffff1100f638dcf
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30c30000 CR3: 000000000d08e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
&lt;TASK&gt;
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
__sock_put include/net/sock.h:779 [inline]
tcp_release_cb+0x2d0/0x360 net/ipv4/tcp_output.c:1092
release_sock+0xaf/0x1c0 net/core/sock.c:3468
sock_map_free+0x219/0x2c0 net/core/sock_map.c:356
process_one_work+0x81c/0xd10 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
&lt;/TASK&gt;

Fixes: 7e81a3530206 ("bpf: Sockmap, ensure sock lock held during tear down")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Cc: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/r/20221202111640.2745533-1-edumazet@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, sock_map: Move cancel_work_sync() out of sock lock</title>
<updated>2022-11-16T09:03:52+00:00</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2022-11-02T04:34:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=543e28206b0142086c428b4e955c87a3ae1e832e'/>
<id>urn:sha1:543e28206b0142086c428b4e955c87a3ae1e832e</id>
<content type='text'>
[ Upstream commit 8bbabb3fddcd0f858be69ed5abc9b470a239d6f2 ]

Stanislav reported a lockdep warning, which is caused by the
cancel_work_sync() called inside sock_map_close(), as analyzed
below by Jakub:

psock-&gt;work.func = sk_psock_backlog()
  ACQUIRE psock-&gt;work_mutex
    sk_psock_handle_skb()
      skb_send_sock()
        __skb_send_sock()
          sendpage_unlocked()
            kernel_sendpage()
              sock-&gt;ops-&gt;sendpage = inet_sendpage()
                sk-&gt;sk_prot-&gt;sendpage = tcp_sendpage()
                  ACQUIRE sk-&gt;sk_lock
                    tcp_sendpage_locked()
                  RELEASE sk-&gt;sk_lock
  RELEASE psock-&gt;work_mutex

sock_map_close()
  ACQUIRE sk-&gt;sk_lock
  sk_psock_stop()
    sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED)
    cancel_work_sync()
      __cancel_work_timer()
        __flush_work()
          // wait for psock-&gt;work to finish
  RELEASE sk-&gt;sk_lock

We can move the cancel_work_sync() out of the sock lock protection,
but still before saved_close() was called.

Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Tested-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Acked-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/bpf/20221102043417.279409-1-xiyou.wangcong@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator</title>
<updated>2022-08-10T17:12:48+00:00</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2022-08-10T08:05:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f0d2b2716d71778d0b0c8eaa433c073287d69d93'/>
<id>urn:sha1:f0d2b2716d71778d0b0c8eaa433c073287d69d93</id>
<content type='text'>
sock_map_iter_attach_target() acquires a map uref, and the uref may be
released before or in the middle of iterating map elements. For example,
the uref could be released in sock_map_iter_detach_target() as part of
bpf_link_release(), or could be released in bpf_map_put_with_uref() as
part of bpf_map_release().

Fixing it by acquiring an extra map uref in .init_seq_private and
releasing it in .fini_seq_private.

Fixes: 0365351524d7 ("net: Allow iterating sockmap and sockhash")
Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/r/20220810080538.1845898-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix sockmap calling sleepable function in teardown path</title>
<updated>2022-06-28T07:30:03+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2022-06-28T03:58:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=697fb80a53642be624f5121b6ca9d66769c180e0'/>
<id>urn:sha1:697fb80a53642be624f5121b6ca9d66769c180e0</id>
<content type='text'>
syzbot reproduced the bug ...

 BUG: sleeping function called from invalid context at kernel/workqueue.c:3010

... with the following stack trace fragment ...

 start_flush_work kernel/workqueue.c:3010 [inline]
 __flush_work+0x109/0xb10 kernel/workqueue.c:3074
 __cancel_work_timer+0x3f9/0x570 kernel/workqueue.c:3162
 sk_psock_stop+0x4cb/0x630 net/core/skmsg.c:802
 sock_map_destroy+0x333/0x760 net/core/sock_map.c:1581
 inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1130
 __tcp_close+0xd5b/0x12b0 net/ipv4/tcp.c:2897
 tcp_close+0x29/0xc0 net/ipv4/tcp.c:2909

... introduced by d8616ee2affc. Do a quick trace of the code path and the
bug is obvious:

   inet_csk_destroy_sock(sk)
     sk_prot-&gt;destroy(sk);      &lt;--- sock_map_destroy
        sk_psock_stop(, true);   &lt;--- true so cancel workqueue
          cancel_work_sync()     &lt;--- splat, because *_bh_disable()

We can not call cancel_work_sync() from inside destroy path. So mark
the sk_psock_stop call to skip this cancel_work_sync(). This will avoid
the BUG, but means we may run sk_psock_backlog after or during the
destroy op. We zapped the ingress_skb queue in sk_psock_stop (safe to
do with local_bh_disable) so its empty and the sk_psock_backlog work
item will not find any pkts to process here. However, because we are
not going to wait for it or clear its -&gt;state its possible it kicks off
or is already running. This should be 'safe' up until psock drops its
refcnt to psock-&gt;sk. The sock_put() that drops this reference is only
done at psock destroy time from sk_psock_destroy(). This is done through
workqueue when sk_psock_drop() is called on psock refnt reaches 0.
And importantly sk_psock_destroy() does a cancel_work_sync(). So trivial
fix works.

I've had hit or miss luck reproducing this caught it once or twice with
the provided reproducer when running with many runners. However, syzkaller
is very good at reproducing so relying on syzkaller to verify fix.

Fixes: d8616ee2affc ("bpf, sockmap: Fix sk-&gt;sk_forward_alloc warn_on in sk_stream_kill_queues")
Reported-by: syzbot+140186ceba0c496183bc@syzkaller.appspotmail.com
Suggested-by: Hillf Danton &lt;hdanton@sina.com&gt;
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Wang Yufen &lt;wangyufen@huawei.com&gt;
Link: https://lore.kernel.org/bpf/20220628035803.317876-1-john.fastabend@gmail.com
</content>
</entry>
<entry>
<title>bpf, sockmap: Fix sk-&gt;sk_forward_alloc warn_on in sk_stream_kill_queues</title>
<updated>2022-06-02T23:26:58+00:00</updated>
<author>
<name>Wang Yufen</name>
<email>wangyufen@huawei.com</email>
</author>
<published>2022-05-24T07:53:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d8616ee2affcff37c5d315310da557a694a3303d'/>
<id>urn:sha1:d8616ee2affcff37c5d315310da557a694a3303d</id>
<content type='text'>
During TCP sockmap redirect pressure test, the following warning is triggered:

WARNING: CPU: 3 PID: 2145 at net/core/stream.c:205 sk_stream_kill_queues+0xbc/0xd0
CPU: 3 PID: 2145 Comm: iperf Kdump: loaded Tainted: G        W         5.10.0+ #9
Call Trace:
 inet_csk_destroy_sock+0x55/0x110
 inet_csk_listen_stop+0xbb/0x380
 tcp_close+0x41b/0x480
 inet_release+0x42/0x80
 __sock_release+0x3d/0xa0
 sock_close+0x11/0x20
 __fput+0x9d/0x240
 task_work_run+0x62/0x90
 exit_to_user_mode_prepare+0x110/0x120
 syscall_exit_to_user_mode+0x27/0x190
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The reason we observed is that:

When the listener is closing, a connection may have completed the three-way
handshake but not accepted, and the client has sent some packets. The child
sks in accept queue release by inet_child_forget()-&gt;inet_csk_destroy_sock(),
but psocks of child sks have not released.

To fix, add sock_map_destroy to release psocks.

Signed-off-by: Wang Yufen &lt;wangyufen@huawei.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20220524075311.649153-1-wangyufen@huawei.com
</content>
</entry>
<entry>
<title>bpf: Compute map_btf_id during build time</title>
<updated>2022-04-26T18:35:21+00:00</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2022-04-25T13:32:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c317ab71facc2cd0a94145973318a4c914e11acc'/>
<id>urn:sha1:c317ab71facc2cd0a94145973318a4c914e11acc</id>
<content type='text'>
For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
types are computed during vmlinux-btf init:

  btf_parse_vmlinux() -&gt; btf_vmlinux_map_ids_init()

It will lookup the btf_type according to the 'map_btf_name' field in
'struct bpf_map_ops'. This process can be done during build time,
thanks to Jiri's resolve_btfids.

selftest of map_ptr has passed:

  $96 map_ptr:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: support BPF_PROG_QUERY for progs attached to sockmap</title>
<updated>2022-01-21T05:30:58+00:00</updated>
<author>
<name>Di Zhu</name>
<email>zhudi2@huawei.com</email>
</author>
<published>2022-01-19T01:40:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=748cd5729ac7421091316e32dcdffb0578563880'/>
<id>urn:sha1:748cd5729ac7421091316e32dcdffb0578563880</id>
<content type='text'>
Right now there is no way to query whether BPF programs are
attached to a sockmap or not.

we can use the standard interface in libbpf to query, such as:
bpf_prog_query(mapFd, BPF_SK_SKB_STREAM_PARSER, 0, NULL, ...);
the mapFd is the fd of sockmap.

Signed-off-by: Di Zhu &lt;zhudi2@huawei.com&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/r/20220119014005.1209-1-zhudi2@huawei.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, sockmap: Fix double bpf_prog_put on error case in map_link</title>
<updated>2022-01-05T19:43:08+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2022-01-04T21:46:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=218d747a4142f281a256687bb513a135c905867b'/>
<id>urn:sha1:218d747a4142f281a256687bb513a135c905867b</id>
<content type='text'>
sock_map_link() is called to update a sockmap entry with a sk. But, if the
sock_map_init_proto() call fails then we return an error to the map_update
op against the sockmap. In the error path though we need to cleanup psock
and dec the refcnt on any programs associated with the map, because we
refcnt them early in the update process to ensure they are pinned for the
psock. (This avoids a race where user deletes programs while also updating
the map with new socks.)

In current code we do the prog refcnt dec explicitely by calling
bpf_prog_put() when the program was found in the map. But, after commit
'38207a5e81230' in this error path we've already done the prog to psock
assignment so the programs have a reference from the psock as well. This
then causes the psock tear down logic, invoked by sk_psock_put() in the
error path, to similarly call bpf_prog_put on the programs there.

To be explicit this logic does the prog-&gt;psock assignment:

  if (msg_*)
    psock_set_prog(...)

Then the error path under the out_progs label does a similar check and
dec with:

  if (msg_*)
     bpf_prog_put(...)

And the teardown logic sk_psock_put() does ...

  psock_set_prog(msg_*, NULL)

... triggering another bpf_prog_put(...). Then KASAN gives us this splat,
found by syzbot because we've created an inbalance between bpf_prog_inc and
bpf_prog_put calling put twice on the program.

  BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline]
  BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] kernel/bpf/syscall.c:1829
  BUG: KASAN: vmalloc-out-of-bounds in bpf_prog_put+0x8c/0x4f0 kernel/bpf/syscall.c:1829 kernel/bpf/syscall.c:1829
  Read of size 8 at addr ffffc90000e76038 by task syz-executor020/3641

To fix clean up error path so it doesn't try to do the bpf_prog_put in the
error path once progs are assigned then it relies on the normal psock
tear down logic to do complete cleanup.

For completness we also cover the case whereh sk_psock_init_strp() fails,
but this is not expected because it indicates an incorrect socket type
and should be caught earlier.

Fixes: 38207a5e8123 ("bpf, sockmap: Attach map progs to psock early for feature probes")
Reported-by: syzbot+bb73e71cf4b8fd376a4f@syzkaller.appspotmail.com
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20220104214645.290900-1-john.fastabend@gmail.com
</content>
</entry>
<entry>
<title>bpf: Introduce MEM_RDONLY flag</title>
<updated>2021-12-18T21:27:41+00:00</updated>
<author>
<name>Hao Luo</name>
<email>haoluo@google.com</email>
</author>
<published>2021-12-17T00:31:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=20b2aff4bc15bda809f994761d5719827d66c0b4'/>
<id>urn:sha1:20b2aff4bc15bda809f994761d5719827d66c0b4</id>
<content type='text'>
This patch introduce a flag MEM_RDONLY to tag a reg value
pointing to read-only memory. It makes the following changes:

1. PTR_TO_RDWR_BUF -&gt; PTR_TO_BUF
2. PTR_TO_RDONLY_BUF -&gt; PTR_TO_BUF | MEM_RDONLY

Signed-off-by: Hao Luo &lt;haoluo@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20211217003152.48334-6-haoluo@google.com
</content>
</entry>
<entry>
<title>bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL</title>
<updated>2021-12-18T21:27:23+00:00</updated>
<author>
<name>Hao Luo</name>
<email>haoluo@google.com</email>
</author>
<published>2021-12-17T00:31:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c25b2ae136039ffa820c26138ed4a5e5f3ab3841'/>
<id>urn:sha1:c25b2ae136039ffa820c26138ed4a5e5f3ab3841</id>
<content type='text'>
We have introduced a new type to make bpf_reg composable, by
allocating bits in the type to represent flags.

One of the flags is PTR_MAYBE_NULL which indicates a pointer
may be NULL. This patch switches the qualified reg_types to
use this flag. The reg_types changed in this patch include:

1. PTR_TO_MAP_VALUE_OR_NULL
2. PTR_TO_SOCKET_OR_NULL
3. PTR_TO_SOCK_COMMON_OR_NULL
4. PTR_TO_TCP_SOCK_OR_NULL
5. PTR_TO_BTF_ID_OR_NULL
6. PTR_TO_MEM_OR_NULL
7. PTR_TO_RDONLY_BUF_OR_NULL
8. PTR_TO_RDWR_BUF_OR_NULL

Signed-off-by: Hao Luo &lt;haoluo@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/r/20211217003152.48334-5-haoluo@google.com
</content>
</entry>
</feed>
