summaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)AuthorFilesLines
2022-01-05bpf, sockmap: Fix double bpf_prog_put on error case in map_linkJohn Fastabend1-8/+13
sock_map_link() is called to update a sockmap entry with a sk. But, if the sock_map_init_proto() call fails then we return an error to the map_update op against the sockmap. In the error path though we need to cleanup psock and dec the refcnt on any programs associated with the map, because we refcnt them early in the update process to ensure they are pinned for the psock. (This avoids a race where user deletes programs while also updating the map with new socks.) In current code we do the prog refcnt dec explicitely by calling bpf_prog_put() when the program was found in the map. But, after commit '38207a5e81230' in this error path we've already done the prog to psock assignment so the programs have a reference from the psock as well. This then causes the psock tear down logic, invoked by sk_psock_put() in the error path, to similarly call bpf_prog_put on the programs there. To be explicit this logic does the prog->psock assignment: if (msg_*) psock_set_prog(...) Then the error path under the out_progs label does a similar check and dec with: if (msg_*) bpf_prog_put(...) And the teardown logic sk_psock_put() does ... psock_set_prog(msg_*, NULL) ... triggering another bpf_prog_put(...). Then KASAN gives us this splat, found by syzbot because we've created an inbalance between bpf_prog_inc and bpf_prog_put calling put twice on the program. BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] kernel/bpf/syscall.c:1829 BUG: KASAN: vmalloc-out-of-bounds in bpf_prog_put+0x8c/0x4f0 kernel/bpf/syscall.c:1829 kernel/bpf/syscall.c:1829 Read of size 8 at addr ffffc90000e76038 by task syz-executor020/3641 To fix clean up error path so it doesn't try to do the bpf_prog_put in the error path once progs are assigned then it relies on the normal psock tear down logic to do complete cleanup. For completness we also cover the case whereh sk_psock_init_strp() fails, but this is not expected because it indicates an incorrect socket type and should be caught earlier. Fixes: 38207a5e8123 ("bpf, sockmap: Attach map progs to psock early for feature probes") Reported-by: syzbot+bb73e71cf4b8fd376a4f@syzkaller.appspotmail.com Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220104214645.290900-1-john.fastabend@gmail.com
2021-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller9-40/+51
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31lwtunnel: Validate RTA_ENCAP_TYPE attribute lengthDavid Ahern1-0/+4
lwtunnel_valid_encap_type_attr is used to validate encap attributes within a multipath route. Add length validation checking to the type. lwtunnel_valid_encap_type_attr is called converting attributes to fib{6,}_config struct which means it is used before fib_get_nhs, ip6_route_multipath_add, and ip6_route_multipath_del - other locations that use rtnh_ok and then nla_get_u16 on RTA_ENCAP_TYPE attribute. Fixes: 9ed59592e3e3 ("lwtunnel: fix autoload of lwt modules") Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-30bpf: Allow bpf_local_storage to be used by sleepable programsKP Singh1-1/+7
Other maps like hashmaps are already available to sleepable programs. Sleepable BPF programs run under trace RCU. Allow task, sk and inode storage to be used from sleepable programs. This allows sleepable and non-sleepable programs to provide shareable annotations on kernel objects. Sleepable programs run in trace RCU where as non-sleepable programs run in a normal RCU critical section i.e. __bpf_prog_enter{_sleepable} and __bpf_prog_exit{_sleepable}) (rcu_read_lock or rcu_read_lock_trace). In order to make the local storage maps accessible to both sleepable and non-sleepable programs, one needs to call both call_rcu_tasks_trace and call_rcu to wait for both trace and classical RCU grace periods to expire before freeing memory. Paul's work on call_rcu_tasks_trace allows us to have per CPU queueing for call_rcu_tasks_trace. This behaviour can be achieved by setting rcupdate.rcu_task_enqueue_lim=<num_cpus> boot parameter. In light of these new performance changes and to keep the local storage code simple, avoid adding a new flag for sleepable maps / local storage to select the RCU synchronization (trace / classical). Also, update the dereferencing of the pointers to use rcu_derference_check (with either the trace or normal RCU locks held) with a common bpf_rcu_lock_held helper method. Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20211224152916.1550677-2-kpsingh@kernel.org
2021-12-30Merge tag 'for-net-next-2021-12-29' of ↵Jakub Kicinski1-0/+24
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for Foxconn MT7922A - Add support for Realtek RTL8852AE - Rework HCI event handling to use skb_pull_data * tag 'for-net-next-2021-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (62 commits) Bluetooth: MGMT: Fix spelling mistake "simultanous" -> "simultaneous" Bluetooth: vhci: Set HCI_QUIRK_VALID_LE_STATES Bluetooth: MGMT: Fix LE simultaneous roles UUID if not supported Bluetooth: hci_sync: Add check simultaneous roles support Bluetooth: hci_sync: Wait for proper events when connecting LE Bluetooth: hci_sync: Add support for waiting specific LE subevents Bluetooth: hci_sync: Add hci_le_create_conn_sync Bluetooth: hci_event: Use skb_pull_data when processing inquiry results Bluetooth: hci_sync: Push sync command cancellation to workqueue Bluetooth: hci_qca: Stop IBS timer during BT OFF Bluetooth: btusb: Add support for Foxconn MT7922A Bluetooth: btintel: Add missing quirks and msft ext for legacy bootloader Bluetooth: btusb: Add two more Bluetooth parts for WCN6855 Bluetooth: L2CAP: Fix using wrong mode Bluetooth: hci_sync: Fix not always pausing advertising when necessary Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_CONNECTED Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_FOUND Bluetooth: mgmt: Introduce mgmt_alloc_skb and mgmt_send_event_skb Bluetooth: btusb: Return error code when getting patch status failed Bluetooth: btusb: Handle download_firmware failure cases ... ==================== Link: https://lore.kernel.org/r/20211229211258.2290966-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-29net: Don't include filter.h from net/sock.hJakub Kicinski5-0/+5
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-29of: net: support NVMEM cells with MAC in text formatRafał Miłecki1-11/+22
Some NVMEM devices have text based cells. In such cases MAC is stored in a XX:XX:XX:XX:XX:XX format. Use mac_pton() to parse such data and support those NVMEM cells. This is required to support e.g. a very popular U-Boot and its environment variables. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-5/+6
include/net/sock.h commit 8f905c0e7354 ("inet: fully convert sk->sk_rx_dst to RCU rules") commit 43f51df41729 ("net: move early demux fields close to sk_refcnt") https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22devlink: Add new "event_eq_size" generic device paramShay Drory1-0/+5
Add new device generic parameter to determine the size of the asynchronous control events EQ. For example, to reduce event EQ size to 64, execute: $ devlink dev param set pci/0000:06:00.0 \ name event_eq_size value 64 cmode driverinit $ devlink dev reload pci/0000:06:00.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22devlink: Add new "io_eq_size" generic device paramShay Drory1-0/+5
Add new device generic parameter to determine the size of the I/O completion EQs. For example, to reduce I/O EQ size to 64, execute: $ devlink dev param set pci/0000:06:00.0 \ name io_eq_size value 64 cmode driverinit $ devlink dev reload pci/0000:06:00.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-19flow_offload: add reoffload process to update hw_countBaowen Zheng1-0/+4
Add reoffload process to update hw_count when driver is inserted or removed. We will delete the action if it is with skip_sw flag and not offloaded to any hardware in reoffload process. When reoffloading actions, we still offload the actions that are added independent of filters. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: allow user to offload tc action to net deviceBaowen Zheng1-8/+34
Use flow_indr_dev_register/flow_indr_dev_setup_offload to offload tc action. We need to call tc_cleanup_flow_action to clean up tc action entry since in tc_setup_action, some actions may hold dev refcnt, especially the mirror action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.Hao Luo1-32/+32
Some helper functions may modify its arguments, for example, bpf_d_path, bpf_get_stack etc. Previously, their argument types were marked as ARG_PTR_TO_MEM, which is compatible with read-only mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate, but technically incorrect, to modify a read-only memory by passing it into one of such helper functions. This patch tags the bpf_args compatible with immutable memory with MEM_RDONLY flag. The arguments that don't have this flag will be only compatible with mutable memory types, preventing the helper from modifying a read-only memory. The bpf_args that have MEM_RDONLY are compatible with both mutable memory and immutable memory. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
2021-12-19bpf: Introduce MEM_RDONLY flagHao Luo2-2/+2
This patch introduce a flag MEM_RDONLY to tag a reg value pointing to read-only memory. It makes the following changes: 1. PTR_TO_RDWR_BUF -> PTR_TO_BUF 2. PTR_TO_RDONLY_BUF -> PTR_TO_BUF | MEM_RDONLY Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-6-haoluo@google.com
2021-12-19bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULLHao Luo2-2/+2
We have introduced a new type to make bpf_reg composable, by allocating bits in the type to represent flags. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. This patch switches the qualified reg_types to use this flag. The reg_types changed in this patch include: 1. PTR_TO_MAP_VALUE_OR_NULL 2. PTR_TO_SOCKET_OR_NULL 3. PTR_TO_SOCK_COMMON_OR_NULL 4. PTR_TO_TCP_SOCK_OR_NULL 5. PTR_TO_BTF_ID_OR_NULL 6. PTR_TO_MEM_OR_NULL 7. PTR_TO_RDONLY_BUF_OR_NULL 8. PTR_TO_RDWR_BUF_OR_NULL Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211217003152.48334-5-haoluo@google.com
2021-12-18xdp: move the if dev statements to the firstYajun Deng1-5/+5
The xdp_rxq_info_unreg() called by xdp_rxq_info_reg() is meaningless when dev is NULL, so move the if dev statements to the first. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-18net/sched: flow_dissector: Fix matching on zone id for invalid connsPaul Blakey1-1/+2
If ct rejects a flow, it removes the conntrack info from the skb. act_ct sets the post_ct variable so the dissector will see this case as an +tracked +invalid state, but the zone id is lost with the conntrack info. To restore the zone id on such cases, set the last executed zone, via the tc control block, when passing ct, and read it back in the dissector if there is no ct info on the skb (invalid connection). Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-18net/sched: Extend qdisc control block with tc control blockPaul Blakey1-4/+4
BPF layer extends the qdisc control block via struct bpf_skb_data_end and because of that there is no more room to add variables to the qdisc layer control block without going over the skb->cb size. Extend the qdisc control block with a tc control block, and move all tc related variables to there as a pre-step for extending the tc control block with additional members. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16fib: expand fib_rule_policyFlorian Westphal1-1/+17
Now that there is only one fib nla_policy there is no need to keep the macro around. Place it where its used. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16fib: rules: remove duplicated nla policiesFlorian Westphal1-2/+7
The attributes are identical in all implementations so move the ipv4 one into the core and remove the per-family nla policies. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16net: Fix double 0x prefix print in SKB dumpGal Pressman1-1/+1
When printing netdev features %pNF already takes care of the 0x prefix, remove the explicit one. Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15net: linkwatch: be more careful about dev->linkwatch_dev_trackerEric Dumazet1-1/+12
Apparently a concurrent linkwatch_add_event() could run while we are in __linkwatch_run_queue(). We need to free dev->linkwatch_dev_tracker tracker under lweventlist_lock protection to avoid this race. syzbot report: [ 77.935949][ T3661] reference already released. [ 77.941015][ T3661] allocated in: [ 77.944482][ T3661] linkwatch_fire_event+0x202/0x260 [ 77.950318][ T3661] netif_carrier_on+0x9c/0x100 [ 77.955120][ T3661] __ieee80211_sta_join_ibss+0xc52/0x1590 [ 77.960888][ T3661] ieee80211_sta_create_ibss.cold+0xd2/0x11f [ 77.966908][ T3661] ieee80211_ibss_work.cold+0x30e/0x60f [ 77.972483][ T3661] ieee80211_iface_work+0xb70/0xd00 [ 77.977715][ T3661] process_one_work+0x9ac/0x1680 [ 77.982671][ T3661] worker_thread+0x652/0x11c0 [ 77.987371][ T3661] kthread+0x405/0x4f0 [ 77.991465][ T3661] ret_from_fork+0x1f/0x30 [ 77.995895][ T3661] freed in: [ 77.999006][ T3661] linkwatch_do_dev+0x96/0x160 [ 78.004014][ T3661] __linkwatch_run_queue+0x233/0x6a0 [ 78.009496][ T3661] linkwatch_event+0x4a/0x60 [ 78.014099][ T3661] process_one_work+0x9ac/0x1680 [ 78.019034][ T3661] worker_thread+0x652/0x11c0 [ 78.023719][ T3661] kthread+0x405/0x4f0 [ 78.027810][ T3661] ret_from_fork+0x1f/0x30 [ 78.042541][ T3661] ------------[ cut here ]------------ [ 78.048253][ T3661] WARNING: CPU: 0 PID: 3661 at lib/ref_tracker.c:120 ref_tracker_free.cold+0x110/0x14e [ 78.062364][ T3661] Modules linked in: [ 78.066424][ T3661] CPU: 0 PID: 3661 Comm: kworker/0:5 Not tainted 5.16.0-rc4-next-20211210-syzkaller #0 [ 78.076075][ T3661] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ 78.090648][ T3661] Workqueue: events linkwatch_event [ 78.095890][ T3661] RIP: 0010:ref_tracker_free.cold+0x110/0x14e [ 78.102191][ T3661] Code: ea 03 48 c1 e0 2a 0f b6 04 02 84 c0 74 04 3c 03 7e 4c 8b 7b 18 e8 6b 54 e9 fa e8 26 4d 57 f8 4c 89 ee 48 89 ef e8 fb 33 36 00 <0f> 0b 41 bd ea ff ff ff e9 bd 60 e9 fa 4c 89 f7 e8 16 45 a2 f8 e9 [ 78.127211][ T3661] RSP: 0018:ffffc90002b5fb18 EFLAGS: 00010246 [ 78.133684][ T3661] RAX: 0000000000000000 RBX: ffff88807467f700 RCX: 0000000000000000 [ 78.141928][ T3661] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001 [ 78.150087][ T3661] RBP: ffff888057e105b8 R08: 0000000000000001 R09: ffffffff8ffa1967 [ 78.158211][ T3661] R10: 0000000000000001 R11: 0000000000000000 R12: 1ffff9200056bf65 [ 78.166204][ T3661] R13: 0000000000000292 R14: ffff88807467f718 R15: 00000000c0e0008c [ 78.174321][ T3661] FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 [ 78.183310][ T3661] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 78.190156][ T3661] CR2: 000000c000208800 CR3: 000000007f7b5000 CR4: 00000000003506f0 [ 78.198235][ T3661] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 78.206214][ T3661] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 78.214328][ T3661] Call Trace: [ 78.217679][ T3661] <TASK> [ 78.220621][ T3661] ? __sanitizer_cov_trace_const_cmp4+0x1c/0x70 [ 78.226981][ T3661] ? nlmsg_notify+0xbe/0x280 [ 78.231607][ T3661] ? ref_tracker_dir_exit+0x330/0x330 [ 78.237654][ T3661] ? linkwatch_do_dev+0x96/0x160 [ 78.242628][ T3661] ? __linkwatch_run_queue+0x233/0x6a0 [ 78.248170][ T3661] ? linkwatch_event+0x4a/0x60 [ 78.252946][ T3661] ? process_one_work+0x9ac/0x1680 [ 78.258136][ T3661] ? worker_thread+0x853/0x11c0 [ 78.263020][ T3661] ? kthread+0x405/0x4f0 [ 78.267905][ T3661] ? ret_from_fork+0x1f/0x30 [ 78.272670][ T3661] ? netdev_state_change+0xa1/0x130 [ 78.278019][ T3661] ? netdev_exit+0xd0/0xd0 [ 78.282466][ T3661] ? dev_activate+0x420/0xa60 [ 78.287261][ T3661] linkwatch_do_dev+0x96/0x160 [ 78.292043][ T3661] __linkwatch_run_queue+0x233/0x6a0 [ 78.297505][ T3661] ? linkwatch_do_dev+0x160/0x160 [ 78.302561][ T3661] linkwatch_event+0x4a/0x60 [ 78.307225][ T3661] process_one_work+0x9ac/0x1680 [ 78.312292][ T3661] ? pwq_dec_nr_in_flight+0x2a0/0x2a0 [ 78.317757][ T3661] ? rwlock_bug.part.0+0x90/0x90 [ 78.322726][ T3661] ? _raw_spin_lock_irq+0x41/0x50 [ 78.327844][ T3661] worker_thread+0x853/0x11c0 [ 78.332543][ T3661] ? process_one_work+0x1680/0x1680 [ 78.338500][ T3661] kthread+0x405/0x4f0 [ 78.342610][ T3661] ? set_kthread_struct+0x130/0x130 Fixes: 63f13937cbe9 ("net: linkwatch: add net device refcount tracker") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20211214051955.3569843-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14Revert "pktgen: use min() to make code cleaner"David S. Miller1-2/+3
This reverts commit 13510fef48a3803d9ee8f044b015dacfb06fe0f5. Causes build warnings. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14pktgen: use min() to make code cleanerChangcheng Deng1-3/+2
Use min() in order to make code cleaner. Issue found by coccinelle. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14net: dev: Change the order of the arguments for the contended condition.Sebastian Andrzej Siewior1-1/+1
Change the order of arguments and make qdisc_is_running() appear first. This is more readable for the general case. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEXHangbin Liu1-1/+1
Since commit 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device") the user could get bond active interface's PHC index directly. But when there is a failover, the bond active interface will change, thus the PHC index is also changed. This may break the user's program if they did not update the PHC timely. This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX. When the user wants to get the bond active interface's PHC, they need to add this flag and be aware the PHC index may be changed. With the new flag. All flag checks in current drivers are removed. Only the checking in net_hwtstamp_validate() is kept. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14bpf: Let bpf_warn_invalid_xdp_action() report more infoPaolo Abeni2-4/+4
In non trivial scenarios, the action id alone is not sufficient to identify the program causing the warning. Before the previous patch, the generated stack-trace pointed out at least the involved device driver. Let's additionally include the program name and id, and the relevant device name. If the user needs additional infos, he can fetch them via a kernel probe, leveraging the arguments added here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
2021-12-14bpf: Do not WARN in bpf_warn_invalid_xdp_action()Paolo Abeni1-3/+3
The WARN_ONCE() in bpf_warn_invalid_xdp_action() can be triggered by any bugged program, and even attaching a correct program to a NIC not supporting the given action. The resulting splat, beyond polluting the logs, fouls automated tools: e.g. a syzkaller reproducers using an XDP program returning an unsupported action will never pass validation. Replace the WARN_ONCE with a less intrusive pr_warn_once(). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/016ceec56e4817ebb2a9e35ce794d5c917df572c.1638189075.git.pabeni@redhat.com
2021-12-13net: dev: Always serialize on Qdisc::busylock in __dev_xmit_skb() on PREEMPT_RT.Sebastian Andrzej Siewior1-1/+5
The root-lock is dropped before dev_hard_start_xmit() is invoked and after setting the __QDISC___STATE_RUNNING bit. If the Qdisc owner is preempted by another sender/task with a higher priority then this new sender won't be able to submit packets to the NIC directly instead they will be enqueued into the Qdisc. The NIC will remain idle until the Qdisc owner is scheduled again and finishes the job. By serializing every task on the ->busylock then the task will be preempted by a sender only after the Qdisc has no owner. Always serialize on the busylock on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: Enable neighbor sysctls that is save for userns rootxu xin1-4/+0
Inside netns owned by non-init userns, sysctls about ARP/neighbor is currently not visible and configurable. For the attributes these sysctls correspond to, any modifications make effects on the performance of networking(ARP, especilly) only in the scope of netns, which does not affect other netns. Actually, some tools via netlink can modify these attribute. iproute2 is an example. see as follows: $ unshare -ur -n $ cat /proc/sys/net/ipv4/neigh/lo/retrans_time cat: can't open '/proc/sys/net/ipv4/neigh/lo/retrans_time': No such file or directory $ ip ntable show dev lo inet arp_cache dev lo refcnt 1 reachable 19494 base_reachable 30000 retrans 1000 gc_stale 60000 delay_probe 5000 queue 101 app_probes 0 ucast_probes 3 mcast_probes 3 anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000 inet6 ndisc_cache dev lo refcnt 1 reachable 42394 base_reachable 30000 retrans 1000 gc_stale 60000 delay_probe 5000 queue 101 app_probes 0 ucast_probes 3 mcast_probes 3 anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0 $ ip ntable change name arp_cache dev <if> retrans 2000 inet arp_cache dev lo refcnt 1 reachable 22917 base_reachable 30000 retrans 2000 gc_stale 60000 delay_probe 5000 queue 101 app_probes 0 ucast_probes 3 mcast_probes 3 anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000 inet6 ndisc_cache dev lo refcnt 1 reachable 35524 base_reachable 30000 retrans 1000 gc_stale 60000 delay_probe 5000 queue 101 app_probes 0 ucast_probes 3 mcast_probes 3 anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0 Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Acked-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-11sock: Use sock_owned_by_user_nocheck() instead of sk_lock.owned.Kuniyuki Iwashima1-2/+2
This patch moves sock_release_ownership() down in include/net/sock.h and replaces some sk_lock.owned tests with sock_owned_by_user_nocheck(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/r/20211208062158.54132-1-kuniyu@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-11Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-6/+5
Andrii Nakryiko says: ==================== bpf-next 2021-12-10 v2 We've added 115 non-merge commits during the last 26 day(s) which contain a total of 182 files changed, 5747 insertions(+), 2564 deletions(-). The main changes are: 1) Various samples fixes, from Alexander Lobakin. 2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov. 3) A batch of new unified APIs for libbpf, logging improvements, version querying, etc. Also a batch of old deprecations for old APIs and various bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko. 4) BPF documentation reorganization and improvements, from Christoph Hellwig and Dave Tucker. 5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in libbpf, from Hengqi Chen. 6) Verifier log fixes, from Hou Tao. 7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong. 8) Extend branch record capturing to all platforms that support it, from Kajol Jain. 9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi. 10) bpftool doc-generating script improvements, from Quentin Monnet. 11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet. 12) Deprecation warning fix for perf/bpf_counter, from Song Liu. 13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf, from Tiezhu Yang. 14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song. 15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun, and others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits) libbpf: Add "bool skipped" to struct bpf_map libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition bpftool: Switch bpf_object__load_xattr() to bpf_object__load() selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr() selftests/bpf: Add test for libbpf's custom log_buf behavior selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load() libbpf: Deprecate bpf_object__load_xattr() libbpf: Add per-program log buffer setter and getter libbpf: Preserve kernel error code and remove kprobe prog type guessing libbpf: Improve logging around BPF program loading libbpf: Allow passing user log setting through bpf_object_open_opts libbpf: Allow passing preallocated log_buf when loading BTF into kernel libbpf: Add OPTS-based bpf_btf_load() API libbpf: Fix bpf_prog_load() log_buf logic for log_level 0 samples/bpf: Remove unneeded variable bpf: Remove redundant assignment to pointer t selftests/bpf: Fix a compilation warning perf/bpf_counter: Use bpf_map_create instead of bpf_create_map samples: bpf: Fix 'unknown warning group' build warning on Clang samples: bpf: Fix xdp_sample_user.o linking with Clang ... ==================== Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10net: add netns refcount tracker to struct sockEric Dumazet1-3/+3
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10net: add networking namespace refcount trackerEric Dumazet1-0/+3
We have 100+ syzbot reports about netns being dismantled too soon, still unresolved as of today. We think a missing get_net() or an extra put_net() is the root cause. In order to find the bug(s), and be able to spot future ones, this patch adds CONFIG_NET_NS_REFCNT_TRACKER and new helpers to precisely pair all put_net() with corresponding get_net(). To use these helpers, each data structure owning a refcount should also use a "netns_tracker" to pair the get and put. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-15/+24
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09net-sysfs: warn if new queue objects are being created during device ↵Antoine Tenart1-0/+7
unregistration Calling netdev_queue_update_kobjects is allowed during device unregistration since commit 5c56580b74e5 ("net: Adjust TX queue kobjects if number of queues changes during unregister"). But this is solely to allow queue unregistrations. Any path attempting to add new queues after a device started its unregistration should be fixed. This patch adds a warning to detect such illegal use. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09net-sysfs: update the queue counts in the unregistration pathAntoine Tenart1-0/+3
When updating Rx and Tx queue kobjects, the queue count should always be updated to match the queue kobjects count. This was not done in the net device unregistration path, fix it. Tracking all queue count updates will allow in a following up patch to detect illegal updates. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09net, neigh: clear whole pneigh_entry at alloc timeEric Dumazet1-2/+1
Commit 2c611ad97a82 ("net, neigh: Extend neigh->flags to 32 bit to allow for extensions") enables a new KMSAM warning [1] I think the bug is actually older, because the following intruction only occurred if ndm->ndm_flags had NTF_PROXY set. pn->flags = ndm->ndm_flags; Let's clear all pneigh_entry fields at alloc time. [1] BUG: KMSAN: uninit-value in pneigh_fill_info+0x986/0xb30 net/core/neighbour.c:2593 pneigh_fill_info+0x986/0xb30 net/core/neighbour.c:2593 pneigh_dump_table net/core/neighbour.c:2715 [inline] neigh_dump_info+0x1e3f/0x2c60 net/core/neighbour.c:2832 netlink_dump+0xaca/0x16a0 net/netlink/af_netlink.c:2265 __netlink_dump_start+0xd1c/0xee0 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:254 [inline] rtnetlink_rcv_msg+0x181b/0x18c0 net/core/rtnetlink.c:5534 netlink_rcv_skb+0x447/0x800 net/netlink/af_netlink.c:2491 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5589 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x1095/0x1360 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x16f3/0x1870 net/netlink/af_netlink.c:1916 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] sock_write_iter+0x594/0x690 net/socket.c:1057 call_write_iter include/linux/fs.h:2162 [inline] new_sync_write fs/read_write.c:503 [inline] vfs_write+0x1318/0x2030 fs/read_write.c:590 ksys_write+0x28c/0x520 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0xdb/0x120 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: slab_post_alloc_hook mm/slab.h:524 [inline] slab_alloc_node mm/slub.c:3251 [inline] slab_alloc mm/slub.c:3259 [inline] __kmalloc+0xc3c/0x12d0 mm/slub.c:4437 kmalloc include/linux/slab.h:595 [inline] pneigh_lookup+0x60f/0xd70 net/core/neighbour.c:766 arp_req_set_public net/ipv4/arp.c:1016 [inline] arp_req_set+0x430/0x10a0 net/ipv4/arp.c:1032 arp_ioctl+0x8d4/0xb60 net/ipv4/arp.c:1232 inet_ioctl+0x4ef/0x820 net/ipv4/af_inet.c:947 sock_do_ioctl net/socket.c:1118 [inline] sock_ioctl+0xa3f/0x13e0 net/socket.c:1235 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl+0x2df/0x4a0 fs/ioctl.c:860 __x64_sys_ioctl+0xd8/0x110 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae CPU: 1 PID: 20001 Comm: syz-executor.0 Not tainted 5.16.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 62dd93181aaa ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20211206165329.1049835-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski2-5/+15
Daniel Borkmann says: ==================== bpf 2021-12-08 We've added 12 non-merge commits during the last 22 day(s) which contain a total of 29 files changed, 659 insertions(+), 80 deletions(-). The main changes are: 1) Fix an off-by-two error in packet range markings and also add a batch of new tests for coverage of these corner cases, from Maxim Mikityanskiy. 2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh. 3) Fix two functional regressions and a build warning related to BTF kfunc for modules, from Kumar Kartikeya Dwivedi. 4) Fix outdated code and docs regarding BPF's migrate_disable() use on non- PREEMPT_RT kernels, from Sebastian Andrzej Siewior. 5) Add missing includes in order to be able to detangle cgroup vs bpf header dependencies, from Jakub Kicinski. 6) Fix regression in BPF sockmap tests caused by missing detachment of progs from sockets when they are removed from the map, from John Fastabend. 7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF dispatcher, from Björn Töpel. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add selftests to cover packet access corner cases bpf: Fix the off-by-two error in range markings treewide: Add missing includes masked by cgroup -> bpf dependency tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets bpf: Fix bpf_check_mod_kfunc_call for built-in modules bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL mips, bpf: Fix reference to non-existing Kconfig symbol bpf: Make sure bpf_disable_instrumentation() is safe vs preemption. Documentation/locking/locktypes: Update migrate_disable() bits. bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap bpf, sockmap: Attach map progs to psock early for feature probes bpf, x86: Fix "no previous prototype" warning ==================== Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08pktgen add net device refcount trackerEric Dumazet1-3/+5
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07skbuff: introduce skb_pull_dataLuiz Augusto von Dentz1-0/+24
Like skb_pull but returns the original data pointer before pulling the data after performing a check against sbk->len. This allows to change code that does "struct foo *p = (void *)skb->data;" which is hard to audit and error prone, to: p = skb_pull_data(skb, sizeof(*p)); if (!p) return; Which is both safer and cleaner. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-12-07devlink: fix netns refcount leak in devlink_nl_cmd_reload()Eric Dumazet1-8/+8
While preparing my patch series adding netns refcount tracking, I spotted bugs in devlink_nl_cmd_reload() Some error paths forgot to release a refcount on a netns. To fix this, we can reduce the scope of get_net()/put_net() section around the call to devlink_reload(). Fixes: ccdf07219da6 ("devlink: Add reload action option to devlink reload command") Fixes: dc64cc7c6310 ("devlink: Add devlink reload limit option") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Moshe Shemesh <moshe@mellanox.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20211205192822.1741045-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07netpoll: add net device refcount tracker to struct netpollEric Dumazet1-2/+2
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: failover: add net device refcount trackerEric Dumazet1-2/+2
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: linkwatch: add net device refcount trackerEric Dumazet1-2/+2
Add a netdevice_tracker inside struct net_device, to track the self reference when a device is in lweventlist. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: add net device refcount tracker to struct netdev_adjacentEric Dumazet1-3/+4
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: add net device refcount tracker to struct neigh_parmsEric Dumazet1-3/+3
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: add net device refcount tracker to struct pneigh_entryEric Dumazet1-4/+4
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: add net device refcount tracker to struct neighbourEric Dumazet1-2/+2
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>