summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-07-22Bluetooth: Convert delayed discov_off to hci_syncBrian Gix2-89/+33
The timed ending of Discoverability was handled in hci_requst.c, with calls using the deprecated hci_req_add() mechanism. Converted to live inside mgmt.c using the same delayed work queue, but with hci_sync version of hci_update_discoverable(). Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: Remove update_scan hci_request dependancyBrian Gix5-28/+16
This removes the remaining calls to HCI_OP_WRITE_SCAN_ENABLE from hci_request call chains, and converts them to hci_sync calls. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: Remove dead code from hci_request.cBrian Gix2-289/+0
The discov_update work queue is no longer used as a result of the hci_sync rework. The __hci_req_hci_power_on() function is no longer referenced in the code as a result of the hci_sync rework. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: MGMT: Fix holding hci_conn reference while command is queuedLuiz Augusto von Dentz1-39/+12
This removes the use of hci_conn_hold from Get Conn Info and Get Clock Info since the callback can just do a lookup by address using the cmd data and only then set cmd->user_data to pass to the complete callback. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: mgmt: Fix using hci_conn_abortLuiz Augusto von Dentz2-5/+36
This fixes using hci_conn_abort instead of using hci_conn_abort_sync. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: Use bt_status to convert from errnoLuiz Augusto von Dentz1-1/+1
If a command cannot be sent or there is a internal error an errno maybe set instead of a command status. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: Add bt_statusLuiz Augusto von Dentz1-0/+71
This adds bt_status which can be used to convert Unix errno to Bluetooth status. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: hci_sync: Split hci_dev_open_syncLuiz Augusto von Dentz1-99/+126
This splits hci_dev_open_sync so each stage is handle by its own function so it is easier to identify each stage. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: hci_sync: Refactor remove Adv MonitorManish Mandlik4-174/+73
Make use of hci_cmd_sync_queue for removing an advertisement monitor. Signed-off-by: Manish Mandlik <mmandlik@google.com> Reviewed-by: Miao-chen Chou <mcchou@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: hci_sync: Refactor add Adv MonitorManish Mandlik3-192/+77
Make use of hci_cmd_sync_queue for adding an advertisement monitor. Signed-off-by: Manish Mandlik <mmandlik@google.com> Reviewed-by: Miao-chen Chou <mcchou@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTINGZijun Hu1-3/+0
Core driver addtionally checks LMP feature bit "Erroneous Data Reporting" instead of quirk HCI_QUIRK_BROKEN_ERR_DATA_REPORTING to decide if HCI commands HCI_Read|Write_Default_Erroneous_Data_Reporting are broken, so remove this unnecessary quirk. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Tested-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: hci_sync: Check LMP feature bit instead of quirkZijun Hu1-2/+2
BT core driver should addtionally check LMP feature bit "Erroneous Data Reporting" instead of quirk HCI_QUIRK_BROKEN_ERR_DATA_REPORTING set by BT device driver to decide if HCI commands HCI_Read|Write_Default_Erroneous_Data_Reporting are broken. BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 2, Part C | page 587 This feature indicates whether the device is able to support the Packet_Status_Flag and the HCI commands HCI_Write_Default_- Erroneous_Data_Reporting and HCI_Read_Default_Erroneous_- Data_Reporting. the quirk was introduced by 'commit cde1a8a99287 ("Bluetooth: btusb: Fix and detect most of the Chinese Bluetooth controllers")' to mark HCI commands HCI_Read|Write_Default_Erroneous_Data_Reporting broken by BT device driver, but the reason why these two HCI commands are broken is that feature "Erroneous Data Reporting" is not enabled by firmware, this scenario is illustrated by below log of QCA controllers with USB I/F: @ RAW Open: hcitool (privileged) version 2.22 < HCI Command: Read Local Supported Commands (0x04|0x0002) plen 0 > HCI Event: Command Complete (0x0e) plen 68 Read Local Supported Commands (0x04|0x0002) ncmd 1 Status: Success (0x00) Commands: 288 entries ...... Read Default Erroneous Data Reporting (Octet 18 - Bit 2) Write Default Erroneous Data Reporting (Octet 18 - Bit 3) ...... < HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0 > HCI Event: Command Complete (0x0e) plen 4 Read Default Erroneous Data Reporting (0x03|0x005a) ncmd 1 Status: Unknown HCI Command (0x01) < HCI Command: Read Local Supported Features (0x04|0x0003) plen 0 > HCI Event: Command Complete (0x0e) plen 12 Read Local Supported Features (0x04|0x0003) ncmd 1 Status: Success (0x00) Features: 0xff 0xfe 0x0f 0xfe 0xd8 0x3f 0x5b 0x87 3 slot packets ...... Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Tested-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: hci_sync: Correct hci_set_event_mask_page_2_sync() event maskZijun Hu1-2/+2
Event HCI_Truncated_Page_Complete should belong to central and HCI_Peripheral_Page_Response_Timeout should belong to peripheral, but hci_set_event_mask_page_2_sync() take these two events for wrong roles, so correct it by this change. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: hci_sync: Don't remove connected devices from accept listLuiz Augusto von Dentz1-2/+5
These devices are likely going to be reprogrammed when disconnected so this avoid a whole bunch of commands attempt to remove and the add back to the list. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Zhengping Jiang <jiangzp@google.com>
2022-07-22Bluetooth: hci_sync: Fix not updating privacy_modeLuiz Augusto von Dentz1-0/+3
When programming a new entry into the resolving list it shall default to network mode since the params may contain the mode programmed when the device was last added to the resolving list. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209745 Fixes: 853b70b506a20 ("Bluetooth: hci_sync: Set Privacy Mode when updating the resolving list") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Zhengping Jiang <jiangzp@google.com>
2022-07-22Bluetooth: Collect kcov coverage from hci_rx_workTamas Koczka1-1/+9
Annotate hci_rx_work() with kcov_remote_start() and kcov_remote_stop() calls, so remote KCOV coverage is collected while processing the rx_q queue which is the main incoming Bluetooth packet queue. Coverage is associated with the thread which created the packet skb. The collected extra coverage helps kernel fuzzing efforts in finding vulnerabilities. This change only has effect if the kernel is compiled with CONFIG_KCOV, otherwise kcov_ functions don't do anything. Signed-off-by: Tamas Koczka <poprdi@google.com> Tested-by: Aleksandr Nogikh <nogikh@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: hci_sync: Fix resuming scan after suspend resumeZhengping Jiang1-3/+2
After resuming, remove setting scanning_paused to false, because it is checked and set to false in hci_resume_scan_sync. Also move setting the value to false before updating passive scan, because the value is used when resuming passive scan. Fixes: 3b42055388c30 (Bluetooth: hci_sync: Fix attempting to suspend with unfiltered passive scan) Signed-off-by: Zhengping Jiang <jiangzp@google.com> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: mgmt: Fix refresh cached connection infoZhengping Jiang1-5/+5
Set the connection data before calling get_conn_info_sync, so it can be verified the connection is still connected, before refreshing cached values. Fixes: 47db6b42991e6 ("Bluetooth: hci_sync: Convert MGMT_OP_GET_CONN_INFO") Signed-off-by: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: HCI: Fix not always setting Scan Response/Advertising DataLuiz Augusto von Dentz2-43/+65
The scan response and advertising data needs to be tracked on a per instance (adv_info) since when these instaces are removed so are their data, to fix that new flags are introduced which is used to mark when the data changes and then checked to confirm when the data needs to be synced with the controller. Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: eir: Fix using strlen with hdev->{dev_name,short_name}Luiz Augusto von Dentz2-17/+28
Both dev_name and short_name are not guaranteed to be NULL terminated so this instead use strnlen and then attempt to determine if the resulting string needs to be truncated or not. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216018 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22Bluetooth: use memset avoid memory leaksXiaohui Zhang1-0/+1
Similar to the handling of l2cap_ecred_connect in commit d3715b2333e9 ("Bluetooth: use memset avoid memory leaks"), we thought a patch might be needed here as well. Use memset to initialize structs to prevent memory leaks in l2cap_le_connect Signed-off-by: Xiaohui Zhang <xiaohuizhang@ruc.edu.cn> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-22Bluetooth: fix an error code in hci_register_dev()Dan Carpenter1-1/+2
Preserve the error code from hci_register_suspend_notifier(). Don't return success. Fixes: d6bb2a91f95b ("Bluetooth: Unregister suspend with userchannel") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-22Bluetooth: Unregister suspend with userchannelAbhishek Pandit-Subedi2-8/+28
When HCI_USERCHANNEL is used, unregister the suspend notifier when binding and register when releasing. The userchannel socket should be left alone after open is completed. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-22Bluetooth: Fix index added after unregisterAbhishek Pandit-Subedi1-1/+7
When a userchannel socket is released, we should check whether the hdev is already unregistered before sending out an IndexAdded. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-22Bluetooth: When HCI work queue is drained, only queue chained workSchspa Shi2-3/+12
The HCI command, event, and data packet processing workqueue is drained to avoid deadlock in commit 76727c02c1e1 ("Bluetooth: Call drain_workqueue() before resetting state"). There is another delayed work, which will queue command to this drained workqueue. Which results in the following error report: Bluetooth: hci2: command 0x040f tx timeout WARNING: CPU: 1 PID: 18374 at kernel/workqueue.c:1438 __queue_work+0xdad/0x1140 Workqueue: events hci_cmd_timeout RIP: 0010:__queue_work+0xdad/0x1140 RSP: 0000:ffffc90002cffc60 EFLAGS: 00010093 RAX: 0000000000000000 RBX: ffff8880b9d3ec00 RCX: 0000000000000000 RDX: ffff888024ba0000 RSI: ffffffff814e048d RDI: ffff8880b9d3ec08 RBP: 0000000000000008 R08: 0000000000000000 R09: 00000000b9d39700 R10: ffffffff814f73c6 R11: 0000000000000000 R12: ffff88807cce4c60 R13: 0000000000000000 R14: ffff8880796d8800 R15: ffff8880796d8800 FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c0174b4000 CR3: 000000007cae9000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? queue_work_on+0xcb/0x110 ? lockdep_hardirqs_off+0x90/0xd0 queue_work_on+0xee/0x110 process_one_work+0x996/0x1610 ? pwq_dec_nr_in_flight+0x2a0/0x2a0 ? rwlock_bug.part.0+0x90/0x90 ? _raw_spin_lock_irq+0x41/0x50 worker_thread+0x665/0x1080 ? process_one_work+0x1610/0x1610 kthread+0x2e9/0x3a0 ? kthread_complete_and_exit+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> To fix this, we can add a new HCI_DRAIN_WQ flag, and don't queue the timeout workqueue while command workqueue is draining. Fixes: 76727c02c1e1 ("Bluetooth: Call drain_workqueue() before resetting state") Reported-by: syzbot+63bed493aebbf6872647@syzkaller.appspotmail.com Signed-off-by: Schspa Shi <schspa@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-22Bluetooth: clear the temporary linkkey in hci_conn_cleanupAlain Michaud2-2/+5
If a hardware error occurs and the connections are flushed without a disconnection_complete event being signaled, the temporary linkkeys are not flushed. This change ensures that any outstanding flushable linkkeys are flushed when the connection are flushed from the hash table. Additionally, this also makes use of test_and_clear_bit to avoid multiple attempts to delete the link key that's already been flushed. Signed-off-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski40-244/+222
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski34-291/+403
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Simplify nf_ct_get_tuple(), from Jackie Liu. 2) Add format to request_module() call, from Bill Wendling. 3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending hardware offload objects to be processed, from Vlad Buslov. 4) Missing rcu annotation and accessors in the netfilter tree, from Florian Westphal. 5) Merge h323 conntrack helper nat hooks into single object, also from Florian. 6) A batch of update to fix sparse warnings treewide, from Florian Westphal. 7) Move nft_cmp_fast_mask() where it used, from Florian. 8) Missing const in nf_nat_initialized(), from James Yonan. 9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet. 10) Use refcount_inc instead of _inc_not_zero in flowtable, from Florian Westphal. 11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: xt_TPROXY: remove pr_debug invocations netfilter: flowtable: prefer refcount_inc netfilter: ipvs: Use the bitmap API to allocate bitmaps netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn * netfilter: nf_tables: move nft_cmp_fast_mask to where its used netfilter: nf_tables: use correct integer types netfilter: nf_tables: add and use BE register load-store helpers netfilter: nf_tables: use the correct get/put helpers netfilter: x_tables: use correct integer types netfilter: nfnetlink: add missing __be16 cast netfilter: nft_set_bitmap: Fix spelling mistake netfilter: h323: merge nat hook pointers into one netfilter: nf_conntrack: use rcu accessors where needed netfilter: nf_conntrack: add missing __rcu annotations netfilter: nf_flow_table: count pending offload workqueue tasks net/sched: act_ct: set 'net' pointer when creating new nf_flow_table netfilter: conntrack: use correct format characters netfilter: conntrack: use fallthrough to cleanup ==================== Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-21netfilter: xt_TPROXY: remove pr_debug invocationsJustin Stitt1-23/+2
pr_debug calls are no longer needed in this file. Pablo suggested "a patch to remove these pr_debug calls". This patch has some other beneficial collateral as it also silences multiple Clang -Wformat warnings that were present in the pr_debug calls. diff from v1 -> v2: * converted if statement one-liner style * x == NULL is now !x Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-21netfilter: flowtable: prefer refcount_incFlorian Westphal1-8/+3
With refcount_inc_not_zero, we'd also need a smp_rmb or similar, followed by a test of the CONFIRMED bit. However, the ct pointer is taken from skb->_nfct, its refcount must not be 0 (else, we'd already have a use-after-free bug). Use refcount_inc() instead to clarify the ct refcount is expected to be at least 1. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-21netfilter: ipvs: Use the bitmap API to allocate bitmapsChristophe JAILLET1-3/+2
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-20net/sched: cls_api: Fix flow action initializationOz Shlomo1-6/+10
The cited commit refactored the flow action initialization sequence to use an interface method when translating tc action instances to flow offload objects. The refactored version skips the initialization of the generic flow action attributes for tc actions, such as pedit, that allocate more than one offload entry. This can cause potential issues for drivers mapping flow action ids. Populate the generic flow action fields for all the flow action entries. Fixes: c54e1d920f04 ("flow_offload: add ops to tc_action_ops for flow action setup") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> ---- v1 -> v2: - coalese the generic flow action fields initialization to a single loop Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix data-races around sysctl_tcp_max_reordering.Kuniyuki Iwashima1-2/+2
While reading sysctl_tcp_max_reordering, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: dca145ffaa8d ("tcp: allow for bigger reordering level") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_abort_on_overflow.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_abort_on_overflow, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_rfc1337.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_rfc1337, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_stdurg.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_stdurg, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_retrans_collapse.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_retrans_collapse, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_slow_start_after_idle, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 35089bb203f4 ("[TCP]: Add tcp_slow_start_after_idle sysctl.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_thin_linear_timeouts, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 36e31b0af587 ("net: TCP thin linear timeouts") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix data-races around sysctl_tcp_recovery.Kuniyuki Iwashima2-3/+6
While reading sysctl_tcp_recovery, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 4f41b1c58a32 ("tcp: use RACK to detect losses") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix a data-race around sysctl_tcp_early_retrans.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_early_retrans, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: eed530b6c676 ("tcp: early retransmit") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20tcp: Fix data-races around sysctl knobs related to SYN option.Kuniyuki Iwashima4-13/+13
While reading these knobs, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - tcp_sack - tcp_window_scaling - tcp_timestamps Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ip: Fix data-races around sysctl_ip_prot_sock.Kuniyuki Iwashima1-3/+3
sysctl_ip_prot_sock is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. Fixes: 4548b683b781 ("Introduce a sysctl that modifies the value of PROT_SOCK.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.Kuniyuki Iwashima1-3/+3
While reading sysctl_fib_multipath_hash_fields, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: ce5c9c20d364 ("ipv4: Add a sysctl to control multipath hash fields") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.Kuniyuki Iwashima1-1/+1
While reading sysctl_fib_multipath_hash_policy, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: bf4e0a3db97e ("net: ipv4: add support for ECMP hash policy choice") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.Kuniyuki Iwashima1-1/+1
While reading sysctl_fib_multipath_use_neigh, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: a6db4494d218 ("net: ipv4: Consider failed nexthops in multipath routes") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20Merge branch 'master' of ↵David S. Miller3-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-07-20 1) Fix a policy refcount imbalance in xfrm_bundle_lookup. From Hangyu Hua. 2) Fix some clang -Wformat warnings. Justin Stitt ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20Merge branch 'io_uring-zerocopy-send' of ↵Jakub Kicinski7-46/+138
git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux Pavel Begunkov says: ==================== io_uring zerocopy send The patchset implements io_uring zerocopy send. It works with both registered and normal buffers, mixing is allowed but not recommended. Apart from usual request completions, just as with MSG_ZEROCOPY, io_uring separately notifies the userspace when buffers are freed and can be reused (see API design below), which is delivered into io_uring's Completion Queue. Those "buffer-free" notifications are not necessarily per request, but the userspace has control over it and should explicitly attaching a number of requests to a single notification. The series also adds some internal optimisations when used with registered buffers like removing page referencing. From the kernel networking perspective there are two main changes. The first one is passing ubuf_info into the network layer from io_uring (inside of an in kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info caching on the io_uring side, but also helps to avoid cross-referencing and synchronisation problems. The second part is an optional optimisation removing page referencing for requests with registered buffers. Benchmarking UDP with an optimised version of the selftest (see [1]), which sends a bunch of requests, waits for completions and repeats. "+ flush" column posts one additional "buffer-free" notification per request, and just "zc" doesn't post buffer notifications at all. NIC (requests / second): IO size | non-zc | zc | zc + flush 4000 | 495134 | 606420 (+22%) | 558971 (+12%) 1500 | 551808 | 577116 (+4.5%) | 565803 (+2.5%) 1000 | 584677 | 592088 (+1.2%) | 560885 (-4%) 600 | 596292 | 598550 (+0.4%) | 555366 (-6.7%) dummy (requests / second): IO size | non-zc | zc | zc + flush 8000 | 1299916 | 2396600 (+84%) | 2224219 (+71%) 4000 | 1869230 | 2344146 (+25%) | 2170069 (+16%) 1200 | 2071617 | 2361960 (+14%) | 2203052 (+6%) 600 | 2106794 | 2381527 (+13%) | 2195295 (+4%) Previously it also brought a massive performance speedup compared to the msg_zerocopy tool (see [3]), which is probably not super interesting. There is also an additional bunch of refcounting optimisations that was omitted from the series for simplicity and as they don't change the picture drastically, they will be sent as follow up, as well as flushing optimisations closing the performance gap b/w two last columns. For TCP on localhost (with hacks enabling localhost zerocopy) and including additional overhead for receive: IO size | non-zc | zc 1200 | 4174 | 4148 4096 | 7597 | 11228 Using a real NIC 1200 bytes, zc is worse than non-zc ~5-10%, maybe the omitted optimisations will somewhat help, should look better for 4000, but couldn't test properly because of setup problems. Links: liburing (benchmark + tests): [1] https://github.com/isilence/liburing/tree/zc_v4 kernel repo: [2] https://github.com/isilence/linux/tree/zc_v4 RFC v1: [3] https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/ RFC v2: https://lore.kernel.org/io-uring/cover.1640029579.git.asml.silence@gmail.com/ Net patches based: git@github.com:isilence/linux.git zc_v4-net-base or https://github.com/isilence/linux/tree/zc_v4-net-base API design overview: The series introduces an io_uring concept of notifactors. From the userspace perspective it's an entity to which it can bind one or more requests and then requesting to flush it. Flushing a notifier makes it impossible to attach new requests to it, and instructs the notifier to post a completion once all requests attached to it are completed and the kernel doesn't need the buffers anymore. Notifications are stored in notification slots, which should be registered as an array in io_uring. Each slot stores only one notifier at any particular moment. Flushing removes it from the slot and the slot automatically replaces it with a new notifier. All operations with notifiers are done by specifying an index of a slot it's currently in. When registering a notification the userspace specifies a u64 tag for each slot, which will be copied in notification completion entries as cqe::user_data. cqe::res is 0 and cqe::flags is equal to wrap around u32 sequence number counting notifiers of a slot. ==================== Link: https://lore.kernel.org/r/cover.1657643355.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-20tcp: support externally provided ubufsPavel Begunkov1-11/+22
Teach tcp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-20ipv6/udp: support externally provided ubufsPavel Begunkov1-13/+31
Teach ipv6/udp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>