summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-10-10wifi: mac80211: add internal handler for wake_tx_queueAlexander Wetzel1-0/+46
Start to align the TX handling to only use internal TX queues (iTXQs): Provide a handler for drivers not having a custom wake_tx_queue callback and update the documentation. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07cfg80211: Update Transition Disable policy during port authorizationVinayak Yadawad5-8/+24
In case of 4way handshake offload, transition disable policy updated by the AP during EAPOL 3/4 is not updated to the upper layer. This results in mismatch between transition disable policy between the upper layer and the driver. This patch addresses this issue by updating transition disable policy as part of port authorization indication. Signed-off-by: Vinayak Yadawad <vinayak.yadawad@broadcom.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: minstrel_ht: remove unused has_mrr member from struct ↵Peter Seiderer2-4/+0
minstrel_priv Remove unused has_mrr (has multi-rate retry capabilities) member from struct minstrel_priv (only set once in minstrel_ht_alloc, never used again). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: fix ifdef symbol nameJohannes Berg1-1/+1
This should of course be CONFIG_, not CPTCFG_, which is an artifact from working with backports. Fixes: 9dd1953846c7 ("wifi: nl80211/mac80211: clarify link ID in control port TX") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: remove support for AddBA with fragmentationJohannes Berg1-19/+0
HE added support for dynamic fragmentation inside aggregation sessions, but no existing driver ever advertises it. Thus, remove the code for now, it cannot work as-is in MLO. For it to properly work in MLO, we'd need to validate that the frag level is identical across all the link bands/iftypes, which is a good amount of complex code that's just not worth it as long as no driver has support for it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: agg-rx: avoid band checkJohannes Berg1-1/+1
If the deflink of the station is on 6 GHz, then it won't have HT. If at the same time we're using MLO, then vif.bss_conf isn't used, and thus vif.bss_conf.chandef.chan is NULL, causing the code to crash. Fix this by just checking for both HT and HE, and refusing the aggregation session if both are not present. This might be a bit wrong since it would accept an aggregation session from a peer that has HE but no HT on 2.4 or 5 GHz, but such a peer shouldn't exist in the first place, and it probably supports aggregation if it has HE support. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: prohibit IEEE80211_HT_CAP_DELAY_BA with MLOJohannes Berg1-0/+10
This won't work right at least with the code as it is, so at least for now just assume it's never set for MLO. It may very well never change, almost no drivers support it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: don't clear DTIM period after setting itJohannes Berg1-13/+12
Fix the code that sets the DTIM period to always propagate it into link->conf->dtim_period and not overwrite it, while still preferring to set it from the beacon data if available. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: change AddBA deny error messageJohannes Berg1-1/+1
If the station has no HT, we deny the aggregation session but the error message talks about QoS; change it to say HT instead. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: mlme: mark assoc link in outputJohannes Berg1-2/+4
It's useful to know which link was used for the association, mark it when printing the links. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: check link ID in auth/assoc continuationJohannes Berg2-2/+6
Ensure that the link ID matches in auth/assoc continuation, otherwise we need to reset all the data. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: mlme: fix null-ptr deref on failed assocJohannes Berg1-2/+6
If association to an AP without a link 0 fails, then we crash in tracing because it assumes that either ap_mld_addr or link 0 BSS is valid, since we clear sdata->vif.valid_links and then don't add the ap_mld_addr to the struct. Since we clear also sdata->vif.cfg.ap_addr, keep a local copy of it and assign it earlier, before clearing valid_links, to fix this. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: fix AddBA response addressingJohannes Berg1-1/+1
Since this frame is addressed from/to an MLD, it should be built with the correct AP MLD address (in station mode) to be encrypted properly. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: set internal scan request BSSIDJohannes Berg1-0/+2
If any driver relies entirely on the scan request BSSID, then that would be wrong for internal scans. Initialize it to the broadcast address since we don't otherwise use the field. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: advertise TWT requester only with HW supportHaim Dreyfuss1-6/+24
Currently, we rely only on the AP capability. If the AP supports TWT responder we will advertise TWT requester even if the driver or HW doesn't support it. Fix this by checking the HW capability. Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: use link_id in ieee80211_change_bss()Johannes Berg1-19/+21
We should set the parameters here per link, except unfortunately ap_isolate, but we can't really change that anymore so it'll remain a quirk in the API in that you need to change it on one of the valid links and it'll apply to all. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: nl80211: use link ID in NL80211_CMD_SET_BSSJohannes Berg1-1/+3
We clearly need the link ID here, to know the right BSS to configure. Use/require it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: transmit AddBA with MLD addressJohannes Berg1-1/+1
This management frame is intended for the MLD so we treat it in mac80211 as MLD addressed as well, and should therefore use the MLD address of the AP for the BSSID field in the frame, address translation applies. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: wme: use ap_addr instead of deflink BSSIDJohannes Berg1-1/+1
We use this to look up the destination station, so it needs to be the MLD address of the AP for an MLO; use ap_addr instead of the BSSID. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: Process association status for affiliated linksIlan Peer2-5/+37
In case the AP returned a non success status for one of the links, do not activate the link. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: Parse station profile from association responseIlan Peer3-6/+187
When processing an association response frame for a Multi-Link connection, extract the per station profile for each additional link, and use it for parsing the link elements. As the Multi-Link element might be fragmented, add support for reassembling a fragmented element. To simplify memory management logic, extend 'struct ieee802_11_elems' to hold a scratch buffer, which is used for the defragmentation. Once an element is reconstructed in the scratch area, point the corresponding element pointer to it. Currently only defragmentation of Multi-Link element and the contained per-STA profile subelement is supported. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: cfg80211: support reporting failed linksJohannes Berg3-1/+22
For assoc and connect result APIs, support reporting failed links; they should still come with the BSS pointer in the case of assoc, so they're released correctly. In the case of connect result, this is optional. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: recalc station aggregate data during link switchJohannes Berg3-9/+43
During link switching, the active links change, so we need to recalculate the aggregate data in the stations. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: include link address in debugfsBenjamin Berg1-0/+15
Add the link address to the per-link information, but only if we are using MLO. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: add API to show the link STAs in debugfsBenjamin Berg6-23/+189
Create debugfs data per-link. For drivers, there is a new operation link_sta_add_debugfs which will always be called. For non-MLO, the station directory will be used directly rather than creating a corresponding subdirectory. As such, non-MLO drivers can simply continue to create the data from sta_debugfs_add. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> [add missing inlines if !CONFIG_MAC80211_DEBUGFS] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: add pointer from link STA to STABenjamin Berg1-0/+1
While often not needed, this considerably simplifies going from a link to the STA. This helps in cases such as debugfs where a single pointer should allow accessing a specific link and the STA. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-30Merge tag 'wireless-next-2022-09-30' of ↵Jakub Kicinski19-288/+766
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.1 Few stack changes and lots of driver changes in this round. brcmfmac has more activity as usual and it gets new hardware support. ath11k improves WCN6750 support and also other smaller features. And of course changes all over. Note: in early September wireless tree was merged to wireless-next to avoid some conflicts with mac80211 patches, this shouldn't cause any problems but wanted to mention anyway. Major changes: mac80211 - refactoring and preparation for Wi-Fi 7 Multi-Link Operation (MLO) feature continues brcmfmac - support CYW43439 SDIO chipset - support BCM4378 on Apple platforms - support CYW89459 PCIe chipset rtw89 - more work to get rtw8852c supported - P2P support - support for enabling and disabling MSDU aggregation via nl80211 mt76 - tx status reporting improvements ath11k - cold boot calibration support on WCN6750 - Target Wake Time (TWT) debugfs support for STA interface - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - implement SRAM dump debugfs interface - enable threaded NAPI on all hardware - WoW support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz wcn36xx - add SNR from a received frame as a source of system entropy * tag 'wireless-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (231 commits) wifi: rtl8xxxu: Improve rtl8xxxu_queue_select wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM wifi: rtl8xxxu: gen2: Enable 40 MHz channel width wifi: rtw89: 8852b: configure DLE mem wifi: rtw89: check DLE FIFO size with reserved size wifi: rtw89: mac: correct register of report IMR wifi: rtw89: pci: set power cut closed for 8852be wifi: rtw89: pci: add to do PCI auto calibration wifi: rtw89: 8852b: implement chip_ops::{enable,disable}_bb_rf wifi: rtw89: add DMA busy checking bits to chip info wifi: rtw89: mac: define DMA channel mask to avoid unsupported channels wifi: rtw89: pci: mask out unsupported TX channels iwlegacy: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper ipw2x00: Replace zero-length array with DECLARE_FLEX_ARRAY() helper wifi: iwlwifi: Track scan_cmd allocation size explicitly brcmfmac: Remove the call to "dtim_assoc" IOVAR brcmfmac: increase dcmd maximum buffer size brcmfmac: Support 89459 pcie brcmfmac: increase default max WOWL patterns to 16 cw1200: fix incorrect check to determine if no element is found in list ... ==================== Link: https://lore.kernel.org/r/20220930150413.A7984C433D6@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30xsk: Expose min chunk size to driversMaxim Mikityanskiy1-2/+0
Drivers should be aware of the range of valid UMEM chunk sizes to be able to allocate their internal structures of an appropriate size. It will be used by mlx5e in the following patches. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> CC: "Björn Töpel" <bjorn@kernel.org> CC: Magnus Karlsson <magnus.karlsson@intel.com> CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30ip6_vti:Remove the space before the commaHongbin Wang1-1/+1
There should be no space before the comma Signed-off-by: Hongbin Wang <wh_bin@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30net: bridge: assign path_cost for 2.5G and 5G link speedSteven Hsieh1-1/+10
As 2.5G, 5G ethernet ports are more common and affordable, these ports are being used in LAN bridge devices. STP port_cost() is missing path_cost assignment for these link speeds, causes highest cost 100 being used. This result in lower speed port being picked when there is loop between 5G and 1G ports. Original path_cost: 10G=2, 1G=4, 100m=19, 10m=100 Adjusted path_cost: 10G=2, 5G=3, 2.5G=4, 1G=5, 100m=19, 10m=100 speed greater than 10G = 1 Signed-off-by: Steven Hsieh <steven.hsieh@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30net-next: skbuff: refactor pskb_pullRichard Gobert3-4/+4
pskb_may_pull already contains all of the checks performed by pskb_pull. Use pskb_may_pull for validation in pskb_pull, eliminating the duplication and making __pskb_pull obsolete. Replace __pskb_pull with pskb_pull where applicable. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30net-sysfs: Convert to use sysfs_emit() APIsWang Yufen1-29/+29
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30net/sched: taprio: allow user input of per-tc max SDUVladimir Oltean1-1/+151
IEEE 802.1Q clause 12.29.1.1 "The queueMaxSDUTable structure and data types" and 8.6.8.4 "Enhancements for scheduled traffic" talk about the existence of a per traffic class limitation of maximum frame sizes, with a fallback on the port-based MTU. As far as I am able to understand, the 802.1Q Service Data Unit (SDU) represents the MAC Service Data Unit (MSDU, i.e. L2 payload), excluding any number of prepended VLAN headers which may be otherwise present in the MSDU. Therefore, the queueMaxSDU is directly comparable to the device MTU (1500 means L2 payload sizes are accepted, or frame sizes of 1518 octets, or 1522 plus one VLAN header). Drivers which offload this are directly responsible of translating into other units of measurement. To keep the fast path checks optimized, we keep 2 arrays in the qdisc, one for max_sdu translated into frame length (so that it's comparable to skb->len), and another for offloading and for dumping back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net/sched: query offload capabilities through ndo_setup_tc()Vladimir Oltean1-0/+17
When adding optional new features to Qdisc offloads, existing drivers must reject the new configuration until they are coded up to act on it. Since modifying all drivers in lockstep with the changes in the Qdisc can create problems of its own, it would be nice if there existed an automatic opt-in mechanism for offloading optional features. Jakub proposes that we multiplex one more kind of call through ndo_setup_tc(): one where the driver populates a Qdisc-specific capability structure. First user will be taprio in further changes. Here we are introducing the definitions for the base functionality. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220923163310.3192733-3-vladimir.oltean@nxp.com/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net/tipc: Remove unused struct distr_queue_itemYuan Can1-8/+0
After commit 09b5678c778f("tipc: remove dead code in tipc_net and relatives"), struct distr_queue_item is not used any more and can be removed as well. Signed-off-by: Yuan Can <yuancan@huawei.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/20220928085636.71749-1-yuancan@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net: skb: introduce and use a single page frag cachePaolo Abeni2-22/+103
After commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") we are observing 10-20% regressions in performance tests with small packets. The perf trace points to high pressure on the slab allocator. This change tries to improve the allocation schema for small packets using an idea originally suggested by Eric: a new per CPU page frag is introduced and used in __napi_alloc_skb to cope with small allocation requests. To ensure that the above does not lead to excessive truesize underestimation, the frag size for small allocation is inflated to 1K and all the above is restricted to build with 4K page size. Note that we need to update accordingly the run-time check introduced with commit fd9ea57f4e95 ("net: add napi_get_frags_check() helper"). Alex suggested a smart page refcount schema to reduce the number of atomic operations and deal properly with pfmemalloc pages. Under small packet UDP flood, I measure a 15% peak tput increases. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Alexander H Duyck <alexanderduyck@fb.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/6b6f65957c59f86a353fc09a5127e83a32ab5999.1664350652.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net: sched: cls_u32: Avoid memcpy() false-positive warningKees Cook1-1/+5
To work around a misbehavior of the compiler's ability to see into composite flexible array structs (as detailed in the coming memcpy() hardening series[1]), use unsafe_memcpy(), as the sizing, bounds-checking, and allocation are all very tightly coupled here. This silences the false-positive reported by syzbot: memcpy: detected field-spanning write (size 80) of single field "&n->sel" at net/sched/cls_u32.c:1043 (size 16) [1] https://lore.kernel.org/linux-hardening/20220901065914.1417829-2-keescook@chromium.org Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Reported-by: syzbot+a2c4601efc75848ba321@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/000000000000a96c0b05e97f0444@google.com/ Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220927153700.3071688-1-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski11-47/+45
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29Merge tag 'net-6.0-rc8' of ↵Linus Torvalds10-40/+45
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wifi and can. Current release - regressions: - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume() - wifi: fix locking in mac80211 mlme - eth: - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()" - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe Previous releases - regressions: - wifi: fix regression with non-QoS drivers Previous releases - always broken: - mptcp: fix unreleased socket in accept queue - wifi: - don't start TX with fq->lock to fix deadlock - fix memory corruption in minstrel_ht_update_rates() - eth: - macb: fix ZynqMP SGMII non-wakeup source resume failure - mt7531: only do PLL once after the reset - usbnet: fix memory leak in usbnet_disconnect() Misc: - usb: qmi_wwan: add new usb-id for Dell branded EM7455" * tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits) mptcp: fix unreleased socket in accept queue mptcp: factor out __mptcp_close() without socket lock net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2} net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge can: c_can: don't cache TX messages for C_CAN cores ice: xsk: drop power of 2 ring size restriction for AF_XDP ice: xsk: change batched Tx descriptor cleaning net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455 selftests: Fix the if conditions of in test_extra_filter() net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume() net: stmmac: power up/down serdes in stmmac_open/release wifi: mac80211: mlme: Fix double unlock on assoc success handling wifi: mac80211: mlme: Fix missing unlock on beacon RX wifi: mac80211: fix memory corruption in minstrel_ht_update_rates() wifi: mac80211: fix regression with non-QoS drivers wifi: mac80211: ensure vif queues are operational after start wifi: mac80211: don't start TX with fq->lock to fix deadlock wifi: cfg80211: fix MCS divisor value net: hippi: Add missing pci_disable_device() in rr_init_one() net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe ...
2022-09-29mptcp: fix unreleased socket in accept queueMenglong Dong3-27/+9
The mptcp socket and its subflow sockets in accept queue can't be released after the process exit. While the release of a mptcp socket in listening state, the corresponding tcp socket will be released too. Meanwhile, the tcp socket in the unaccept queue will be released too. However, only init subflow is in the unaccept queue, and the joined subflow is not in the unaccept queue, which makes the joined subflow won't be released, and therefore the corresponding unaccepted mptcp socket will not be released to. This can be reproduced easily with following steps: 1. create 2 namespace and veth: $ ip netns add mptcp-client $ ip netns add mptcp-server $ sysctl -w net.ipv4.conf.all.rp_filter=0 $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1 $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1 $ ip link add red-client netns mptcp-client type veth peer red-server \ netns mptcp-server $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client $ ip -n mptcp-server link set red-server up $ ip -n mptcp-client link set red-client up 2. configure the endpoint and limit for client and server: $ ip -n mptcp-server mptcp endpoint flush $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2 $ ip -n mptcp-client mptcp endpoint flush $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2 $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \ 1 subflow 3. listen and accept on a port, such as 9999. The nc command we used here is modified, which makes it use mptcp protocol by default. $ ip netns exec mptcp-server nc -l -k -p 9999 4. open another *two* terminal and use each of them to connect to the server with the following command: $ ip netns exec mptcp-client nc 10.0.0.1 9999 Input something after connect to trigger the connection of the second subflow. So that there are two established mptcp connections, with the second one still unaccepted. 5. exit all the nc command, and check the tcp socket in server namespace. And you will find that there is one tcp socket in CLOSE_WAIT state and can't release forever. Fix this by closing all of the unaccepted mptcp socket in mptcp_subflow_queue_clean() with __mptcp_close(). Now, we can ensure that all unaccepted mptcp sockets will be cleaned by __mptcp_close() before they are released, so mptcp_sock_destruct(), which is used to clean the unaccepted mptcp socket, is not needed anymore. The selftests for mptcp is ran for this commit, and no new failures. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Fixes: 6aeed9045071 ("mptcp: fix race on unaccepted mptcp sockets") Cc: stable@vger.kernel.org Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Mengen Sun <mengensun@tencent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29mptcp: factor out __mptcp_close() without socket lockMenglong Dong2-2/+13
Factor out __mptcp_close() from mptcp_close(). The caller of __mptcp_close() should hold the socket lock, and cancel mptcp work when __mptcp_close() returns true. This function will be used in the next commit. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Fixes: 6aeed9045071 ("mptcp: fix race on unaccepted mptcp sockets") Cc: stable@vger.kernel.org Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Mengen Sun <mengensun@tencent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29net: drop the weight argument from netif_napi_addJakub Kicinski1-2/+1
We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29net: Fix incorrect address comparison when searching for a bind2 bucketMartin KaFai Lau1-0/+10
The v6_rcv_saddr and rcv_saddr are inside a union in the 'struct inet_bind2_bucket'. When searching a bucket by following the bhash2 hashtable chain, eg. inet_bind2_bucket_match, it is only using the sk->sk_family and there is no way to check if the inet_bind2_bucket has a v6 or v4 address in the union. This leads to an uninit-value KMSAN report in [0] and also potentially incorrect matches. This patch fixes it by adding a family member to the inet_bind2_bucket and then tests 'sk->sk_family != tb->family' before matching the sk's address to the tb's address. Cc: Joanne Koong <joannelkoong@gmail.com> Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20220927002544.3381205-1-kafai@fb.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29mptcp: poll allow write call before actual connectBenjamin Hesmans1-0/+4
If fastopen is used, poll must allow a first write that will trigger the SYN+data Similar to what is done in tcp_poll(). Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29mptcp: handle defer connect in mptcp_sendmsgDmytro Shytyi1-0/+22
When TCP_FASTOPEN_CONNECT has been set on the socket before a connect, the defer flag is set and must be handled when sendmsg is called. This is similar to what is done in tcp_sendmsg_locked(). Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29tcp: export tcp_sendmsg_fastopenBenjamin Hesmans1-3/+2
It will be used to support TCP FastOpen with MPTCP in the following commit. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29mptcp: add TCP_FASTOPEN_CONNECT socket optionBenjamin Hesmans1-1/+18
Set the option for the first subflow only. For the other subflows TFO can't be used because a mapping would be needed to cover the data in the SYN. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29net: shrink struct ubuf_infoPavel Begunkov4-20/+24
We can benefit from a smaller struct ubuf_info, so leave only mandatory fields and let users to decide how they want to extend it. Convert MSG_ZEROCOPY to struct ubuf_info_msgzc and remove duplicated fields. This reduces the size from 48 bytes to just 16. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28Revert "net: set proper memcg for net_init hooks allocations"Shakeel Butt1-7/+0
This reverts commit 1d0403d20f6c281cb3d14c5f1db5317caeec48e9. Anatoly Pugachev reported that the commit 1d0403d20f6c ("net: set proper memcg for net_init hooks allocations") is somehow causing the sparc64 VMs failed to boot and the VMs boot fine with that patch reverted. So, revert the patch for now and later we can debug the issue. Link: https://lore.kernel.org/all/20220918092849.GA10314@u164.east.ru/ Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Shakeel Butt <shakeelb@google.com> Cc: Vasily Averin <vvs@openvz.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: cgroups@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Tested-by: Anatoly Pugachev <matorola@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Fixes: 1d0403d20f6c ("net: set proper memcg for net_init hooks allocations") Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-28netfilter: nft_fib: Fix for rpath check with VRF devicesPhil Sutter2-1/+8
Analogous to commit b575b24b8eee3 ("netfilter: Fix rpfilter dropping vrf packets by mistake") but for nftables fib expression: Add special treatment of VRF devices so that typical reverse path filtering via 'fib saddr . iif oif' expression works as expected. Fixes: f6d0cbcf09c50 ("netfilter: nf_tables: add fib expression") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>