summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-06-20wifi: mac80211: reorg some iface data structs for MLDJohannes Berg22-486/+534
Start reorganizing interface related data structures toward MLD. The most complex part here is for the keys, since we have to split the various kinds of GTKs off to the link but still need to use (for WEP) the other keys as a fallback even for multicast frames. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-20wifi: mac80211: move interface config to new structJohannes Berg13-67/+76
We'll use bss_conf for per-link configuration later, so move out all the non-link-specific data out into a new struct ieee80211_vif_cfg used in the vif. Some adjustments were done with the following spatch: @@ expression sdata; struct ieee80211_vif *vifp; identifier var = { assoc, ibss_joined, aid, arp_addr_list, arp_addr_cnt, ssid, ssid_len, s1g, ibss_creator }; @@ ( -sdata->vif.bss_conf.var +sdata->vif.cfg.var | -vifp->bss_conf.var +vifp->cfg.var ) @bss_conf@ struct ieee80211_bss_conf *bss_conf; identifier var = { assoc, ibss_joined, aid, arp_addr_list, arp_addr_cnt, ssid, ssid_len, s1g, ibss_creator }; @@ -bss_conf->var +vif_cfg->var (though more manual fixups were needed, e.g. replacing "vif_cfg->" by "vif->cfg." in many files.) Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-20wifi: mac80211: move some future per-link data to bss_confJohannes Berg20-122/+125
To add MLD, reuse the bss_conf structure later for per-link information, so move some things into it that are per link. Most transformations were done with the following spatch: @@ expression sdata; identifier var = { chanctx_conf, mu_mimo_owner, csa_active, color_change_active, color_change_color }; @@ -sdata->vif.var +sdata->vif.bss_conf.var @@ struct ieee80211_vif *vif; identifier var = { chanctx_conf, mu_mimo_owner, csa_active, color_change_active, color_change_color }; @@ -vif->var +vif->bss_conf.var Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-20wifi: cfg80211: do some rework towards MLO link APIsJohannes Berg19-487/+1084
In order to support multi-link operation with multiple links, start adding some APIs. The notable addition here is to have the link ID in a new nl80211 attribute, that will be used to differentiate the links in many nl80211 operations. So far, this patch adds the netlink NL80211_ATTR_MLO_LINK_ID attribute (as well as the NL80211_ATTR_MLO_LINKS attribute) and plugs it through the system in some places, checking the validity etc. along with other infrastructure needed for it. For now, I've decided to include only the over-the-air link ID in the API. I know we discussed that we eventually need to have to have other ways of identifying a link, but for local AP mode and auth/assoc commands as well as set_key etc. we'll use the OTA ID. Also included in this patch is some refactoring of the data structures in struct wireless_dev, splitting for the first time the data into type dependent pieces, to make reasoning about these things easier. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-20wifi: mac80211: reject WEP or pairwise keys with key ID > 3Johannes Berg1-5/+13
We don't really care too much right now since our data structures are set up to not have a problem with this, but clearly it's wrong to accept WEP and pairwise keys with key ID > 3. However, with MLD we need to split into per-link (GTK, IGTK, BIGTK) and per interface/MLD (including WEP) keys so make sure this is not a problem. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-15net: don't check skb_count twiceSieng Piaw Liew1-3/+4
NAPI cache skb_count is being checked twice without condition. Change to checking the second time only if the first check is run. Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-15net: bridge: allow add/remove permanent mdb entries on disabled portsCasper Andersson1-6/+9
Adding mdb entries on disabled ports allows you to do setup before accepting any traffic, avoiding any time where the port is not in the multicast group. Signed-off-by: Casper Andersson <casper.casan@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-14ethtool: Fix and simplify ethtool_convert_link_mode_to_legacy_u32()Marco Bonelli1-15/+2
Fix the implementation of ethtool_convert_link_mode_to_legacy_u32(), which is supposed to return false if src has bits higher than 31 set. The current implementation uses the complement of bitmap_fill(ext, 32) to test high bits of src, which is wrong as bitmap_fill() fills _with long granularity_, and sizeof(long) can be > 4. No users of this function currently check the return value, so the bug was dormant. Also remove the check for __ETHTOOL_LINK_MODE_MASK_NBITS > 32, as the enum ethtool_link_mode_bit_indices contains far beyond 32 values. Using find_next_bit() to test the src bitmask works regardless of this anyway. Signed-off-by: Marco Bonelli <marco@mebeim.net> Link: https://lore.kernel.org/r/20220609134900.11201-1-marco@mebeim.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-13net: make __sys_accept4_file() staticYajun Deng1-9/+6
__sys_accept4_file() isn't used outside of the file, make it static. As the same time, move file_flags and nofile parameters into __sys_accept4_file(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13tcp: sk_forced_mem_schedule() optimizationEric Dumazet1-3/+6
sk_memory_allocated_add() has three callers, and returns to them @memory_allocated. sk_forced_mem_schedule() is one of them, and ignores the returned value. Change sk_memory_allocated_add() to return void. Change sock_reserve_memory() and __sk_mem_raise_allocated() to call sk_memory_allocated(). This removes one cache line miss [1] for RPC workloads, as first skbs in TCP write queue and receive queue go through sk_forced_mem_schedule(). [1] Cache line holding tcp_memory_allocated. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-11net: unexport __sk_mem_{raise|reduce}_allocatedEric Dumazet1-2/+0
These two helpers are only used from core networking. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-11net: keep sk->sk_forward_alloc as small as possibleEric Dumazet10-45/+5
Currently, tcp_memory_allocated can hit tcp_mem[] limits quite fast. Each TCP socket can forward allocate up to 2 MB of memory, even after flow became less active. 10,000 sockets can have reserved 20 GB of memory, and we have no shrinker in place to reclaim that. Instead of trying to reclaim the extra allocations in some places, just keep sk->sk_forward_alloc values as small as possible. This should not impact performance too much now we have per-cpu reserves: Changes to tcp_memory_allocated should not be too frequent. For sockets not using SO_RESERVE_MEM: - idle sockets (no packets in tx/rx queues) have zero forward alloc. - non idle sockets have a forward alloc smaller than one page. Note: - Removal of SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD is left to MPTCP maintainers as a follow up. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-11net: add per_cpu_fw_alloc field to struct protoEric Dumazet11-0/+39
Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use. Instead of having reserved bytes per socket, we want to have per-cpu reserves. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-11net: remove SK_MEM_QUANTUM and SK_MEM_QUANTUM_SHIFTEric Dumazet7-23/+23
Due to memcg interface, SK_MEM_QUANTUM is effectively PAGE_SIZE. This might change in the future, but it seems better to avoid the confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-4/+5
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10Merge tag 'wireless-next-2022-06-10' of ↵Jakub Kicinski14-430/+80
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== wireless-next patches for v5.20 Here's a first set of patches for v5.20. This is just a queue flush, before we get things back from net-next that are causing conflicts, and then can start merging a lot of MLO (multi-link operation, part of 802.11be) code. Lots of cleanups all over. The only notable change is perhaps wilc1000 being the first driver to disable WEP (while enabling WPA3). * tag 'wireless-next-2022-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (29 commits) wifi: mac80211_hwsim: Directly use ida_alloc()/free() wifi: mac80211: refactor some key code wifi: mac80211: remove cipher scheme support wifi: nl80211: fix typo in comment wifi: virt_wifi: fix typo in comment rtw89: add new state to CFO state machine for UL-OFDMA rtw89: 8852c: add trigger frame counter ieee80211: add trigger frame definition wifi: wfx: Remove redundant NULL check before release_firmware() call wifi: rtw89: support MULTI_BSSID and correct BSSID mask of H2C wifi: ray_cs: Drop useless status variable in parse_addr() wifi: ray_cs: Utilize strnlen() in parse_addr() wifi: rtw88: use %*ph to print small buffer wifi: wilc1000: add IGTK support wifi: wilc1000: add WPA3 SAE support wifi: wilc1000: remove WEP security support wifi: wilc1000: use correct sequence of RESET for chip Power-UP/Down wifi: rtlwifi: fix error codes in rtl_debugfs_set_write_h2c() wifi: rtw88: Fix Sparse warning for rtw8821c_hw_spec wifi: rtw88: Fix Sparse warning for rtw8723d_hw_spec ... ==================== Link: https://lore.kernel.org/r/20220610142838.330862-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10wifi: mac80211: refactor some key codeJohannes Berg2-33/+41
There's some pretty close code here, with the exception of RCU dereference vs. protected dereference. Refactor this to just return a pointer and then do the deref in the caller later. Change-Id: Ide5315e2792da6ac66eaf852293306a3ac71ced9 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-10wifi: mac80211: remove cipher scheme supportJohannes Berg14-397/+39
The only driver using this was iwlwifi, where we just removed the support because it was never really used. Remove the code from mac80211 as well. Change-Id: I1667417a5932315ee9d81f5c233c56a354923f09 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-10Merge tag 'ieee802154-for-net-next-2022-06-09' of ↵Jakub Kicinski15-214/+48
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2022-06-09 This is a separate pull request for 6lowpan changes. We agreed with the bluetooth maintainers to switch the trees these changing are going into from bluetooth to ieee802154. Jukka Rissanen stepped down as a co-maintainer of 6lowpan (Thanks for the work!). Alexander is staying as maintainer. Alexander reworked the nhc_id lookup in 6lowpan to be way simpler. Moved the data structure from rb to an array, which is all we need in this case. * tag 'ieee802154-for-net-next-2022-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next: MAINTAINERS: Remove Jukka Rissanen as 6lowpan maintainer net: 6lowpan: constify lowpan_nhc structures net: 6lowpan: use array for find nhc id net: 6lowpan: remove const from scalars ==================== Link: https://lore.kernel.org/r/20220609202956.1512156-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: seg6: fix seg6_lookup_any_nexthop() to handle VRFs using flowi_l3mdevAndrea Mayer1-0/+1
Commit 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") adds a new entry (flowi_l3mdev) in the common flow struct used for indicating the l3mdev index for later rule and table matching. The l3mdev_update_flow() has been adapted to properly set the flowi_l3mdev based on the flowi_oif/flowi_iif. In fact, when a valid flowi_iif is supplied to the l3mdev_update_flow(), this function can update the flowi_l3mdev entry only if it has not yet been set (i.e., the flowi_l3mdev entry is equal to 0). The SRv6 End.DT6 behavior in VRF mode leverages a VRF device in order to force the routing lookup into the associated routing table. This routing operation is performed by seg6_lookup_any_nextop() preparing a flowi6 data structure used by ip6_route_input_lookup() which, in turn, (indirectly) invokes l3mdev_update_flow(). However, seg6_lookup_any_nexthop() does not initialize the new flowi_l3mdev entry which is filled with random garbage data. This prevents l3mdev_update_flow() from properly updating the flowi_l3mdev with the VRF index, and thus SRv6 End.DT6 (VRF mode)/DT46 behaviors are broken. This patch correctly initializes the flowi6 instance allocated and used by seg6_lookup_any_nexhtop(). Specifically, the entire flowi6 instance is wiped out: in case new entries are added to flowi/flowi6 (as happened with the flowi_l3mdev entry), we should no longer have incorrectly initialized values. As a result of this operation, the value of flowi_l3mdev is also set to 0. The proposed fix can be tested easily. Starting from the commit referenced in the Fixes, selftests [1],[2] indicate that the SRv6 End.DT6 (VRF mode)/DT46 behaviors no longer work correctly. By applying this patch, those behaviors are back to work properly again. [1] - tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh [2] - tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") Reported-by: Anton Makarov <am@3a-alliance.com> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220608091917.20345-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: add napi_get_frags_check() helperEric Dumazet1-0/+18
This is a follow up of commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") When/if we increase MAX_SKB_FRAGS, we better make sure the old bug will not come back. Adding a check in napi_get_frags() would be costly, even if using DEBUG_NET_WARN_ON_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: add debug checks in napi_consume_skb and __napi_alloc_skb()Eric Dumazet1-1/+2
Commit 6454eca81eae ("net: Use lockdep_assert_in_softirq() in napi_consume_skb()") added a check in napi_consume_skb() which is a bit weak. napi_consume_skb() and __napi_alloc_skb() should only be used from BH context, not from hard irq or nmi context, otherwise we could have races. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: use DEBUG_NET_WARN_ON_ONCE() in skb_release_head_state()Eric Dumazet1-1/+1
Remove this check from fast path unless CONFIG_DEBUG_NET=y Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10af_unix: use DEBUG_NET_WARN_ON_ONCE()Eric Dumazet1-4/+4
Replace four WARN_ON() that have not triggered recently with DEBUG_NET_WARN_ON_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: use WARN_ON_ONCE() in sk_stream_kill_queues()Eric Dumazet1-3/+3
sk_stream_kill_queues() has three checks which have been useful to detect kernel bugs in the past. However they are potentially a problem because they could flood the syslog. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: use WARN_ON_ONCE() in inet_sock_destruct()Eric Dumazet1-4/+4
inet_sock_destruct() has four warnings which have been useful to point to kernel bugs in the past. However they are potentially a problem because they could flood the syslog. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: use DEBUG_NET_WARN_ON_ONCE() in dev_loopback_xmit()Eric Dumazet1-1/+1
One check in dev_loopback_xmit() has not caught issues in the past. Keep it for CONFIG_DEBUG_NET=y builds only. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: use DEBUG_NET_WARN_ON_ONCE() in __release_sock()Eric Dumazet1-1/+1
Check against skb dst in socket backlog has never triggered in past years. Keep the check omly for CONFIG_DEBUG_NET=y builds. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10drop_monitor: adopt u64_stats_tEric Dumazet1-9/+9
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10devlink: adopt u64_stats_tEric Dumazet1-12/+16
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: adopt u64_stats_t in struct pcpu_sw_netstatsEric Dumazet4-33/+37
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10ip6_tunnel: use dev_sw_netstats_rx_add()Eric Dumazet1-6/+1
We have a convenient helper, let's use it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10sit: use dev_sw_netstats_rx_add()Eric Dumazet1-7/+1
We have a convenient helper, let's use it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10vlan: adopt u64_stats_tEric Dumazet2-12/+12
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Add READ_ONCE() when reading rx_errors & tx_dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: rename reference+tracking helpersJakub Kicinski41-113/+117
Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10tls: Rename TLS_INFO_ZC_SENDFILE to TLS_INFO_ZC_TXMaxim Mikityanskiy1-4/+4
To embrace possible future optimizations of TLS, rename zerocopy sendfile definitions to more generic ones: * setsockopt: TLS_TX_ZEROCOPY_SENDFILE- > TLS_TX_ZEROCOPY_RO * sock_diag: TLS_INFO_ZC_SENDFILE -> TLS_INFO_ZC_RO_TX RO stands for readonly and emphasizes that the application shouldn't modify the data being transmitted with zerocopy to avoid potential disconnection. Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220608153425.3151146-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski20-95/+107
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: 6lowpan: constify lowpan_nhc structuresAlexander Aring2-11/+11
This patch constify the lowpan_nhc declarations. Since we drop the rb node datastructure there is no need for runtime manipulation of this structure. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Link: https://lore.kernel.org/r/20220428030534.3220410-4-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09net: 6lowpan: use array for find nhc idAlexander Aring15-203/+37
This patch will remove the complete overengineered and overthinking rb data structure for looking up the nhc by nhcid. Instead we using the existing nhc next header array and iterate over it. It works now for 1 byte values only. However there are only 1 byte nhc id values currently supported and IANA also does not specify large than 1 byte values yet. If there are 2 byte values for nhc ids specified we can revisit this data structure and add support for it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Link: https://lore.kernel.org/r/20220428030534.3220410-3-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09net: 6lowpan: remove const from scalarsAlexander Aring1-3/+3
The keyword const makes no sense for scalar types inside the lowpan_nhc structure. Most compilers will ignore it so we remove the keyword from the scalar types. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Link: https://lore.kernel.org/r/20220428030534.3220410-2-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09Merge tag 'net-5.19-rc2' of ↵Linus Torvalds15-62/+83
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - eth: amt: fix possible null-ptr-deref in amt_rcv() Previous releases - regressions: - tcp: use alloc_large_system_hash() to allocate table_perturb - af_unix: fix a data-race in unix_dgram_peer_wake_me() - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF Previous releases - always broken: - ipv6: fix signed integer overflow in __ip6_append_data - netfilter: - nat: really support inet nat without l3 address - nf_tables: memleak flow rule from commit path - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs - openvswitch: fix misuse of the cached connection on tuple changes - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred - eth: altera: fix refcount leak in altera_tse_mdio_create Misc: - add Quentin Monnet to bpftool maintainers" * tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) net: amd-xgbe: fix clang -Wformat warning tcp: use alloc_large_system_hash() to allocate table_perturb net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY net: dsa: mv88e6xxx: correctly report serdes link failure net: dsa: mv88e6xxx: fix BMSR error to be consistent with others net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete net: altera: Fix refcount leak in altera_tse_mdio_create net: openvswitch: fix misuse of the cached connection on tuple changes net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag ip_gre: test csum_start instead of transport header au1000_eth: stop using virt_to_bus() ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg ipv6: Fix signed integer overflow in __ip6_append_data nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION net: ipv6: unexport __init-annotated seg6_hmac_init() net: xfrm: unexport __init-annotated xfrm4_protocol_init() net: mdio: unexport __init-annotated mdio_bus_init() ...
2022-06-09tcp: use alloc_large_system_hash() to allocate table_perturbMuchun Song1-4/+6
In our server, there may be no high order (>= 6) memory since we reserve lots of HugeTLB pages when booting. Then the system panic. So use alloc_large_system_hash() to allocate table_perturb. Fixes: e9261476184b ("tcp: dynamically allocate the perturb table used by source ports") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220607070214.94443-1-songmuchun@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: openvswitch: fix misuse of the cached connection on tuple changesIlya Maximets2-1/+9
If packet headers changed, the cached nfct is no longer relevant for the packet and attempt to re-use it leads to the incorrect packet classification. This issue is causing broken connectivity in OpenStack deployments with OVS/OVN due to hairpin traffic being unexpectedly dropped. The setup has datapath flows with several conntrack actions and tuple changes between them: actions:ct(commit,zone=8,mark=0/0x1,nat(src)), set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)), set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)), ct(zone=8),recirc(0x4) After the first ct() action the packet headers are almost fully re-written. The next ct() tries to re-use the existing nfct entry and marks the packet as invalid, so it gets dropped later in the pipeline. Clearing the cached conntrack entry whenever packet tuple is changed to avoid the issue. The flow key should not be cleared though, because we should still be able to match on the ct_state if the recirculation happens after the tuple change but before the next ct() action. Cc: stable@vger.kernel.org Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Frode Nordahl <frode.nordahl@canonical.com> Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09ip_gre: test csum_start instead of transport headerWillem de Bruijn1-6/+5
GRE with TUNNEL_CSUM will apply local checksum offload on CHECKSUM_PARTIAL packets. ipgre_xmit must validate csum_start after an optional skb_pull, else lco_csum may trigger an overflow. The original check was if (csum && skb_checksum_start(skb) < skb->data) return -EINVAL; This had false positives when skb_checksum_start is undefined: when ip_summed is not CHECKSUM_PARTIAL. A discussed refinement was straightforward if (csum && skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_start(skb) < skb->data) return -EINVAL; But was eventually revised more thoroughly: - restrict the check to the only branch where needed, in an uncommon GRE path that uses header_ops and calls skb_pull. - test skb_transport_header, which is set along with csum_start in skb_partial_csum_set in the normal header_ops datapath. Turns out skbs can arrive in this branch without the transport header set, e.g., through BPF redirection. Revise the check back to check csum_start directly, and only if CHECKSUM_PARTIAL. Do leave the check in the updated location. Check field regardless of whether TUNNEL_CSUM is configured. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Link: https://lore.kernel.org/all/20210902193447.94039-2-willemdebruijn.kernel@gmail.com/T/#u Fixes: 8a0ed250f911 ("ip_gre: validate csum_start only on pull") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20220606132107.3582565-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski2-10/+3
Daniel Borkmann says: ==================== pull-request: bpf 2022-06-09 We've added 6 non-merge commits during the last 2 day(s) which contain a total of 8 files changed, 49 insertions(+), 15 deletions(-). The main changes are: 1) Fix an illegal copy_to_user() attempt seen by syzkaller through arm64 BPF JIT compiler, from Eric Dumazet. 2) Fix calling global functions from BPF_PROG_TYPE_EXT programs by using the correct program context type, from Toke Høiland-Jørgensen. 3) Fix XSK TX batching invalid descriptor handling, from Maciej Fijalkowski. 4) Fix potential integer overflows in multi-kprobe link code by using safer kvmalloc_array() allocation helpers, from Dan Carpenter. 5) Add Quentin as bpftool maintainer, from Quentin Monnet. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: MAINTAINERS: Add a maintainer for bpftool xsk: Fix handling of invalid descriptors in XSK TX batching API selftests/bpf: Add selftest for calling global functions from freplace bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programs bpf: Use safer kvmalloc_array() where possible bpf, arm64: Clear prog->jited_len along prog->jited ==================== Link: https://lore.kernel.org/r/20220608234133.32265-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08ipv6: Fix signed integer overflow in l2tp_ip6_sendmsgWang Yufen1-2/+3
When len >= INT_MAX - transhdrlen, ulen = len + transhdrlen will be overflow. To fix, we can follow what udpv6 does and subtract the transhdrlen from the max. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-2-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08ipv6: Fix signed integer overflow in __ip6_append_dataWang Yufen1-3/+3
Resurrect ubsan overflow checks and ubsan report this warning, fix it by change the variable [length] type to size_t. UBSAN: signed-integer-overflow in net/ipv6/ip6_output.c:1489:19 2147479552 + 8567 cannot be represented in type 'int' CPU: 0 PID: 253 Comm: err Not tainted 5.16.0+ #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x214/0x230 show_stack+0x30/0x78 dump_stack_lvl+0xf8/0x118 dump_stack+0x18/0x30 ubsan_epilogue+0x18/0x60 handle_overflow+0xd0/0xf0 __ubsan_handle_add_overflow+0x34/0x44 __ip6_append_data.isra.48+0x1598/0x1688 ip6_append_data+0x128/0x260 udpv6_sendmsg+0x680/0xdd0 inet6_sendmsg+0x54/0x90 sock_sendmsg+0x70/0x88 ____sys_sendmsg+0xe8/0x368 ___sys_sendmsg+0x98/0xe0 __sys_sendmmsg+0xf4/0x3b8 __arm64_sys_sendmmsg+0x34/0x48 invoke_syscall+0x64/0x160 el0_svc_common.constprop.4+0x124/0x300 do_el0_svc+0x44/0xc8 el0_svc+0x3c/0x1e8 el0t_64_sync_handler+0x88/0xb0 el0t_64_sync+0x16c/0x170 Changes since v1: -Change the variable [length] type to unsigned, as Eric Dumazet suggested. Changes since v2: -Don't change exthdrlen type in ip6_make_skb, as Paolo Abeni suggested. Changes since v3: -Don't change ulen type in udpv6_sendmsg and l2tp_ip6_sendmsg, as Jakub Kicinski suggested. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-1-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: ipv6: unexport __init-annotated seg6_hmac_init()Masahiro Yamada1-1/+0
EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the caller (net/ipv6/seg6.c) and the callee (net/ipv6/seg6_hmac.c) belong to the same module. It seems an internal function call in ipv6.ko. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: xfrm: unexport __init-annotated xfrm4_protocol_init()Masahiro Yamada1-1/+0
EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the only in-tree call-site, net/ipv4/xfrm4_policy.c is never compiled as modular. (CONFIG_XFRM is boolean) Fixes: 2f32b51b609f ("xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08xsk: Fix handling of invalid descriptors in XSK TX batching APIMaciej Fijalkowski2-10/+3
xdpxceiver run on a AF_XDP ZC enabled driver revealed a problem with XSK Tx batching API. There is a test that checks how invalid Tx descriptors are handled by AF_XDP. Each valid descriptor is followed by invalid one on Tx side whereas the Rx side expects only to receive a set of valid descriptors. In current xsk_tx_peek_release_desc_batch() function, the amount of available descriptors is hidden inside xskq_cons_peek_desc_batch(). This can be problematic in cases where invalid descriptors are present due to the fact that xskq_cons_peek_desc_batch() returns only a count of valid descriptors. This means that it is impossible to properly update XSK ring state when calling xskq_cons_release_n(). To address this issue, pull out the contents of xskq_cons_peek_desc_batch() so that callers (currently only xsk_tx_peek_release_desc_batch()) will always be able to update the state of ring properly, as total count of entries is now available and use this value as an argument in xskq_cons_release_n(). By doing so, xskq_cons_peek_desc_batch() can be dropped altogether. Fixes: 9349eb3a9d2a ("xsk: Introduce batched Tx descriptor interfaces") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220607142200.576735-1-maciej.fijalkowski@intel.com