summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2024-09-03netfilter: nf_tables: set element timeout update supportPablo Neira Ayuso1-1/+15
Store new timeout and expiration in transaction object, use them to update elements from .commit path. Otherwise, discard update if .abort path is exercised. Use update_flags in the transaction to note whether the timeout, expiration, or both need to be updated. Annotate access to timeout extension now that it can be updated while lockless read access is possible. Reject timeout updates on elements with no timeout extension. Element transaction remains in the 96 bytes kmalloc slab on x86_64 after this update. This patch requires ("netfilter: nf_tables: use timestamp to check for set element timeout") to make sure an element does not expire while transaction is ongoing. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-03netfilter: nf_tables: zero timeout means element never times outPablo Neira Ayuso1-2/+5
This patch uses zero as timeout marker for those elements that never expire when the element is created. If userspace provides no timeout for an element, then the default set timeout applies. However, if no default set timeout is specified and timeout flag is set on, then timeout extension is allocated and timeout is set to zero to allow for future updates. Use of zero a never timeout marker has been suggested by Phil Sutter. Note that, in older kernels, it is already possible to define elements that never expire by declaring a set with the set timeout flag set on and no global set timeout, in this case, new element with no explicit timeout never expire do not allocate the timeout extension, hence, they never expire. This approach makes it complicated to accomodate element timeout update, because element extensions do not support reallocations. Therefore, allocate the timeout extension and use the new marker for this case, but do not expose it to userspace to retain backward compatibility in the set listing. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-03netfilter: nf_tables: consolidate timeout extension for elementsPablo Neira Ayuso1-10/+8
Expiration and timeout are stored in separated set element extensions, but they are tightly coupled. Consolidate them in a single extension to simplify and prepare for set element updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-03netfilter: nf_tables: annotate data-races around element expirationPablo Neira Ayuso1-1/+1
element expiration can be read-write locklessly, it can be written by dynset and read from netlink dump, add annotation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-03wifi: cfg80211: wext: Update spelling and grammarSimon Horman1-6/+6
Correct spelling in iw_handler.h. As reported by codespell. Also, while the "few shortcomings" line is being updated, correct its grammar. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240903-wifi-spell-v2-1-bfcf7062face@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-09-03netfilter: nf_tables: Add missing Kernel docSimon Horman2-0/+3
- Add missing documentation of struct field and enum items. - Add missing documentation of function parameter. Flagged by ./scripts/kernel-doc -none. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-03netfilter: nf_tables: Correct spelling in nf_tables.hSimon Horman1-1/+1
Correct spelling in nf_tables.h. As reported by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-03netfilter: nf_tables: drop unused 3rd argument from validate callback opsFlorian Westphal4-9/+4
Since commit a654de8fdc18 ("netfilter: nf_tables: fix chain dependency validation") the validate() callback no longer needs the return pointer argument. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-31xfrm: Unmask upper DSCP bits in xfrm_get_tos()Ido Schimmel1-2/+0
The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain: xfrm_bundle_create() tos = xfrm_get_tos(fl, family) xfrm_dst_lookup(..., tos, ...) __xfrm_dst_lookup(..., tos, ...) xfrm4_dst_lookup(..., tos, ...) __xfrm4_dst_lookup(..., tos, ...) fl4->flowi4_tos = tos __ip_route_output_key(net, fl4) Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Remove IPTOS_RT_MASK since it is no longer used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-31ipv4: Unmask upper DSCP bits in get_rttos()Ido Schimmel1-1/+4
The function is used by a few socket types to retrieve the TOS value with which to perform the FIB lookup for packets sent through the socket (flowi4_tos). If a DS field was passed using the IP_TOS control message, then it is used. Otherwise the one specified via the IP_TOS socket option. Unmask the upper DSCP bits so that in the future the lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-31ipv4: Unmask upper DSCP bits in ip_sock_rt_tos()Ido Schimmel1-1/+2
The function is used to read the DS field that was stored in IPv4 sockets via the IP_TOS socket option so that it could be used to initialize the flowi4_tos field before resolving an output route. Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-31Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE"Luiz Augusto von Dentz1-5/+0
This reverts commit 59b047bc98084f8af2c41483e4d68a5adf2fa7f7 which breaks compatibility with commands like: bluetoothd[46328]: @ MGMT Command: Load.. (0x0013) plen 74 {0x0001} [hci0] Keys: 2 BR/EDR Address: C0:DC:DA:A5:E5:47 (Samsung Electronics Co.,Ltd) Key type: Authenticated key from P-256 (0x03) Central: 0x00 Encryption size: 16 Diversifier[2]: 0000 Randomizer[8]: 0000000000000000 Key[16]: 6ed96089bd9765be2f2c971b0b95f624 LE Address: D7:2A:DE:1E:73:A2 (Static) Key type: Unauthenticated key from P-256 (0x02) Central: 0x00 Encryption size: 16 Diversifier[2]: 0000 Randomizer[8]: 0000000000000000 Key[16]: 87dd2546ededda380ffcdc0a8faa4597 @ MGMT Event: Command Status (0x0002) plen 3 {0x0001} [hci0] Load Long Term Keys (0x0013) Status: Invalid Parameters (0x0d) Cc: stable@vger.kernel.org Link: https://github.com/bluez/bluez/issues/875 Fixes: 59b047bc9808 ("Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-08-31Bluetooth: hci_sync: Introduce hci_cmd_sync_run/hci_cmd_sync_run_onceLuiz Augusto von Dentz1-0/+4
This introduces hci_cmd_sync_run/hci_cmd_sync_run_once which acts like hci_cmd_sync_queue/hci_cmd_sync_queue_once but runs immediately when already on hdev->cmd_sync_work context. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-08-30ieee802154: Correct spelling in nl802154.hSimon Horman1-1/+1
Correct spelling in nl802154.h. As reported by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Message-ID: <20240829-wpan-spell-v1-2-799d840e02c4@kernel.org> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2024-08-30mac802154: Correct spelling in mac802154.hSimon Horman1-2/+2
Correct spelling in mac802154.h. As reported by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Message-ID: <20240829-wpan-spell-v1-1-799d840e02c4@kernel.org> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2024-08-30icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netnsEric Dumazet2-3/+2
Previous patch made ICMP rate limits per netns, it makes sense to allow each netns to change the associated sysctl. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240829144641.3880376-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-30icmp: move icmp_global.credit and icmp_global.stamp to per netns storageEric Dumazet2-3/+4
Host wide ICMP ratelimiter should be per netns, to provide better isolation. Following patch in this series makes the sysctl per netns. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240829144641.3880376-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-30icmp: change the order of rate limitsEric Dumazet1-0/+2
ICMP messages are ratelimited : After the blamed commits, the two rate limiters are applied in this order: 1) host wide ratelimit (icmp_global_allow()) 2) Per destination ratelimit (inetpeer based) In order to avoid side-channels attacks, we need to apply the per destination check first. This patch makes the following change : 1) icmp_global_allow() checks if the host wide limit is reached. But credits are not yet consumed. This is deferred to 3) 2) The per destination limit is checked/updated. This might add a new node in inetpeer tree. 3) icmp_global_consume() consumes tokens if prior operations succeeded. This means that host wide ratelimit is still effective in keeping inetpeer tree small even under DDOS. As a bonus, I removed icmp_global.lock as the fast path can use a lock-free operation. Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Fixes: 4cdf507d5452 ("icmp: add a global rate limitation") Reported-by: Keyu Man <keyu.man@email.ucr.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240829144641.3880376-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-8/+11
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/faraday/ftgmac100.c 4186c8d9e6af ("net: ftgmac100: Ensure tx descriptor updates are visible") e24a6c874601 ("net: ftgmac100: Get link speed and duplex for NC-SI") https://lore.kernel.org/0b851ec5-f91d-4dd3-99da-e81b98c9ed28@kernel.org net/ipv4/tcp.c bac76cf89816 ("tcp: fix forever orphan socket caused by tcp_abort") edefba66d929 ("tcp: rstreason: introduce SK_RST_REASON_TCP_STATE for active reset") https://lore.kernel.org/20240828112207.5c199d41@canb.auug.org.au No adjacent changes. Link: https://patch.msgid.link/20240829130829.39148-1-pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-29Merge tag 'nf-24-08-28' of ↵Paolo Abeni2-6/+9
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Patch #1 sets on NFT_PKTINFO_L4PROTO for UDP packets less than 4 bytes payload from netdev/egress by subtracting skb_network_offset() when validating IPv4 packet length, otherwise 'meta l4proto udp' never matches. Patch #2 subtracts skb_network_offset() when validating IPv6 packet length for netdev/egress. netfilter pull request 24-08-28 * tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation netfilter: nf_tables: restore IP sanity checks for netdev/egress ==================== Link: https://patch.msgid.link/20240828214708.619261-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-29net: busy-poll: use ktime_get_ns() instead of local_clock()Eric Dumazet1-1/+1
Typically, busy-polling durations are below 100 usec. When/if the busy-poller thread migrates to another cpu, local_clock() can be off by +/-2msec or more for small values of HZ, depending on the platform. Use ktimer_get_ns() to ensure deterministic behavior, which is the whole point of busy-polling. Fixes: 060212928670 ("net: add low latency socket poll") Fixes: 9a3c71aa8024 ("net: convert low latency sockets to sched_clock()") Fixes: 37089834528b ("sched, net: Fixup busy_loop_us_clock()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240827114916.223377-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-29tcp: remove volatile qualifier on tw_substateEric Dumazet1-1/+1
Using a volatile qualifier for a specific struct field is unusual. Use instead READ_ONCE()/WRITE_ONCE() where necessary. tcp_timewait_state_process() can change tw_substate while other threads are reading this field. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20240827015250.3509197-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-28xfrm: minor update to sdb and xfrm_policy commentsFlorian Westphal1-5/+35
The spd is no longer maintained as a linear list. We also haven't been caching bundles in the xfrm_policy struct since 2010. While at it, add kdoc style comments for the xfrm_policy structure and extend the description of the current rbtree based search to mention why it needs to search the candidate set. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-28net: mana: Implement get_ringparam/set_ringparam for manaShradha Gupta1-3/+20
Currently the values of WQs for RX and TX queues for MANA devices are hardcoded to default sizes. Allow configuring these values for MANA devices as ringparam configuration(get/set) through ethtool_ops. Pre-allocate buffers at the beginning of this operation, to prevent complete network loss in low-memory conditions. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://patch.msgid.link/1724688461-12203-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27bonding: change ipsec_lock from spin lock to mutexJianbo Liu1-1/+1
In the cited commit, bond->ipsec_lock is added to protect ipsec_list, hence xdo_dev_state_add and xdo_dev_state_delete are called inside this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep, "scheduling while atomic" will be triggered when changing bond's active slave. [ 101.055189] BUG: scheduling while atomic: bash/902/0x00000200 [ 101.055726] Modules linked in: [ 101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1 [ 101.058760] Hardware name: [ 101.059434] Call Trace: [ 101.059436] <TASK> [ 101.060873] dump_stack_lvl+0x51/0x60 [ 101.061275] __schedule_bug+0x4e/0x60 [ 101.061682] __schedule+0x612/0x7c0 [ 101.062078] ? __mod_timer+0x25c/0x370 [ 101.062486] schedule+0x25/0xd0 [ 101.062845] schedule_timeout+0x77/0xf0 [ 101.063265] ? asm_common_interrupt+0x22/0x40 [ 101.063724] ? __bpf_trace_itimer_state+0x10/0x10 [ 101.064215] __wait_for_common+0x87/0x190 [ 101.064648] ? usleep_range_state+0x90/0x90 [ 101.065091] cmd_exec+0x437/0xb20 [mlx5_core] [ 101.065569] mlx5_cmd_do+0x1e/0x40 [mlx5_core] [ 101.066051] mlx5_cmd_exec+0x18/0x30 [mlx5_core] [ 101.066552] mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core] [ 101.067163] ? bonding_sysfs_store_option+0x4d/0x80 [bonding] [ 101.067738] ? kmalloc_trace+0x4d/0x350 [ 101.068156] mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core] [ 101.068747] mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core] [ 101.069312] bond_change_active_slave+0x392/0x900 [bonding] [ 101.069868] bond_option_active_slave_set+0x1c2/0x240 [bonding] [ 101.070454] __bond_opt_set+0xa6/0x430 [bonding] [ 101.070935] __bond_opt_set_notify+0x2f/0x90 [bonding] [ 101.071453] bond_opt_tryset_rtnl+0x72/0xb0 [bonding] [ 101.071965] bonding_sysfs_store_option+0x4d/0x80 [bonding] [ 101.072567] kernfs_fop_write_iter+0x10c/0x1a0 [ 101.073033] vfs_write+0x2d8/0x400 [ 101.073416] ? alloc_fd+0x48/0x180 [ 101.073798] ksys_write+0x5f/0xe0 [ 101.074175] do_syscall_64+0x52/0x110 [ 101.074576] entry_SYSCALL_64_after_hwframe+0x4b/0x53 As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called from bond_change_active_slave, which requires holding the RTNL lock. And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user context. So ipsec_lock doesn't have to be spin lock, change it to mutex, and thus the above issue can be resolved. Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20240823031056.110999-4-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27netfilter: nf_tables_ipv6: consider network offset in netdev/egress validationPablo Neira Ayuso1-2/+3
From netdev/egress, skb->len can include the ethernet header, therefore, subtract network offset from skb->len when validating IPv6 packet length. Fixes: 42df6e1d221d ("netfilter: Introduce egress hook") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-27wifi: mac80211: export ieee80211_purge_tx_queue() for driversPing-Ke Shih1-0/+13
Drivers need to purge TX SKB when stopping. Using skb_queue_purge() can't report TX status to mac80211, causing ieee80211_free_ack_frame() warns "Have pending ack frames!". Export ieee80211_purge_tx_queue() for drivers to not have to reimplement it. Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20240822014255.10211-1-pkshih@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-08-27wifi: mac80211: Add non-atomic station iteratorRory Little1-0/+18
Drivers may at times want to iterate their stations with a function which requires some non-atomic operations. ieee80211_iterate_stations_mtx() introduces an API to iterate stations while holding that wiphy's mutex. This allows the iterating function to do non-atomic operations safely. Signed-off-by: Rory Little <rory@candelatech.com> Link: https://patch.msgid.link/20240806004024.2014080-2-rory@candelatech.com [unify internal list iteration functions] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-08-27wifi: lib80211: Handle const struct lib80211_crypto_ops in lib80211Christophe JAILLET1-4/+4
lib80211_register_crypto_ops() and lib80211_unregister_crypto_ops() don't modify their "struct lib80211_crypto_ops *ops" argument. So, it can be declared as const. Doing so, some adjustments are needed to also constify some date in "struct lib80211_crypt_data", "struct lib80211_crypto_alg" and the return value of lib80211_get_crypto_ops(). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/c74085e02f33a11327582b19c9f51c3236e85ae2.1722839425.git.christophe.jaillet@wanadoo.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-08-27wifi: mac80211: don't use rate mask for offchannel TX eitherPing-Ke Shih1-3/+4
Like the commit ab9177d83c04 ("wifi: mac80211: don't use rate mask for scanning"), ignore incorrect settings to avoid no supported rate warning reported by syzbot. The syzbot did bisect and found cause is commit 9df66d5b9f45 ("cfg80211: fix default HE tx bitrate mask in 2G band"), which however corrects bitmask of HE MCS and recognizes correctly settings of empty legacy rate plus HE MCS rate instead of returning -EINVAL. As suggestions [1], follow the change of SCAN TX to consider this case of offchannel TX as well. [1] https://lore.kernel.org/linux-wireless/6ab2dc9c3afe753ca6fdcdd1421e7a1f47e87b84.camel@sipsolutions.net/T/#m2ac2a6d2be06a37c9c47a3d8a44b4f647ed4f024 Reported-by: syzbot+8dd98a9e98ee28dc484a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-wireless/000000000000fdef8706191a3f7b@google.com/ Fixes: 9df66d5b9f45 ("cfg80211: fix default HE tx bitrate mask in 2G band") Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20240729074816.20323-1-pkshih@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-08-27wifi: mac80211: refactor block ack management codeDmitry Antipov1-1/+1
Introduce 'ieee80211_mgmt_ba()' to avoid code duplication between 'ieee80211_send_addba_resp()', 'ieee80211_send_addba_request()', and 'ieee80211_send_delba()', ensure that all related addresses are '__aligned(2)', and prefer convenient 'ether_addr_copy()' over generic 'memcpy()'. No functional changes expected. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://patch.msgid.link/20240725090925.6022-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-08-26net: Correct spelling in headersSimon Horman15-29/+29
Correct spelling in Networking headers. As reported by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822-net-spell-v1-12-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26x25: Correct spelling in x25.hSimon Horman1-1/+1
Correct spelling in x25.h As reported by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Martin Schiller <ms@dev.tdt.de> Link: https://patch.msgid.link/20240822-net-spell-v1-11-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26sctp: Correct spelling in headersSimon Horman2-11/+11
Correct spelling in sctp.h and structs.h. As reported by codespell. Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Simon Horman <horms@kernel.org> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20240822-net-spell-v1-10-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26net: sched: Correct spelling in headersSimon Horman2-5/+5
Correct spelling in pkt_cls.h and red.h. As reported by codespell. Cc: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822-net-spell-v1-9-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26NFC: Correct spelling in headersSimon Horman2-5/+5
Correct spelling in NFC headers. As reported by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20240822-net-spell-v1-8-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26netlabel: Correct spelling in netlabel.hSimon Horman1-1/+1
Correct spelling in netlabel.h. As reported by codespell. Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822-net-spell-v1-7-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26bonding: Correct spelling in headersSimon Horman2-2/+5
Correct spelling in bond_3ad.h and bond_alb.h. As reported by codespell. Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822-net-spell-v1-5-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26ipv6: Correct spelling in ipv6.hSimon Horman1-2/+2
Correct spelling in ip_tunnels.h As reported by codespell. Cc: David Ahern <dsahern@kernel.org> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822-net-spell-v1-4-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26ip_tunnel: Correct spelling in ip_tunnels.hSimon Horman1-1/+1
Correct spelling in ip_tunnels.h As reported by codespell. Cc: David Ahern <dsahern@kernel.org> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822-net-spell-v1-3-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26s390/iucv: Correct spelling in iucv.hSimon Horman1-1/+1
Correct spelling in iucv.h As reported by codespell. Cc: Alexandra Winter <wintera@linux.ibm.com> Cc: Thorsten Winkler <twinkler@linux.ibm.com> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240822-net-spell-v1-2-3a98971ce2d2@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26Merge tag 'nf-next-24-08-23' of ↵Jakub Kicinski2-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains Netfilter updates for net-next: Patch #1 fix checksum calculation in nfnetlink_queue with SCTP, segment GSO packet since skb_zerocopy() does not support GSO_BY_FRAGS, from Antonio Ojea. Patch #2 extend nfnetlink_queue coverage to handle SCTP packets, from Antonio Ojea. Patch #3 uses consume_skb() instead of kfree_skb() in nfnetlink, from Donald Hunter. Patch #4 adds a dedicate commit list for sets to speed up intra-transaction lookups, from Florian Westphal. Patch #5 skips removal of element from abort path for the pipapo backend, ditching the shadow copy of this datastructure is sufficient. Patch #6 moves nf_ct_netns_get() out of nf_conncount_init() to let users of conncoiunt decide when to enable conntrack, this is needed by openvswitch, from Xin Long. Patch #7 pass context to all nft_parse_register_load() in preparation for the next patch. Patches #8 and #9 reject loads from uninitialized registers from control plane to remove register initialization from datapath. From Florian Westphal. * tag 'nf-next-24-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: don't initialize registers in nft_do_chain() netfilter: nf_tables: allow loads only when register is initialized netfilter: nf_tables: pass context structure to nft_parse_register_load netfilter: move nf_ct_netns_get out of nf_conncount_init netfilter: nf_tables: do not remove elements if set backend implements .abort netfilter: nf_tables: store new sets in dedicated list netfilter: nfnetlink: convert kfree_skb to consume_skb selftests: netfilter: nft_queue.sh: sctp coverage netfilter: nfnetlink_queue: unbreak SCTP traffic ==================== Link: https://patch.msgid.link/20240822221939.157858-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26netfilter: nf_tables: restore IP sanity checks for netdev/egressPablo Neira Ayuso1-4/+6
Subtract network offset to skb->len before performing IPv4 header sanity checks, then adjust transport offset from offset from mac header. Jorge Ortiz says: When small UDP packets (< 4 bytes payload) are sent from eth0, `meta l4proto udp` condition is not met because `NFT_PKTINFO_L4PROTO` is not set. This happens because there is a comparison that checks if the transport header offset exceeds the total length. This comparison does not take into account the fact that the skb network offset might be non-zero in egress mode (e.g., 14 bytes for Ethernet header). Fixes: 0ae8e4cca787 ("netfilter: nf_tables: set transport offset from mac header for netdev/egress") Reported-by: Jorge Ortiz <jorge.ortiz.escribano@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-24xfrm: policy: remove remaining use of inexact listFlorian Westphal1-1/+0
No consumers anymore, remove it. After this, insertion of policies no longer require list walk of all inexact policies but only those that are reachable via the candidate sets. This gives almost linear insertion speeds provided the inserted policies are for non-overlapping networks. Before: Inserted 1000 policies in 70 ms Inserted 10000 policies in 1155 ms Inserted 100000 policies in 216848 ms After: Inserted 1000 policies in 56 ms Inserted 10000 policies in 478 ms Inserted 100000 policies in 4580 ms Insertion of 1m entries takes about ~40s after this change on my test vm. Cc: Noel Kuntze <noel@familie-kuntze.de> Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-23xfrm: Correct spelling in xfrm.hSimon Horman1-2/+2
Correct spelling in xfrm.h. As reported by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-14/+22
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.h c948c0973df5 ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops") f2878cdeb754 ("bnxt_en: Add support to call FW to update a VNIC") Link: https://patch.msgid.link/20240822210125.1542769-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21net: Silence false field-spanning write warning in metadata_dst memcpyGal Pressman1-2/+5
When metadata_dst struct is allocated (using metadata_dst_alloc()), it reserves room for options at the end of the struct. Change the memcpy() to unsafe_memcpy() as it is guaranteed that enough room (md_size bytes) was allocated and the field-spanning write is intentional. This resolves the following warning: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 104) of single field "&new_md->u.tun_info" at include/net/dst_metadata.h:166 (size 96) WARNING: CPU: 2 PID: 391470 at include/net/dst_metadata.h:166 tun_dst_unclone+0x114/0x138 [geneve] Modules linked in: act_tunnel_key geneve ip6_udp_tunnel udp_tunnel act_vlan act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress sbsa_gwdt ipmi_devintf ipmi_msghandler xfrm_interface xfrm6_tunnel tunnel6 tunnel4 xfrm_user xfrm_algo nvme_fabrics overlay optee openvswitch nsh nf_conncount ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser rdma_cm ib_umad iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm uio_pdrv_genirq uio mlxbf_pmc pwr_mlxbf mlxbf_bootctl bluefield_edac nft_chain_nat binfmt_misc xt_MASQUERADE nf_nat xt_tcpmss xt_NFLOG nfnetlink_log xt_recent xt_hashlimit xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_comment ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink sch_fq_codel dm_multipath fuse efi_pstore ip_tables btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 nvme nvme_core mlx5_ib ib_uverbs ib_core ipv6 crc_ccitt mlx5_core crct10dif_ce mlxfw psample i2c_mlxbf gpio_mlxbf2 mlxbf_gige mlxbf_tmfifo CPU: 2 PID: 391470 Comm: handler6 Not tainted 6.10.0-rc1 #1 Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.5.0.12993 Dec 6 2023 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : tun_dst_unclone+0x114/0x138 [geneve] lr : tun_dst_unclone+0x114/0x138 [geneve] sp : ffffffc0804533f0 x29: ffffffc0804533f0 x28: 000000000000024e x27: 0000000000000000 x26: ffffffdcfc0e8e40 x25: ffffff8086fa6600 x24: ffffff8096a0c000 x23: 0000000000000068 x22: 0000000000000008 x21: ffffff8092ad7000 x20: ffffff8081e17900 x19: ffffff8092ad7900 x18: 00000000fffffffd x17: 0000000000000000 x16: ffffffdcfa018488 x15: 695f6e75742e753e x14: 2d646d5f77656e26 x13: 6d5f77656e262220 x12: 646c65696620656c x11: ffffffdcfbe33ae8 x10: ffffffdcfbe1baa8 x9 : ffffffdcfa0a4c10 x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001 x5 : ffffff83fdeeb010 x4 : 0000000000000000 x3 : 0000000000000027 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80913f6780 Call trace: tun_dst_unclone+0x114/0x138 [geneve] geneve_xmit+0x214/0x10e0 [geneve] dev_hard_start_xmit+0xc0/0x220 __dev_queue_xmit+0xa14/0xd38 dev_queue_xmit+0x14/0x28 [openvswitch] ovs_vport_send+0x98/0x1c8 [openvswitch] do_output+0x80/0x1a0 [openvswitch] do_execute_actions+0x172c/0x1958 [openvswitch] ovs_execute_actions+0x64/0x1a8 [openvswitch] ovs_packet_cmd_execute+0x258/0x2d8 [openvswitch] genl_family_rcv_msg_doit+0xc8/0x138 genl_rcv_msg+0x1ec/0x280 netlink_rcv_skb+0x64/0x150 genl_rcv+0x40/0x60 netlink_unicast+0x2e4/0x348 netlink_sendmsg+0x1b0/0x400 __sock_sendmsg+0x64/0xc0 ____sys_sendmsg+0x284/0x308 ___sys_sendmsg+0x88/0xf0 __sys_sendmsg+0x70/0xd8 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x38/0x100 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 ---[ end trace 0000000000000000 ]--- Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240818114351.3612692-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20ipv4: Centralize TOS matchingIdo Schimmel1-0/+6
The TOS field in the IPv4 flow information structure ('flowi4_tos') is matched by the kernel against the TOS selector in IPv4 rules and routes. The field is initialized differently by different call sites. Some treat it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC 791 TOS and initialize it using IPTOS_RT_MASK. What is common to all these call sites is that they all initialize the lower three DSCP bits, which fits the TOS definition in the initial IPv4 specification (RFC 791). Therefore, the kernel only allows configuring IPv4 FIB rules that match on the lower three DSCP bits which are always guaranteed to be initialized by all call sites: # ip -4 rule add tos 0x1c table 100 # ip -4 rule add tos 0x3c table 100 Error: Invalid tos. While this works, it is unlikely to be very useful. RFC 791 that initially defined the TOS and IP precedence fields was updated by RFC 2474 over twenty five years ago where these fields were replaced by a single six bits DSCP field. Extending FIB rules to match on DSCP can be done by adding a new DSCP selector while maintaining the existing semantics of the TOS selector for applications that rely on that. A prerequisite for allowing FIB rules to match on DSCP is to adjust all the call sites to initialize the high order DSCP bits and remove their masking along the path to the core where the field is matched on. However, making this change alone will result in a behavior change. For example, a forwarded IPv4 packet with a DS field of 0xfc will no longer match a FIB rule that was configured with 'tos 0x1c'. This behavior change can be avoided by masking the upper three DSCP bits in 'flowi4_tos' before comparing it against the TOS selectors in FIB rules and routes. Implement the above by adding a new function that checks whether a given DSCP value matches the one specified in the IPv4 flow information structure and invoke it from the three places that currently match on 'flowi4_tos'. Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK since the latter is not uAPI and we should be able to remove it at some point. Include <linux/ip.h> in <linux/in_route.h> since the former defines IPTOS_TOS_MASK which is used in the definition of RT_TOS() in <linux/in_route.h>. No regressions in FIB tests: # ./fib_tests.sh [...] Tests passed: 218 Tests failed: 0 And FIB rule tests: # ./fib_rule_tests.sh [...] Tests passed: 116 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20netfilter: nf_tables: allow loads only when register is initializedFlorian Westphal1-0/+1
Reject rules where a load occurs from a register that has not seen a store early in the same rule. commit 4c905f6740a3 ("netfilter: nf_tables: initialize registers in nft_do_chain()") had to add a unconditional memset to the nftables register space to avoid leaking stack information to userspace. This memset shows up in benchmarks. After this change, this commit can be reverted again. Note that this breaks userspace compatibility, because theoretically you can do rule 1: reg2 := meta load iif, reg2 == 1 jump ... rule 2: reg2 == 2 jump ... // read access with no store in this rule ... after this change this is rejected. Neither nftables nor iptables-nft generate such rules, each rule is always standalone. This resuts in a small increase of nft_ctx structure by sizeof(long). To cope with hypothetical rulesets like the example above one could emit on-demand "reg[x] = 0" store when generating the datapath blob in nf_tables_commit_chain_prepare(). A patch that does this is linked to below. For now, lets disable this. In nf_tables, a rule is the smallest unit that can be replaced from userspace, i.e. a hypothetical ruleset that relies on earlier initialisations of registers can't be changed at will as register usage would need to be coordinated. Link: https://lore.kernel.org/netfilter-devel/20240627135330.17039-4-fw@strlen.de/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-20netfilter: nf_tables: pass context structure to nft_parse_register_loadFlorian Westphal1-1/+2
Mechanical transformation, no logical changes intended. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>