summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-09-03net: broadcom: Fix return type for implementation ofGUO Zihua1-1/+1
Since Linux now supports CFI, it will be a good idea to fix mismatched return type for implementation of hooks. Otherwise this might get cought out by CFI and cause a panic. bcm4908_enet_start_xmit() would return either NETDEV_TX_BUSY or NETDEV_TX_OK, so change the return type to netdev_tx_t directly. Signed-off-by: GUO Zihua <guozihua@huawei.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220902075407.52358-1-guozihua@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-03net: ieee802154: Fix compilation error when ↵Gal Pressman1-4/+2
CONFIG_IEEE802154_NL802154_EXPERIMENTAL is disabled When CONFIG_IEEE802154_NL802154_EXPERIMENTAL is disabled, NL802154_CMD_DEL_SEC_LEVEL is undefined and results in a compilation error: net/ieee802154/nl802154.c:2503:19: error: 'NL802154_CMD_DEL_SEC_LEVEL' undeclared here (not in a function); did you mean 'NL802154_CMD_SET_CCA_ED_LEVEL'? 2503 | .resv_start_op = NL802154_CMD_DEL_SEC_LEVEL + 1, | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | NL802154_CMD_SET_CCA_ED_LEVEL Unhide the experimental commands, having them defined in an enum makes no difference. Fixes: 9c5d03d36251 ("genetlink: start to validate reserved header bytes") Signed-off-by: Gal Pressman <gal@nvidia.com> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lore.kernel.org/r/20220902030620.2737091-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-02net: remove netif_tx_napi_add()Jakub Kicinski1-2/+0
All callers are now gone. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02net: bql: add more documentationEric Dumazet1-6/+26
Add some documentation for netdev_tx_sent_queue() and netdev_tx_completed_queue() Stating that netdev_tx_completed_queue() must be called once per TX completion round is apparently not obvious for everybody. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02Merge branch 'net-ipa-transaction-state-IDs'David S. Miller2-14/+84
Alex Elder says: ==================== net: ipa: use IDs to track transaction state This series is the first of three groups of changes that simplify the way the IPA driver tracks the state of its transactions. Each GSI channel has a fixed number of transactions allocated at initialization time. The number allocated matches the number of TREs in the transfer ring associated with the channel. This is because the transfer ring limits the number of transfers that can ever be underway, and in the worst case, each transaction represents a single TRE. Transactions go through various states during their lifetime. Currently a set of lists keeps track of which transactions are in each state. Initially, all transactions are free. An allocated transaction is placed on the allocated list. Once an allocated transaction is committed, it is moved from the allocated to the committed list. When a committed transaction is sent to hardware (via a doorbell) it is moved to the pending list. When hardware signals that some work has completed, transactions are moved to the completed list. Finally, when a completed transaction is polled it's moved to the polled list before being removed when it becomes free. Changing a transaction's state thus normally involves manipulating two lists, and to prevent corruption a spinlock is held while the lists are updated. Transactions move through their states in a well-defined sequence though, and they do so strictly in order. So transaction 0 is always allocated before transaction 1; transaction 0 is always committed before transaction 1; and so on, through completion, polling, and becoming free. Because of this, it's sufficient to just keep track of which transaction is the first in each state. The rest of the transactions in a given state can be derived from the first transaction in an "adjacent" state. As a result, we can track the state of all transactions with a set of indexes, and can update these without the need for a spinlock. This first group of patches just defines the set of indexes that will be used for this new way of tracking transaction state. Two more groups of patches will follow. I've broken the 17 patches into these three groups to facilitate review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02net: ipa: track polled transactions with an IDAlex Elder2-16/+24
Add a transaction ID to track the first element in the transaction array that has been polled. Advance the ID when we are releasing a transaction. Temporarily add warnings that verify that the first polled transaction tracked by the ID matches the first element on the polled list, both when polling and freeing. Remove the temporary warnings added by the previous commit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02net: ipa: track completed transactions with an IDAlex Elder2-17/+24
Add a transaction ID field to track the first element in the transaction array that has completed but has not yet been polled. Advance the ID when we are processing a transaction in the NAPI polling loop (where completed transactions become polled). Temporarily add warnings that verify that the first completed transaction tracked by the ID matches the first element on the completed list, both when pending and completing. Remove the temporary warnings added by the previous commit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02net: ipa: track pending transactions with an IDAlex Elder2-18/+29
Add a transaction ID field to track the first element in the transaction array that is pending (sent to hardware) but not yet complete. Advance the ID when a completion event for a channel indicates that transactions have completed. Temporarily add warnings that verify that the first pending transaction tracked by the ID matches the first element on the pending list, both when pending and completing, as well as when resetting the channel. Remove the temporary warnings added by the previous commit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02net: ipa: track committed transactions with an IDAlex Elder2-17/+29
Add a transaction ID field to track the first element in a channel's transaction array that has been committed, but not yet passed to the hardware. Advance the ID when the hardware is notified via doorbell that TREs from a transaction are ready for consumption. Temporarily add warnings that verify that the first committed transaction tracked by the ID matches the first element on the committed list, both when committing and pending (at doorbell). Remove the temporary warnings added by the previous commit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02net: ipa: track allocated transactions with an IDAlex Elder2-2/+27
Transactions for a channel are now managed in an array, with a free transaction ID indicating which is the next one free. Add another transaction ID field to track the first element in the array that has been allocated. Advance it when a transaction is committed (because that is when that transaction leaves allocated state). Temporarily add warnings that verify that the first allocated transaction tracked by the ID matches the first element on the allocated list, both when allocating and committing a transaction. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02net: ipa: use an array for transactionsAlex Elder2-13/+20
Transactions are always allocated one at a time. The maximum number of them we could ever need occurs if each TRE is assigned to a transaction. So a channel requires no more transactions than the number of TREs in its transfer ring. That number is known to be a power-of-2 less than 65536. The transaction pool abstraction is used for other things, but for transactions we can use a simple array of transaction structures, and use a free index to indicate which entry in the array is the next one free for allocation. By having the number of elements in the array be a power-of-2, we can use an ever-incrementing 16-bit free index, and use it modulo the array size. Distinguish a "trans_id" (whose value can exceed the number of entries in the transaction array) from a "trans_index" (which is less than the number of entries). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02Merge branch 'lan966x-make-reset-optional'David S. Miller2-3/+2
Michael Walle says: ==================== net: lan966x: make reset optional This is the remaining part of the reset rework on the LAN966x targetting the netdev tree. The former series can be found at: https://lore.kernel.org/lkml/20220826115607.1148489-1-michael@walle.cc/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02net: lan966x: make reset optionalMichael Walle1-1/+2
There is no dedicated reset for just the switch core. The reset which is used up until now, is more of a global reset, resetting almost the whole SoC and cause spurious errors by doing so. Make it possible to handle the reset elsewhere and make the reset optional. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02dt-bindings: net: sparx5: don't require a reset lineMichael Walle1-2/+0
Make the reset line optional. It turns out, there is no dedicated reset for the switch. Instead, the reset which was used up until now, was kind of a global reset. This is now handled elsewhere, thus don't require a reset. Signed-off-by: Michael Walle <michael@walle.cc> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02ipv6: tcp: send consistent autoflowlabel in SYN_RECV stateEric Dumazet1-12/+14
This is a followup of commit c67b85558ff2 ("ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state"), but for SYN_RECV state. In some cases, TCP sends a challenge ACK on behalf of a SYN_RECV request. WHen this happens, we want to use the flow label that was used when the prior SYNACK packet was sent, instead of another one. After his patch, following packetdrill passes: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.2 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > (flowlabel 0x11) S. 0:0(0) ack 1 <...> // Test if a challenge ack is properly sent (same flowlabel than prior SYNACK) +.01 < . 4000000000:4000000000(0) ack 1 win 320 +0 > (flowlabel 0x11) . 1:1(0) ack 1 Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220831203729.458000-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-02selftests: net: dsa: symlink the tc_actions.sh testVladimir Oltean3-1/+4
This has been validated on the Ocelot/Felix switch family (NXP LS1028A) and should be relevant to any switch driver that offloads the tc-flower and/or tc-matchall actions trap, drop, accept, mirred, for which DSA has operations. TEST: gact drop and ok (skip_hw) [ OK ] TEST: mirred egress flower redirect (skip_hw) [ OK ] TEST: mirred egress flower mirror (skip_hw) [ OK ] TEST: mirred egress matchall mirror (skip_hw) [ OK ] TEST: mirred_egress_to_ingress (skip_hw) [ OK ] TEST: gact drop and ok (skip_sw) [ OK ] TEST: mirred egress flower redirect (skip_sw) [ OK ] TEST: mirred egress flower mirror (skip_sw) [ OK ] TEST: mirred egress matchall mirror (skip_sw) [ OK ] TEST: trap (skip_sw) [ OK ] TEST: mirred_egress_to_ingress (skip_sw) [ OK ] Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220831170839.931184-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-02netdevsim: remove redundant variable retJinpeng Cui1-3/+2
Return value directly from nsim_dev_reload_create() instead of getting value from redundant variable ret. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn> Link: https://lore.kernel.org/r/20220831154329.305372-1-cui.jinpeng2@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-02net: rtnetlink: use netif_oper_up instead of open codeJuhee Kang1-4/+2
The open code is defined as a new helper function(netif_oper_up) on netdev.h, the code is dev->operstate == IF_OPER_UP || dev->operstate == IF_OPER_UNKNOWN. Thus, replace the open code to netif_oper_up. This patch doesn't change logic. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Link: https://lore.kernel.org/r/20220831125845.1333-1-claudiajkang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-02net: sched: etf: remove true check in etf_enable_offload()Zhengchao Shao1-3/+0
etf_enable_offload() is only called when q->offload is false in etf_init(). So remove true check in etf_enable_offload(). Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20220831092919.146149-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski331-1422/+2536
tools/testing/selftests/net/.gitignore sort the net-next version and use it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01Merge tag 'net-6.0-rc4' of ↵Linus Torvalds70-249/+478
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth, bpf and wireless. Current release - regressions: - bpf: - fix wrong last sg check in sk_msg_recvmsg() - fix kernel BUG in purge_effective_progs() - mac80211: - fix possible leak in ieee80211_tx_control_port() - potential NULL dereference in ieee80211_tx_control_port() Current release - new code bugs: - nfp: fix the access to management firmware hanging Previous releases - regressions: - ip: fix triggering of 'icmp redirect' - sched: tbf: don't call qdisc_put() while holding tree lock - bpf: fix corrupted packets for XDP_SHARED_UMEM - bluetooth: hci_sync: fix suspend performance regression - micrel: fix probe failure Previous releases - always broken: - tcp: make global challenge ack rate limitation per net-ns and default disabled - tg3: fix potential hang-up on system reboot - mac802154: fix reception for no-daddr packets Misc: - r8152: add PID for the lenovo onelink+ dock" * tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits) net/smc: Remove redundant refcount increase Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb" tcp: make global challenge ack rate limitation per net-ns and default disabled tcp: annotate data-race around challenge_timestamp net: dsa: hellcreek: Print warning only once ip: fix triggering of 'icmp redirect' sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb selftests: net: sort .gitignore file Documentation: networking: correct possessive "its" kcm: fix strp_init() order and cleanup mlxbf_gige: compute MDIO period based on i1clk ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler net: lan966x: improve error handle in lan966x_fdma_rx_get_frame() nfp: fix the access to management firmware hanging net: phy: micrel: Make the GPIO to be non-exclusive net: virtio_net: fix notification coalescing comments net/sched: fix netdevice reference leaks in attach_default_qdiscs() net: sched: tbf: don't call qdisc_put() while holding tree lock net: Use u64_stats_fetch_begin_irq() for stats fetch. net: dsa: xrs700x: Use irqsave variant for u64 stats update ...
2022-09-01Merge tag 'slab-for-6.0-rc4' of ↵Linus Torvalds1-16/+29
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - A fix from Waiman Long to avoid a theoretical deadlock reported by lockdep. * tag 'slab-for-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock
2022-09-01Merge tag 'sound-6.0-rc4' of ↵Linus Torvalds7-34/+146
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Just handful changes at this time. The only major change is the regression fix about the x86 WC-page buffer allocation. The rest are trivial data-race fixes for ALSA sequencer core, the possible out-of-bounds access fixes in the new ALSA control hash code, and a few device-specific workarounds and fixes" * tag 'sound-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Add quirk for LH Labs Geek Out HD Audio 1V5 ALSA: hda/realtek: Add speaker AMP init for Samsung laptops with ALC298 ALSA: control: Re-order bounds checking in get_ctl_id_hash() ALSA: control: Fix an out-of-bounds bug in get_ctl_id_hash() ALSA: hda: intel-nhlt: Correct the handling of fmt_config flexible array ALSA: seq: Fix data-race at module auto-loading ALSA: seq: oss: Fix data-race for max_midi_devs access ALSA: memalloc: Revive x86-specific WC page allocations again
2022-09-01net: sched: gred: remove NULL check before free table->tab in gred_destroy()Zhengchao Shao1-4/+3
The kfree invoked by gred_destroy_vq checks whether the input parameter is empty. Therefore, gred_destroy() doesn't need to check table->tab. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220831041452.33026-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding ↵Waiman Long1-16/+29
slab_mutex/cpu_hotplug_lock A circular locking problem is reported by lockdep due to the following circular locking dependency. +--> cpu_hotplug_lock --> slab_mutex --> kn->active --+ | | +-----------------------------------------------------+ The forward cpu_hotplug_lock ==> slab_mutex ==> kn->active dependency happens in kmem_cache_destroy(): cpus_read_lock(); mutex_lock(&slab_mutex); ==> sysfs_slab_unlink() ==> kobject_del() ==> kernfs_remove() ==> __kernfs_remove() ==> kernfs_drain(): rwsem_acquire(&kn->dep_map, ...); The backward kn->active ==> cpu_hotplug_lock dependency happens in kernfs_fop_write_iter(): kernfs_get_active(); ==> slab_attr_store() ==> cpu_partial_store() ==> flush_all(): cpus_read_lock() One way to break this circular locking chain is to avoid holding cpu_hotplug_lock and slab_mutex while deleting the kobject in sysfs_slab_unlink() which should be equivalent to doing a write_lock and write_unlock pair of the kn->active virtual lock. Since the kobject structures are not protected by slab_mutex or the cpu_hotplug_lock, we can certainly release those locks before doing the delete operation. Move sysfs_slab_unlink() and sysfs_slab_release() to the newly created kmem_cache_release() and call it outside the slab_mutex & cpu_hotplug_lock critical sections. There will be a slight delay in the deletion of sysfs files if kmem_cache_release() is called indirectly from a work function. Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: David Rientjes <rientjes@google.com> Link: https://lore.kernel.org/all/YwOImVd+nRUsSAga@hyeyoo/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-09-01Merge branch 'rk3588-ethernet-support'Paolo Abeni3-0/+163
Sebastian Reichel says: ==================== RK3588 Ethernet Support This adds ethernet support for the RK3588(s) SoCs. Changes since PATCHv2: * Rebased to v6.0-rc1 * Wrap DT bindings additions at 80 characters * Add Acked-by from Krzysztof Changes since PATCHv1: * Drop first patch (not required) and rebase second patch accordingly * Rename DT property rockchip,php_grf -> rockchip,php-grf ==================== Link: https://lore.kernel.org/r/20220830154559.61506-1-sebastian.reichel@collabora.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01dt-bindings: net: rockchip-dwmac: add rk3588 gmac compatibleSebastian Reichel2-0/+8
Add compatible string for RK3588 gmac, which is similar to the RK3568 one, but needs another syscon device for clock selection. Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01net: ethernet: stmmac: dwmac-rk: Add gmac support for rk3588David Wu1-0/+155
Add constants and callback functions for the dwmac on RK3588 soc. As can be seen, the base structure is the same, only registers and the bits in them moved slightly. Signed-off-by: David Wu <david.wu@rock-chips.com> [rebase, squash fixes] Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01net/smc: Remove redundant refcount increaseYacan Liu1-1/+0
For passive connections, the refcount increment has been done in smc_clcsock_accept()-->smc_sock_alloc(). Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01octeontx2-pf: Add egress PFC supportSuman Ghosh5-17/+427
As of now all transmit queues transmit packets out of same scheduler queue hierarchy. Due to this PFC frames sent by peer are not handled properly, either all transmit queues are backpressured or none. To fix this when user enables PFC for a given priority map relavant transmit queue to a different scheduler queue hierarcy, so that backpressure is applied only to the traffic egressing out of that TXQ. Signed-off-by: Suman Ghosh <sumang@marvell.com> Link: https://lore.kernel.org/r/20220830120304.158060-1-sumang@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01net: sched: remove redundant NULL check in change hook functionZhengchao Shao13-39/+1
Currently, the change function can be called by two ways. The one way is that qdisc_change() will call it. Before calling change function, qdisc_change() ensures tca[TCA_OPTIONS] is not empty. The other way is that .init() will call it. The opt parameter is also checked before calling change function in .init(). Therefore, it's no need to check the input parameter opt in change function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220829071219.208646-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"Jakub Kicinski1-3/+1
This reverts commit 90fabae8a2c225c4e4936723c38857887edde5cc. Patch was applied hastily, revert and let the v2 be reviewed. Fixes: 90fabae8a2c2 ("sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb") Link: https://lore.kernel.org/all/87wnao2ha3.fsf@toke.dk/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01Merge branch 'tcp-tcp-challenge-ack-fixes'Jakub Kicinski4-13/+21
Eric Dumazet says: ==================== tcp: tcp challenge ack fixes syzbot found a typical data-race addressed in the first patch. While we are at it, second patch makes the global rate limit per net-ns and disabled by default. ==================== Link: https://lore.kernel.org/r/20220830185656.268523-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01tcp: make global challenge ack rate limitation per net-ns and default disabledEric Dumazet4-13/+21
Because per host rate limiting has been proven problematic (side channel attacks can be based on it), per host rate limiting of challenge acks ideally should be per netns and turned off by default. This is a long due followup of following commits: 083ae308280d ("tcp: enable per-socket rate limiting of all 'challenge acks'") f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock") 75ff39ccc1bd ("tcp: make challenge acks less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Baron <jbaron@akamai.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01tcp: annotate data-race around challenge_timestampEric Dumazet1-2/+2
challenge_timestamp can be read an written by concurrent threads. This was expected, but we need to annotate the race to avoid potential issues. Following patch moves challenge_timestamp and challenge_count to per-netns storage to provide better isolation. Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01net: dsa: hellcreek: Print warning only onceKurt Kanzenbach1-1/+1
In case the source port cannot be decoded, print the warning only once. This still brings attention to the user and does not spam the logs at the same time. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01net-next: Fix IP_UNICAST_IF option behavior for connected socketsRichard Gobert3-2/+46
The IP_UNICAST_IF socket option is used to set the outgoing interface for outbound packets. The IP_UNICAST_IF socket option was added as it was needed by the Wine project, since no other existing option (SO_BINDTODEVICE socket option, IP_PKTINFO socket option or the bind function) provided the needed characteristics needed by the IP_UNICAST_IF socket option. [1] The IP_UNICAST_IF socket option works well for unconnected sockets, that is, the interface specified by the IP_UNICAST_IF socket option is taken into consideration in the route lookup process when a packet is being sent. However, for connected sockets, the outbound interface is chosen when connecting the socket, and in the route lookup process which is done when a packet is being sent, the interface specified by the IP_UNICAST_IF socket option is being ignored. This inconsistent behavior was reported and discussed in an issue opened on systemd's GitHub project [2]. Also, a bug report was submitted in the kernel's bugzilla [3]. To understand the problem in more detail, we can look at what happens for UDP packets over IPv4 (The same analysis was done separately in the referenced systemd issue). When a UDP packet is sent the udp_sendmsg function gets called and the following happens: 1. The oif member of the struct ipcm_cookie ipc (which stores the output interface of the packet) is initialized by the ipcm_init_sk function to inet->sk.sk_bound_dev_if (the device set by the SO_BINDTODEVICE socket option). 2. If the IP_PKTINFO socket option was set, the oif member gets overridden by the call to the ip_cmsg_send function. 3. If no output interface was selected yet, the interface specified by the IP_UNICAST_IF socket option is used. 4. If the socket is connected and no destination address is specified in the send function, the struct ipcm_cookie ipc is not taken into consideration and the cached route, that was calculated in the connect function is being used. Thus, for a connected socket, the IP_UNICAST_IF sockopt isn't taken into consideration. This patch corrects the behavior of the IP_UNICAST_IF socket option for connect()ed sockets by taking into consideration the IP_UNICAST_IF sockopt when connecting the socket. In order to avoid reconnecting the socket, this option is still ignored when applied on an already connected socket until connect() is called again by the Richard Gobert. Change the __ip4_datagram_connect function, which is called during socket connection, to take into consideration the interface set by the IP_UNICAST_IF socket option, in a similar way to what is done in the udp_sendmsg function. [1] https://lore.kernel.org/netdev/1328685717.4736.4.camel@edumazet-laptop/T/ [2] https://github.com/systemd/systemd/issues/11935#issuecomment-618691018 [3] https://bugzilla.kernel.org/show_bug.cgi?id=210255 Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220829111554.GA1771@debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01ip: fix triggering of 'icmp redirect'Nicolas Dichtel1-2/+2
__mkroute_input() uses fib_validate_source() to trigger an icmp redirect. My understanding is that fib_validate_source() is used to know if the src address and the gateway address are on the same link. For that, fib_validate_source() returns 1 (same link) or 0 (not the same network). __mkroute_input() is the only user of these positive values, all other callers only look if the returned value is negative. Since the below patch, fib_validate_source() didn't return anymore 1 when both addresses are on the same network, because the route lookup returns RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right. Let's adapat the test to return 1 again when both addresses are on the same link. CC: stable@vger.kernel.org Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop") Reported-by: kernel test robot <yujie.liu@intel.com> Reported-by: Heng Qi <hengqi@linux.alibaba.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220829100121.3821-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01Merge branch 'net-sched-remove-unused-variables'Jakub Kicinski4-7/+0
Zhengchao Shao says: ==================== net: sched: remove unused variables The variable "other" is unused, remove it. ==================== Link: https://lore.kernel.org/r/20220830092255.281330-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01net: sched: gred/red: remove unused variables in struct red_statsZhengchao Shao3-5/+0
The variable "other" in the struct red_stats is not used. Remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01net: sched: choke: remove unused variables in struct choke_sched_dataZhengchao Shao1-2/+0
The variable "other" in the struct choke_sched_data is not used. Remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01net: axienet: Switch to 64-bit RX/TX statisticsRobert Hancock2-4/+45
The RX and TX byte/packet statistics in this driver could be overflowed relatively quickly on a 32-bit platform. Switch these stats to use the u64_stats infrastructure to avoid this. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Link: https://lore.kernel.org/r/20220829233901.3429419-1-robert.hancock@calian.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01net/rds: Pass a pointer to virt_to_page()Linus Walleij1-1/+1
Functions that work on a pointer to virtual memory such as virt_to_pfn() and users of that function such as virt_to_page() are supposed to pass a pointer to virtual memory, ideally a (void *) or other pointer. However since many architectures implement virt_to_pfn() as a macro, this function becomes polymorphic and accepts both a (unsigned long) and a (void *). If we instead implement a proper virt_to_pfn(void *addr) function the following happens (occurred on arch/arm): net/rds/message.c:357:56: warning: passing argument 1 of 'virt_to_pfn' makes pointer from integer without a cast [-Wint-conversion] Fix this with an explicit cast. Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: rds-devel@oss.oracle.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20220829132001.114858-1-linus.walleij@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuseJann Horn2-15/+21
anon_vma->degree tracks the combined number of child anon_vmas and VMAs that use the anon_vma as their ->anon_vma. anon_vma_clone() then assumes that for any anon_vma attached to src->anon_vma_chain other than src->anon_vma, it is impossible for it to be a leaf node of the VMA tree, meaning that for such VMAs ->degree is elevated by 1 because of a child anon_vma, meaning that if ->degree equals 1 there are no VMAs that use the anon_vma as their ->anon_vma. This assumption is wrong because the ->degree optimization leads to leaf nodes being abandoned on anon_vma_clone() - an existing anon_vma is reused and no new parent-child relationship is created. So it is possible to reuse an anon_vma for one VMA while it is still tied to another VMA. This is an issue because is_mergeable_anon_vma() and its callers assume that if two VMAs have the same ->anon_vma, the list of anon_vmas attached to the VMAs is guaranteed to be the same. When this assumption is violated, vma_merge() can merge pages into a VMA that is not attached to the corresponding anon_vma, leading to dangling page->mapping pointers that will be dereferenced during rmap walks. Fix it by separately tracking the number of child anon_vmas and the number of VMAs using the anon_vma as their ->anon_vma. Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") Cc: stable@kernel.org Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-01r8152: allow userland to disable multicastSven van Ashbrook1-8/+12
The rtl8152 driver does not disable multicasting when userspace asks it to. For example: $ ifconfig eth0 -multicast -allmulti $ tcpdump -p -i eth0 # will still capture multicast frames Fix by clearing the device multicast filter table when multicast and allmulti are both unset. Tested as follows: - Set multicast on eth0 network interface - verify that multicast packets are coming in: $ tcpdump -p -i eth0 - Clear multicast and allmulti on eth0 network interface - verify that no more multicast packets are coming in: $ tcpdump -p -i eth0 Signed-off-by: Sven van Ashbrook <svenva@chromium.org> Acked-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/20220830045923.net-next.v1.1.I4fee0ac057083d4f848caf0fa3a9fd466fc374a0@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skbToke Høiland-Jørgensen1-1/+3
When the GSO splitting feature of sch_cake is enabled, GSO superpackets will be broken up and the resulting segments enqueued in place of the original skb. In this case, CAKE calls consume_skb() on the original skb, but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into assuming the original skb still exists, when it really has been freed. Fix this by adding the __NET_XMIT_STOLEN flag to the return value in this case. Fixes: 0c850344d388 ("sch_cake: Conditionally split GSO segments") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231 Link: https://lore.kernel.org/r/20220831092103.442868-1-toke@toke.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01net: ethernet: move from strlcpy with unused retval to strscpyWolfram Sang199-457/+457
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Petr Machata <petrm@nvidia.com> # For drivers/net/ethernet/mellanox/mlxsw Acked-by: Geoff Levand <geoff@infradead.org> # For ps3_gelic_net and spider_net_ethtool Acked-by: Tom Lendacky <thomas.lendacky@amd.com> # For drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c Acked-by: Marcin Wojtas <mw@semihalf.com> # For drivers/net/ethernet/marvell/mvpp2 Reviewed-by: Leon Romanovsky <leonro@nvidia.com> # For drivers/net/ethernet/mellanox/mlx{4|5} Reviewed-by: Shay Agroskin <shayagr@amazon.com> # For drivers/net/ethernet/amazon/ena Acked-by: Krzysztof Hałasa <khalasa@piap.pl> # For IXP4xx Ethernet Link: https://lore.kernel.org/r/20220830201457.7984-3-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-01net: move from strlcpy with unused retval to strscpyWolfram Sang40-75/+75
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220830201457.7984-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31selftests: net: sort .gitignore fileAxel Rasmussen1-25/+25
This is the result of `sort tools/testing/selftests/net/.gitignore`, but preserving the comment at the top. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Link: https://lore.kernel.org/r/20220829184748.1535580-1-axelrasmussen@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31Documentation: networking: correct possessive "its"Randy Dunlap5-5/+5
Change occurrences of "it's" that are possessive to "its" so that they don't read as "it is". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20220829235414.17110-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>