Age | Commit message | Author | Files | Lines
2021-10-28 | ice: low level support for tunnels | Michal Swiatkowski | 5 | -16/+298
Add definition of UDP tunnel dummy packets. Fill destination port value in filter based on UDP tunnel port. Append tunnel flags to switch filter definition in case of matching the tunnel. Both VXLAN and Geneve are UDP tunnels, so only one new header is needed. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
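The idea of a UDP tunnel dummy packet whose destination port is filled in from the tunnel port can be sketched in userspace. This is a minimal illustration, not the ice driver's dummy packet: the function name `build_udp_vxlan_header` is hypothetical, the header layout follows the public VXLAN format (8-byte UDP header followed by an 8-byte VXLAN header with a 24-bit VNI), and port 4789 is assumed as the default VXLAN port.

```python
import struct

VXLAN_PORT = 4789            # assumed default VXLAN UDP port
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def build_udp_vxlan_header(dst_port: int, vni: int,
                           payload_len: int = 0, src_port: int = 0) -> bytes:
    """Return UDP header + VXLAN header (hypothetical helper, 16 bytes)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # UDP length covers the UDP header, the VXLAN header and the payload.
    udp_len = 8 + 8 + payload_len
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    # First word: flags in the top byte. Second word: VNI in the top 24 bits.
    vxlan = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return udp + vxlan


hdr = build_udp_vxlan_header(VXLAN_PORT, vni=42)
```

A Geneve variant would differ only in the tunnel header that follows the UDP header, which matches the commit's remark that both are UDP tunnels.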
2021-10-28 | ice: VXLAN and Geneve TC support | Michal Swiatkowski | 5 | -44/+362
Add definition for VXLAN and Geneve dummy packet. Define VXLAN and Geneve type of fields to match on correct UDP tunnel header.
Parse tunnel specific fields from TC tool like outer MACs, outer IPs, outer destination port and VNI. Save values and masks in outer header struct and move header pointer to inner to simplify parsing inner values.
There are two cases for redirect action:
- from uplink to VF - TC filter is added on tunnel device
- from VF to uplink - TC filter is added on PR, for this case check if redirect device is tunnel device
VXLAN example:
- create tunnel device
    ip l add $VXLAN_DEV type vxlan id $VXLAN_VNI dstport $VXLAN_PORT \
      dev $PF
- add TC filter (in switchdev mode)
    tc filter add dev $VXLAN_DEV protocol ip parent ffff: flower \
      enc_dst_ip $VF1_IP enc_key_id $VXLAN_VNI action mirred egress \
      redirect dev $VF1_PR
Geneve example:
- create tunnel device
    ip l add $GENEVE_DEV type geneve id $GENEVE_VNI dstport $GENEVE_PORT \
      remote $GENEVE_IP
- add TC filter (in switchdev mode)
    tc filter add dev $GENEVE_DEV protocol ip parent ffff: flower \
      enc_key_id $GENEVE_VNI dst_ip $GENEVE1_IP action mirred egress \
      redirect dev $VF1_PR
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-28 | ice: support for indirect notification | Michal Swiatkowski | 3 | -2/+200
Implement indirect notification mechanism to support offloading TC rules on tunnel devices. Keep indirect block list in netdev priv. Notification will call setting tc cls flower function. For now we can offload only ingress type. Return not supported for other flow block binder. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-28 | net: virtio: use eth_hw_addr_set() | Jakub Kicinski | 1 | -3/+7
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Even though the current code uses dev->addr_len, we can switch to eth_hw_addr_set() instead of dev_addr_set(). The netdev is always allocated by alloc_etherdev_mq() and there are at least two places which assume an Ethernet address:
- the line below calling eth_hw_addr_random()
- virtnet_set_mac_address() -> eth_commit_mac_addr_change()
Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211027152012.3393077-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
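Why every write to netdev->dev_addr has to go through a helper can be shown with a small userspace model. This is only a sketch under stated assumptions: a Python dict stands in for the kernel's rbtree, and the `AddrIndex` and `NetDev` classes are hypothetical. Only the helper name `dev_addr_set` mirrors the kernel function, and its body here is illustrative rather than the kernel implementation.

```python
class AddrIndex:
    """Hypothetical address index; a dict stands in for the rbtree."""

    def __init__(self):
        self.by_addr = {}

    def lookup(self, addr):
        return self.by_addr.get(bytes(addr))


class NetDev:
    """Hypothetical device that registers its address in the index."""

    def __init__(self, name, index, addr):
        self.name = name
        self.index = index
        self.dev_addr = bytes(addr)
        index.by_addr[self.dev_addr] = self


def dev_addr_set(dev, addr):
    """Model of a write helper: keeps the index and the field in sync."""
    addr = bytes(addr)
    dev.index.by_addr.pop(dev.dev_addr, None)  # drop the stale key
    dev.dev_addr = addr
    dev.index.by_addr[addr] = dev              # re-insert under the new key


index = AddrIndex()
old_mac = bytes.fromhex("020000000001")
new_mac = bytes.fromhex("020000000002")
eth0 = NetDev("eth0", index, old_mac)
dev_addr_set(eth0, new_mac)
```

A direct assignment to `eth0.dev_addr` would change the field but leave the index keyed by the old address, which is the kind of inconsistency the helper conversion avoids.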
2021-10-28 | devlink: Simplify internal devlink params implementation | Leon Romanovsky | 1 | -123/+46
Reduce extra indirection from devlink_params_*() API. Such change makes it clear that we can drop devlink->lock from these flows, because everything is executed when the devlink is not registered yet. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | Merge branch 'octeontx2-debugfs-updates' | David S. Miller | 3 | -11/+114
Rakesh Babu Saladi says: ==================== RVU Debugfs updates. Patch 1: Few minor changes such as spelling mistakes, deleting unwanted characters, etc. Patch 2: Add debugfs dump for lmtst map table Patch 3: Add channel and channel mask in debugfs. Changes made from v2 to v3: 1. In patch 1 moved few lines and submitted those changes as a different patch to net branch 2. Patch 2 is left unchanged. 3. Patch 3 is left unchanged. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | octeontx2-af: debugfs: Add channel and channel mask. | Rakesh Babu | 3 | -0/+9
This patch displays the channel and channel_mask for each RX interface of an NPC MCAM rule. Signed-off-by: Rakesh Babu <rsaladi2@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | octeontx2-af: cn10k: debugfs for dumping LMTST map table | Harman Kalra | 1 | -0/+94
CN10k SoCs use atomic stores of up to 128 bytes to submit packets/instructions into co-processor cores. The enqueueing is performed using Large Memory Transaction Store (LMTST) operations. They allow for lockless enqueue operations - i.e., two different CPU cores can submit instructions to the same queue without needing to lock the queue or synchronize their accesses. This patch implements a new debugfs entry for dumping LMTST map table present on CN10K, as this might be very useful to debug any issue in case of shared LMTST region among multiple pci functions. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: Bhaskara Budiredla <bbudiredla@marvell.com> Signed-off-by: Rakesh Babu <rsaladi2@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | octeontx2-af: debugfs: Minor changes. | Rakesh Babu Saladi | 1 | -11/+11
A few changes to the rvu_debugfs.c file: remove unwanted characters, fix code indentation, add a new comment line, etc. Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | net: phy: microchip_t1: add cable test support for lan87xx phy | Yuiko Oshino | 1 | -0/+239
Add a basic cable test (diagnostic) support for lan87xx phy. Tested with LAN8770 for connected/open/short wires using ethtool. Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | ptp: fix code indentation issues | Carlos Llamas | 1 | -3/+3
This fixes the following checkpatch.pl errors:
ERROR: code indent should use tabs where possible
+^I if (ptp->pps_source)$
ERROR: code indent should use tabs where possible
+^I pps_unregister_source(ptp->pps_source);$
ERROR: code indent should use tabs where possible
+^I kthread_destroy_worker(ptp->kworker);$
Fixes: 4225fea1cb28 ("ptp: Fix possible memory leak in ptp_clock_register()") Signed-off-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | net: cleanup __sk_stream_memory_free() | Eric Dumazet | 1 | -8/+2
We now have INDIRECT_CALL_INET_1() macro, no need to use #ifdef CONFIG_INET Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | sky2: Remove redundant assignment and parentheses | luo penghao | 1 | -1/+1
The value assigned to err here is never read, because err is reassigned on the subsequent branches. Without the assignment the double parentheses become redundant, so the inner parentheses are deleted as well. clang_analyzer complains as follows:
drivers/net/ethernet/marvell/sky2.c:4988: warning: Although the value stored to 'err' is used in the enclosing expression, the value is never actually read from 'err'.
Changes in v2: modify title category from octeontx2-af to sky2; delete the inner parentheses.
Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | net: ipconfig: Release the rtnl_lock while waiting for carrier | Maxime Chevallier | 1 | -2/+10
While waiting for a carrier to come on one of the netdevices, some devices will need to take the rtnl lock at some point to fully initialize all parts of the link. That's the case for SFP, where the rtnl is taken when a module gets detected. This prevents mounting an NFS rootfs over an SFP link. This means that while ipconfig waits for carriers to be detected, no SFP modules can be detected in the meantime; a module is only detected after ipconfig times out. This commit releases the rtnl_lock while waiting for the carrier to come up, and re-takes it to check for the init device and carrier status. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
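The locking pattern described above (drop the lock while sleeping, re-take it only to check state) can be sketched with ordinary threads. This is a userspace analogy, not the ipconfig code: `rtnl` is a plain `threading.Lock`, and `sfp_module_insert` and `wait_for_carrier` are hypothetical names chosen for the illustration.

```python
import threading
import time

rtnl = threading.Lock()         # stand-in for the rtnl lock
carrier_up = threading.Event()  # stand-in for the carrier state


def sfp_module_insert():
    """Link bring-up that needs the lock, like SFP module detection."""
    with rtnl:
        carrier_up.set()


def wait_for_carrier(timeout_s=2.0, poll_s=0.01):
    """Poll for carrier, holding the lock only while checking state."""
    deadline = time.monotonic() + timeout_s
    rtnl.acquire()
    try:
        while time.monotonic() < deadline:
            if carrier_up.is_set():
                return True
            # Release while sleeping so other lock users can make progress.
            rtnl.release()
            try:
                time.sleep(poll_s)
            finally:
                rtnl.acquire()
        return carrier_up.is_set()
    finally:
        rtnl.release()


worker = threading.Thread(target=sfp_module_insert)
worker.start()
found = wait_for_carrier()
worker.join()
```

If `wait_for_carrier` held the lock across the sleep, `sfp_module_insert` could not run until the wait timed out, which is the behaviour the commit message describes for SFP links.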
2021-10-28 | devlink: add documentation for octeontx2 driver | Subbaraya Sundeep | 2 | -0/+43
Add a file to document devlink support for octeontx2 driver. Driver-specific parameters implemented by AF, PF and VF drivers are documented. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | sch_htb: Add extack messages for EOPNOTSUPP errors | Maxim Mikityanskiy | 1 | -2/+6
In order to make the "Operation not supported" message clearer to the user, add extack messages explaining why exactly adding offloaded HTB could be not supported in each case. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | Merge tag 'mlx5-net-next-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller | 34 | -248/+292
Saeed Mahameed says: ==================== Merge mlx5-next into net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | Merge branch 'mvpp2-phylink' | David S. Miller | 1 | -45/+52
Russell King says: ==================== Convert mvpp2 to phylink supported_interfaces This patch series converts mvpp2 to use phylink's supported_interfaces bitmap to simplify the validate() implementation. The patches:
1) Add the supported interface modes to the supported_interfaces bitmap.
2) Remove the checks for the interface type being supported from the validate callback.
3) Remove the now unnecessary checks and call to phylink_helper_basex_speed() to support switching between 1000base-X and 2500base-X for SFPs.
4) Clean up the resulting validate() code.
(3) becomes possible because when asking the MAC for its complete support, we walk all supported interfaces which will include 1000base-X and 2500base-X only if the comphy is present. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | net: mvpp2: clean up mvpp2_phylink_validate() | Russell King (Oracle) | 1 | -10/+7
mvpp2_phylink_validate() no longer needs to check for PHY_INTERFACE_MODE_NA as phylink will walk the supported interface types to discover the link mode capabilities. Remove these checks. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | net: mvpp2: drop use of phylink_helper_basex_speed() | Russell King (Oracle) | 1 | -12/+7
Now that we have a better method to select SFP interface modes, we no longer need to use phylink_helper_basex_speed() in a driver's validation function, and we can also get rid of our hack to indicate both 1000base-X and 2500base-X if the comphy is present to make that work. Remove this hack and use of phylink_helper_basex_speed(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | net: mvpp2: remove interface checks in mvpp2_phylink_validate() | Russell King (Oracle) | 1 | -26/+7
As phylink checks the interface mode against the supported_interfaces bitmap, we no longer need to validate the interface mode in the validation function. Remove this to simplify it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
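The division of labour described here, where the core rejects unsupported interface modes before the driver's validate callback runs, can be modelled with a plain bitmask. This is a sketch only: the `PhyMode` values, `make_bitmap` and `core_validate` are hypothetical names and do not reproduce phylink's real API or enum values.

```python
from enum import IntEnum


class PhyMode(IntEnum):
    """Hypothetical subset of interface modes (values are arbitrary)."""
    SGMII = 0
    BASEX_1000 = 1
    BASEX_2500 = 2
    RGMII = 3


def make_bitmap(modes):
    """Build a supported-interfaces bitmap with one bit per mode."""
    bits = 0
    for mode in modes:
        bits |= 1 << mode
    return bits


def core_validate(supported, mode, mac_validate):
    """Model of the core check: filter by bitmap, then call the driver."""
    if not supported & (1 << mode):
        return None  # rejected up front, the driver callback never runs
    return mac_validate(mode)


calls = []


def driver_validate(mode):
    # The driver no longer needs its own "is this mode supported?" check.
    calls.append(mode)
    return "ok"


supported = make_bitmap([PhyMode.SGMII, PhyMode.BASEX_1000])
rejected = core_validate(supported, PhyMode.RGMII, driver_validate)
accepted = core_validate(supported, PhyMode.SGMII, driver_validate)
```

Because the bitmap filter runs first, the driver callback only ever sees modes it declared, which is why the per-mode checks inside the validate function become removable.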
2021-10-28 | net: mvpp2: populate supported_interfaces member | Russell King | 1 | -0/+34
Populate the phy interface mode bitmap for the Marvell mvpp2 driver with interfaces modes supported by the MAC. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | ipv6: enable net.ipv6.route.max_size sysctl in network namespace | Alexander Kuznetsov | 1 | -12/+12
We want to increase the route cache size in a network namespace created with a user namespace. Currently ipv6 route settings are disabled for non-initial network namespaces. We can allow this sysctl, and it will be safe since commit <6126891c6d4f>: the route cache is accounted to kmem, so users in a user namespace cannot DoS the system. Signed-off-by: Alexander Kuznetsov <wwfq@yandex-team.ru> Acked-by: Dmitry Yakunin <zeil@yandex-team.ru> Acked-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | mpt fusion: use dev_addr_set() | Jakub Kicinski | 1 | -1/+1
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | firewire: don't write directly to netdev->dev_addr | Jakub Kicinski | 1 | -7/+7
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Prepare fwnet_hwaddr on the stack and use dev_addr_set() to copy it to netdev->dev_addr. We no longer need to worry about alignment. union fwnet_hwaddr does not have any padding and we set all fields so we don't need to zero it upfront. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | media: use eth_hw_addr_set() | Jakub Kicinski | 1 | -4/+4
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Convert media from memcpy(... 6) and memcpy(... addr_len) to eth_hw_addr_set():
  @@
  expression dev, np;
  @@
  - memcpy(dev->dev_addr, np, 6)
  + eth_hw_addr_set(dev, np)
  @@
  - memcpy(dev->dev_addr, np, dev->addr_len)
  + eth_hw_addr_set(dev, np)
Make sure we don't cast off const qualifier from dev->dev_addr. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | Merge branch 'tcp-tx-side-cleanups' | David S. Miller | 5 | -47/+14
Eric Dumazet says: ==================== tcp: tx side cleanups We no longer need to set skb->reserved_tailroom because TCP sendmsg() does not put payload in skb->head anymore. Also do some cleanups around skb->ip_summed/csum, and TCP_SKB_CB(skb)->sacked for fresh skbs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | tcp: do not clear TCP_SKB_CB(skb)->sacked if already zero | Eric Dumazet | 2 | -6/+0
Freshly allocated skbs have zero in skb->cb[] already. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | tcp: do not clear skb->csum if already zero | Eric Dumazet | 3 | -3/+0
Freshly allocated skbs have their csum field cleared already. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | tcp: factorize ip_summed setting | Eric Dumazet | 3 | -9/+2
Setting skb->ip_summed to CHECKSUM_PARTIAL can be centralized in tcp_stream_alloc_skb() and __mptcp_do_alloc_tx_skb() instead of being done multiple times. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | tcp: no longer set skb->reserved_tailroom | Eric Dumazet | 2 | -6/+0
TCP/MPTCP sendmsg() no longer puts payload in skb->head, we can remove not needed code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | tcp: remove dead code from tcp_collapse_retrans() | Eric Dumazet | 1 | -7/+3
TCP sendmsg() no longer puts payload in skb->head, remove some dead code from tcp_collapse_retrans(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | tcp: cleanup tcp_remove_empty_skb() use | Eric Dumazet | 3 | -7/+8
All tcp_remove_empty_skb() callers now use tcp_write_queue_tail() for the skb argument, we can therefore factorize code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | tcp: remove dead code from tcp_sendmsg_locked() | Eric Dumazet | 1 | -9/+1
TCP sendmsg() no longer puts payload in skb head, we can remove dead code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 | Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next | Saeed Mahameed | 34 | -248/+292
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-28 | net: phy: Fix unsigned comparison with less than zero | Jiapeng Chong | 1 | -1/+1
Fix the following coccicheck warning: ./drivers/net/phy/at803x.c:493:5-10: WARNING: Unsigned expression compared with zero: value < 0. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: 7beecaf7d507 ("net: phy: at803x: improve the WOL feature") Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/1635325191-101815-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
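The class of bug behind this warning can be reproduced in userspace with fixed-width integers. The sketch below is an analogy rather than the at803x code: `is_error_unsigned` and `is_error_signed` are hypothetical names, and `ctypes` is used only to mimic C's unsigned and signed 32-bit storage.

```python
import ctypes


def is_error_unsigned(raw: int) -> bool:
    """Buggy pattern: a negative errno stored in an unsigned variable."""
    value = ctypes.c_uint32(raw).value  # wraps around, like C unsigned
    return value < 0                    # can never be true


def is_error_signed(raw: int) -> bool:
    """Fixed pattern: keep the value in a signed variable."""
    value = ctypes.c_int32(raw).value
    return value < 0


ENODEV = 19  # used here purely as an example error code
missed = is_error_unsigned(-ENODEV)
caught = is_error_signed(-ENODEV)
```

With unsigned storage the negative return value wraps to a large positive number, so the error branch is dead code; changing the variable to a signed type restores the check.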
2021-10-28 | Merge branch 'mptcp-rework-fwd-memory-allocation-and-one-cleanup' | Jakub Kicinski | 6 | -155/+120
Mat Martineau says: ==================== mptcp: Rework fwd memory allocation and one cleanup These patches from the MPTCP tree rework forward memory allocation for MPTCP (with some supporting changes in the net core), and also clean up an unused function parameter. Patch 1 updates TCP code but does not change any behavior, and creates some macros for reclaim thresholds that will be reused in the MPTCP code. Patch 2 adds sk_forward_alloc_get() to the networking core to support MPTCP's forward allocation with the diag interface. Patch 3 reworks forward memory for MPTCP. Patch 4 removes an unused arg and has no functional changes. ==================== Link: https://lore.kernel.org/r/20211026232916.179450-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 | mptcp: drop unused sk in mptcp_push_release | Geliang Tang | 1 | -5/+4
Since mptcp_set_timeout() was removed from mptcp_push_release() in commit 33d41c9cd74c5 ("mptcp: more accurate timeout"), the argument sk in mptcp_push_release() has been unused. Let's drop it. Fixes: 33d41c9cd74c5 ("mptcp: more accurate timeout") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 | mptcp: allocate fwd memory separately on the rx and tx path | Paolo Abeni | 2 | -145/+95
The whole MPTCP receive path is protected by the msk socket spinlock. As a consequence, the tx path has to play a few tricks to allocate the forward memory without acquiring the spinlock multiple times, making the overall TX path quite complex. This patch tries to clean up the tx path a bit, using completely separate fwd memory allocation for the rx and the tx path. The forward memory allocated in the rx path is now accounted in msk->rmem_fwd_alloc and is (still) protected by the msk socket spinlock. To cope with the above we provide a few MPTCP-specific variants of the helpers to charge, uncharge, reclaim and free the forward memory in the receive path. msk->sk_forward_alloc now accounts only the forward memory for the tx path, so we can use the plain core sock helpers to manipulate it and drop quite a bit of complexity. On memory pressure, both rx and tx fwd memories are reclaimed. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 | net: introduce sk_forward_alloc_get() | Paolo Abeni | 4 | -3/+14
A later patch will change the MPTCP memory accounting schema in such a way that MPTCP sockets will encode the total amount of forward allocated memory in two separate fields (one for tx and one for rx). MPTCP sockets will use their own helper to provide the accurate amount of fwd allocated memory. To allow the above, this patch adds a new, optional, sk method to fetch the fwd memory, wrap the call in a new helper and use it where it is appropriate. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
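The "optional method wrapped in a helper" design can be sketched as follows. This is a userspace model under stated assumptions: the `Proto` and `Sock` classes are hypothetical, and summing the tx and rx fields in `mptcp_forward_alloc_get` is an assumption drawn from the description of two separate fields, not a copy of the kernel code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Proto:
    """Hypothetical protocol ops with an optional fwd-memory getter."""
    forward_alloc_get: Optional[Callable[["Sock"], int]] = None


@dataclass
class Sock:
    proto: Proto
    sk_forward_alloc: int = 0  # tx-side (or only) forward memory
    rmem_fwd_alloc: int = 0    # rx-side forward memory, MPTCP-like model


def sk_forward_alloc_get(sk: Sock) -> int:
    """Helper: use the protocol's getter if present, else the plain field."""
    if sk.proto.forward_alloc_get is None:
        return sk.sk_forward_alloc
    return sk.proto.forward_alloc_get(sk)


def mptcp_forward_alloc_get(sk: Sock) -> int:
    # Assumed semantics: total is the sum of the tx and rx fields.
    return sk.sk_forward_alloc + sk.rmem_fwd_alloc


tcp_sk = Sock(Proto(), sk_forward_alloc=4096)
mptcp_sk = Sock(Proto(mptcp_forward_alloc_get),
                sk_forward_alloc=4096, rmem_fwd_alloc=8192)
```

Callers such as a diag interface can then use `sk_forward_alloc_get()` everywhere without knowing whether a protocol splits its accounting.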
2021-10-28 | tcp: define macros for a couple reclaim thresholds | Paolo Abeni | 1 | -2/+7
A following patch is going to implement a similar reclaim schema for the MPTCP protocol, with different locking. Let's define a couple of macros for the used thresholds, so that the latter code will be more easily maintainable. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 | inet: remove races in inet{6}_getname() | Eric Dumazet | 3 | -17/+21
syzbot reported data-races in inet_getname() multiple times, it is time we fix this instead of pretending applications should not trigger them. getsockname() and getpeername() are not really considered fast path.
v2: added the missing BPF_CGROUP_RUN_SA_PROG() declaration needed when CONFIG_CGROUP_BPF=n, as reported by kernel test robot <lkp@intel.com>
syzbot typical report:
BUG: KCSAN: data-race in __inet_hash_connect / inet_getname
write to 0xffff888136d66cf8 of 2 bytes by task 14374 on cpu 1:
  __inet_hash_connect+0x7ec/0x950 net/ipv4/inet_hashtables.c:831
  inet_hash_connect+0x85/0x90 net/ipv4/inet_hashtables.c:853
  tcp_v4_connect+0x782/0xbb0 net/ipv4/tcp_ipv4.c:275
  __inet_stream_connect+0x156/0x6e0 net/ipv4/af_inet.c:664
  inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:728
  __sys_connect_file net/socket.c:1896 [inline]
  __sys_connect+0x254/0x290 net/socket.c:1913
  __do_sys_connect net/socket.c:1923 [inline]
  __se_sys_connect net/socket.c:1920 [inline]
  __x64_sys_connect+0x3d/0x50 net/socket.c:1920
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888136d66cf8 of 2 bytes by task 14408 on cpu 0:
  inet_getname+0x11f/0x170 net/ipv4/af_inet.c:790
  __sys_getsockname+0x11d/0x1b0 net/socket.c:1946
  __do_sys_getsockname net/socket.c:1961 [inline]
  __se_sys_getsockname net/socket.c:1958 [inline]
  __x64_sys_getsockname+0x3e/0x50 net/socket.c:1958
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000 -> 0xdee0
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 14408 Comm: syz-executor.3 Not tainted 5.15.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20211026213014.3026708-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 | xdp: Remove redundant warning | Yajun Deng | 1 | -2/+0
There is a warning in xdp_rxq_info_unreg_mem_model() when reg_state isn't equal to REG_STATE_REGISTERED, so the warning in xdp_rxq_info_unreg() is redundant. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20211027013856.1866-1-yajun.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 | net: thunderbolt: use eth_hw_addr_set() | Jakub Kicinski | 1 | -3/+5
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/r/20211026175547.3198242-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 | staging: use of_get_ethdev_address() | Jakub Kicinski | 1 | -1/+1
Use the new of_get_ethdev_address() helper for the cases where dev->dev_addr is passed in directly as the destination.
  @@
  expression dev, np;
  @@
  - of_get_mac_address(np, dev->dev_addr)
  + of_get_ethdev_address(np, dev)
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20211026175038.3197397-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 | net: macb: Fix mdio child node detection | Guenter Roeck | 1 | -1/+1
Commit 4d98bb0d7ec2 ("net: macb: Use mdio child node for MDIO bus if it exists") added code to detect if a 'mdio' child node exists to the macb driver. This added code does not, however, actually check if the child node exists, but if the parent node exists. This results in errors such as
  macb 10090000.ethernet eth0: Could not attach PHY (-19)
if there is no 'mdio' child node. Fix the code to actually check for the child node. Fixes: 4d98bb0d7ec2 ("net: macb: Use mdio child node for MDIO bus if it exists") Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Sean Anderson <sean.anderson@seco.com> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20211026173950.353636-1-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
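The parent-versus-child mix-up is easy to reproduce in a toy device tree. The sketch below is illustrative only: nodes are plain dicts, and `get_child`, `has_mdio_buggy` and `has_mdio_fixed` are hypothetical names that stand in for the driver logic described in the commit message rather than reproducing it.

```python
def get_child(node, name):
    """Return the named child of a toy device-tree node, or None."""
    return node.get("children", {}).get(name)


def has_mdio_buggy(np):
    """Looks up the child, then tests the parent by mistake."""
    child = get_child(np, "mdio")
    return np is not None  # always true for a probed device


def has_mdio_fixed(np):
    """Tests the node that was actually looked up."""
    child = get_child(np, "mdio")
    return child is not None


# Hypothetical nodes: one without and one with an 'mdio' child.
eth_plain = {"name": "ethernet@10090000", "children": {}}
eth_mdio = {"name": "ethernet@10090000",
            "children": {"mdio": {"name": "mdio"}}}
```

With the buggy check the driver would take the "mdio child present" path even for `eth_plain`, which is consistent with the PHY attach failure quoted in the commit message.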
2021-10-28 | net: sch: simplify condtion for selecting mini_Qdisc_pair buffer | Seth Forshee | 1 | -1/+1
The only valid values for a miniq pointer are NULL or a pointer to miniq1 or miniq2, so testing for miniq_old != &miniq1 is functionally equivalent to testing that it is NULL or equal to &miniq2. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Link: https://lore.kernel.org/r/20211026183721.137930-1-seth@forshee.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
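The equivalence claimed above can be checked exhaustively, because the pointer has only three valid values. In this sketch the names `MINIQ1`, `MINIQ2`, `long_form` and `short_form` are illustrative stand-ins for the pointers and the two conditions; `None` plays the role of NULL.

```python
MINIQ1 = "miniq1"  # stands in for &miniq1
MINIQ2 = "miniq2"  # stands in for &miniq2
VALID_VALUES = (None, MINIQ1, MINIQ2)


def long_form(miniq_old):
    """NULL or equal to &miniq2."""
    return miniq_old is None or miniq_old == MINIQ2


def short_form(miniq_old):
    """Not equal to &miniq1."""
    return miniq_old != MINIQ1


results = [(long_form(p), short_form(p)) for p in VALID_VALUES]
```

The two forms agree only because the domain is restricted to these three values; for an arbitrary pointer the shorter test would accept more inputs.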
2021-10-28 | net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap() | Seth Forshee | 2 | -20/+20
Currently rcu_barrier() is used to ensure that no readers of the inactive mini_Qdisc buffer remain before it is reused. This waits for any pending RCU callbacks to complete, when all that is actually required is to wait for one RCU grace period to elapse after the buffer was made inactive. This means that using rcu_barrier() may result in unnecessary waits. To improve this, store the current RCU state when a buffer is made inactive and use poll_state_synchronize_rcu() to check whether a full grace period has elapsed before reusing it. If a full grace period has not elapsed, wait for a grace period to elapse, and in the non-RT case use synchronize_rcu_expedited() to hasten it. Since this approach eliminates the RCU callback it is no longer necessary to synchronize_rcu() in the tp_head==NULL case. However, the RCU state should still be saved for the previously active buffer. Before this change I would typically see mini_qdisc_pair_swap() take tens of milliseconds to complete. After this change it typically finishes in less than 1 ms, and often it takes just a few microseconds. Thanks to Paul for walking me through the options for improving this. Cc: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Link: https://lore.kernel.org/r/20211026130700.121189-1-seth@forshee.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
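The "save a cookie when the buffer goes inactive, poll it before reuse" scheme can be modelled with a simple counter. This is a heavily simplified sketch: `GracePeriodClock` is a toy stand-in for RCU's grace-period tracking (real RCU semantics are more subtle), and `MiniqPair` with its `swap()` method is a hypothetical model of the double-buffer logic, not the kernel function.

```python
class GracePeriodClock:
    """Toy grace-period counter, assumed for illustration only."""

    def __init__(self):
        self.completed = 0

    def get_state(self):
        # Cookie: the grace-period number that must complete before
        # readers that may exist right now are known to be gone.
        return self.completed + 1

    def poll_state(self, cookie):
        return self.completed >= cookie

    def synchronize(self):
        # Models blocking until one more grace period has elapsed.
        self.completed += 1


class MiniqPair:
    def __init__(self, clock):
        self.clock = clock
        self.active = None          # index of the active buffer, or None
        self.cookie = [None, None]  # state saved when a buffer went inactive
        self.waits = 0              # how many times swap() had to block

    def swap(self):
        nxt = 1 if self.active == 0 else 0
        cookie = self.cookie[nxt]
        # Block only if the buffer we want to reuse may still have readers.
        if cookie is not None and not self.clock.poll_state(cookie):
            self.clock.synchronize()
            self.waits += 1
        old, self.active = self.active, nxt
        if old is not None:
            self.cookie[old] = self.clock.get_state()


clock = GracePeriodClock()
pair = MiniqPair(clock)
for _ in range(3):
    pair.swap()                # third swap reuses buffer 0 too early
waits_when_rushed = pair.waits
clock.synchronize()            # time passes: a grace period elapses
pair.swap()                    # buffer 1 is now safe to reuse
```

In this model a swap only blocks when the target buffer was retired too recently; if enough time has passed since it went inactive, the poll succeeds and no wait is needed, which is the saving the commit message reports.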
2021-10-27 | net: sched: gred: dynamically allocate tc_gred_qopt_offload | Arnd Bergmann | 1 | -20/+30
The tc_gred_qopt_offload structure has grown too big to be on the stack for 32-bit architectures after recent changes.
net/sched/sch_gred.c:903:13: error: stack frame size (1180) exceeds limit (1024) in 'gred_destroy' [-Werror,-Wframe-larger-than]
net/sched/sch_gred.c:310:13: error: stack frame size (1212) exceeds limit (1024) in 'gred_offload' [-Werror,-Wframe-larger-than]
Use dynamic allocation per qdisc to avoid this. Fixes: 50dc9a8572aa ("net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types") Fixes: 67c9e6270f30 ("net: sched: Protect Qdisc::bstats with u64_stats") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20211026100711.nalhttf6mbe6sudx@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 | Merge branch 'two-reverts-to-calm-down-devlink-discussion' | Jakub Kicinski | 1 | -10/+14
Leon Romanovsky says: ==================== Two reverts to calm down devlink discussion Two reverts, as discussed in [1]: a fast and easy, but in the long run wrong, solution to syzkaller bug [2]. [1] https://lore.kernel.org/all/20211026120234.3408fbcc@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com [2] https://lore.kernel.org/netdev/000000000000af277405cf0a7ef0@google.com/ ==================== Link: https://lore.kernel.org/r/cover.1635276828.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>