summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-02-09NFSD: Fix ia_size underflowChuck Lever1-0/+4
iattr::ia_size is a loff_t, which is a signed 64-bit type. NFSv3 and NFSv4 both define file size as an unsigned 64-bit type. Thus there is a range of valid file size values an NFS client can send that is already larger than Linux can handle. Currently decode_fattr4() dumps a full u64 value into ia_size. If that value happens to be larger than S64_MAX, then ia_size underflows. I'm about to fix up the NFSv3 behavior as well, so let's catch the underflow in the common code path: nfsd_setattr(). Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: Fix the behavior of READ near OFFSET_MAXChuck Lever3-10/+14
Dan Aloni reports: > Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to > the RPC read layers") on the client, a read of 0xfff is aligned up > to server rsize of 0x1000. > > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server > and it returns an NFS code of EINVAL to the client. The client as > a result indefinitely retries the request. The Linux NFS client does not handle NFS?ERR_INVAL, even though all NFS specifications permit servers to return that status code for a READ. Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed and return a short result. Set the EOF flag in the result to prevent the client from retrying the READ request. This behavior appears to be consistent with Solaris NFS servers. Note that NFSv3 and NFSv4 use u64 offset values on the wire. These must be converted to loff_t internally before use -- an implicit type cast is not adequate for this purpose. Otherwise VFS checks against sb->s_maxbytes do not work properly. Reported-by: Dan Aloni <dan.aloni@vastdata.com> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09Merge branch 'vlan-QinQ-leak-fix'David S. Miller3-8/+16
Xin Long says: ==================== vlan: fix a netdev refcnt leak for QinQ This issue can be simply reproduced by: # ip link add dummy0 type dummy # ip link add link dummy0 name dummy0.1 type vlan id 1 # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 # rmmod 8021q unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 So as to fix it, adjust vlan_dev_uninit() in Patch 1/1 so that it won't be called twice for the same device, then do the fix in vlan_dev_uninit() in Patch 2/2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09vlan: move dev_put into vlan_dev_uninitXin Long1-3/+5
Shuang Li reported an QinQ issue by simply doing: # ip link add dummy0 type dummy # ip link add link dummy0 name dummy0.1 type vlan id 1 # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 # rmmod 8021q unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links(). When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However, due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and release dummy0.1, as it delays dev_put until calling dev->priv_destructor, vlan_dev_free(). This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before netdev_wait_allrefs(). Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09vlan: introduce vlan_dev_free_egress_priorityXin Long3-5/+11
This patch is to introduce vlan_dev_free_egress_priority() to free egress priority for vlan dev, and keep vlan_dev_uninit() static as .ndo_uninit. It makes the code more clear and safer when adding new code in vlan_dev_uninit() in the future. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09libbpf: Fix compilation warning due to mismatched printf formatAndrii Nakryiko1-1/+2
On ppc64le architecture __s64 is long int and requires %ld. Cast to ssize_t and use %zd to avoid architecture-specific specifiers. Fixes: 4172843ed4a3 ("libbpf: Fix signedness bug in btf_dump_array_data()") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220209063909.1268319-1-andrii@kernel.org
2022-02-09ax25: fix UAF bugs of net_device caused by rebinding operationDuoming Zhou1-1/+4
The ax25_kill_by_device() will set s->ax25_dev = NULL and call ax25_disconnect() to change states of ax25_cb and sock, if we call ax25_bind() before ax25_kill_by_device(). However, if we call ax25_bind() again between the window of ax25_kill_by_device() and ax25_dev_device_down(), the values and states changed by ax25_kill_by_device() will be reassigned. Finally, ax25_dev_device_down() will deallocate net_device. If we dereference net_device in syscall functions such as ax25_release(), ax25_sendmsg(), ax25_getsockopt(), ax25_getname() and ax25_info_show(), a UAF bug will occur. One of the possible race conditions is shown below: (USE) | (FREE) ax25_bind() | | ax25_kill_by_device() ax25_bind() | ax25_connect() | ... | ax25_dev_device_down() | ... | dev_put_track(dev, ...) //FREE ax25_release() | ... ax25_send_control() | alloc_skb() //USE | the corresponding fail log is shown below: =============================================================== BUG: KASAN: use-after-free in ax25_send_control+0x43/0x210 ... Call Trace: ... ax25_send_control+0x43/0x210 ax25_release+0x2db/0x3b0 __sock_release+0x6d/0x120 sock_close+0xf/0x20 __fput+0x11f/0x420 ... Allocated by task 1283: ... __kasan_kmalloc+0x81/0xa0 alloc_netdev_mqs+0x5a/0x680 mkiss_open+0x6c/0x380 tty_ldisc_open+0x55/0x90 ... Freed by task 1969: ... kfree+0xa3/0x2c0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 tty_ldisc_kill+0x3e/0x80 ... In order to fix these UAF bugs caused by rebinding operation, this patch adds dev_hold_track() into ax25_bind() and corresponding dev_put_track() into ax25_kill_by_device(). Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: usb: smsc95xx: add generic selftest supportOleksij Rempel2-0/+26
Provide generic selftest support. Tested with LAN9500 and LAN9512. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: ethernet: cavium: use div64_u64() instead of do_div()Wang Qing1-2/+2
do_div() does a 64-by-32 division. When the divisor is u64, do_div() truncates it to 32 bits, this means it can test non-zero and be truncated to zero for division. fix do_div.cocci warning: do_div() does a 64-by-32 division, please consider using div64_u64 instead. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net:enetc: enetc qos using the CBDR dma alloc functionPo Liu1-75/+22
Now we can use the enetc_cbd_alloc_data_mem() to replace complicated DMA data alloc method and CBDR memory basic seting. Signed-off-by: Po Liu <po.liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net:enetc: command BD ring data memory alloc as one function alonePo Liu2-30/+49
Separate the CBDR data memory alloc standalone. It is convenient for other part loading, for example the ENETC QOS part. Reported-and-suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Po Liu <po.liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net:enetc: allocate CBD ring data memory using DMA coherent methodsPo Liu1-64/+64
To replace the dma_map_single() stream DMA mapping with DMA coherent method dma_alloc_coherent() which is more simple. dma_map_single() found by Tim Gardner not proper. Suggested by Claudiu Manoil and Jakub Kicinski to use dma_alloc_coherent(). Discussion at: https://lore.kernel.org/netdev/AM9PR04MB8397F300DECD3C44D2EBD07796BD9@AM9PR04MB8397.eurprd04.prod.outlook.com/t/ Fixes: 888ae5a3952ba ("net: enetc: add tc flower psfp offload driver") cc: Claudiu Manoil <claudiu.manoil@nxp.com> Reported-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Po Liu <po.liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: dsa: fix panic when DSA master device unbinds on shutdownVladimir Oltean1-19/+6
Rafael reports that on a system with LX2160A and Marvell DSA switches, if a reboot occurs while the DSA master (dpaa2-eth) is up, the following panic can be seen: systemd-shutdown[1]: Rebooting. Unable to handle kernel paging request at virtual address 00a0000800000041 [00a0000800000041] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] PREEMPT SMP CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00042-g8f5585009b24 #32 pc : dsa_slave_netdevice_event+0x130/0x3e4 lr : raw_notifier_call_chain+0x50/0x6c Call trace: dsa_slave_netdevice_event+0x130/0x3e4 raw_notifier_call_chain+0x50/0x6c call_netdevice_notifiers_info+0x54/0xa0 __dev_close_many+0x50/0x130 dev_close_many+0x84/0x120 unregister_netdevice_many+0x130/0x710 unregister_netdevice_queue+0x8c/0xd0 unregister_netdev+0x20/0x30 dpaa2_eth_remove+0x68/0x190 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x94/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_device_remove+0x24/0x40 __fsl_mc_device_remove+0xc/0x20 device_for_each_child+0x58/0xa0 dprc_remove+0x90/0xb0 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_bus_remove+0x80/0x100 fsl_mc_bus_shutdown+0xc/0x1c platform_shutdown+0x20/0x30 device_shutdown+0x154/0x330 __do_sys_reboot+0x1cc/0x250 __arm64_sys_reboot+0x20/0x30 invoke_syscall.constprop.0+0x4c/0xe0 do_el0_svc+0x4c/0x150 el0_svc+0x24/0xb0 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x178/0x17c It can be seen from the stack trace that the problem is that the deregistration of the master causes a dev_close(), which gets notified as NETDEV_GOING_DOWN to dsa_slave_netdevice_event(). But dsa_switch_shutdown() has already run, and this has unregistered the DSA slave interfaces, and yet, the NETDEV_GOING_DOWN handler attempts to call dev_close_many() on those slave interfaces, leading to the problem. The previous attempt to avoid the NETDEV_GOING_DOWN on the master after dsa_switch_shutdown() was called seems improper. Unregistering the slave interfaces is unnecessary and unhelpful. Instead, after the slaves have stopped being uppers of the DSA master, we can now reset to NULL the master->dsa_ptr pointer, which will make DSA start ignoring all future notifier events on the master. Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge branch 'dpaa2-eth-sw-TSO'David S. Miller4-70/+301
Ioana Ciornei says: ==================== dpaa2-eth: add support for software TSO This series adds support for driver level TSO in the dpaa2-eth driver. The first 5 patches lay the ground work for the actual feature: rearrange some variable declaration, cleaning up the interraction with the S/G Table buffer cache etc. The 6th patch adds the actual driver level software TSO support by using the usual tso_build_hdr()/tso_build_data() APIs and creates the S/G FDs. With this patch set we can see the following improvement in a TCP flow running on a single A72@2.2GHz of the LX2160A SoC: before: 6.38Gbit/s after: 8.48Gbit/s ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09soc: fsl: dpio: read the consumer index from the cache inhibited areaIoana Ciornei1-4/+4
Once we added support in the dpaa2-eth for driver level software TSO we observed the following situation: if the EQCR CI (consumer index) is read from the cache-enabled area we sometimes end up with a computed value of available enqueue entries bigger than the size of the ring. This eventually will lead to the multiple enqueue of the same FD which will determine the same FD to end up on the Tx confirmation path and the same skb being freed twice. Just read the consumer index from the cache inhibited area so that we avoid this situation. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: add support for software TSOIoana Ciornei3-18/+207
This patch adds support for driver level TSO in the enetc driver using the TSO API. There is not much to say about this specific implementation. We are using the usual tso_build_hdr(), tso_build_data() to create each data segment, we create an array of S/G FDs where the first S/G entry is referencing the header data and the remaining ones the data portion. For the S/G Table buffer we use the same cache of buffers used on the other non-GSO cases - dpaa2_eth_sgt_get() and dpaa2_eth_sgt_recycle(). We cannot keep a DMA coherent buffer for all the TSO headers because the DPAA2 architecture does not work in a ring based fashion so we just allocate a buffer each time. Even with these limitations we get the following improvement in TCP termination on the LX2160A SoC, on a single A72 core running at 2.2GHz. before: 6.38Gbit/s after: 8.48Gbit/s Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: work with an array of FDsIoana Ciornei2-16/+47
Up until now, the __dpaa2_eth_tx function used a single FD on the stack to construct the structure to be enqueued. Since we are now preparing the ground work to add support for TSO done in software at the driver level, the same function needs to work with an array of FDs and enqueue as many as the build_*_fd functions create. Make the necessary adjustments in order to do this. These include: keeping an array of FDs in a percpu structure, cleaning up the necessary FDs before populating it and then, retrying the enqueue process up till all the generated FDs were enqueued or until we reach the maximum number retries. This patch does not change the fact that only a single FD will result from a __dpaa2_eth_tx call but rather just creates the necessary changes for the next patch. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: use the S/G table cache also for the normal S/G pathIoana Ciornei2-9/+8
Instead of allocating memory for an S/G table each time a nonlinear skb is processed, and then freeing it on the Tx confirmation path, use the S/G table cache in order to reuse the memory. For this to work we have to change the size of the cached buffers so that it can hold the maximum number of scatterlist entries. Other than that, each allocate/free call is replaced by a call to the dpaa2_eth_sgt_get/dpaa2_eth_sgt_recycle functions, introduced in the previous patch. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: extract the S/G table buffer cache interaction into functionsIoana Ciornei1-24/+37
The dpaa2-eth driver uses in certain circumstances a buffer cache for the S/G tables needed in case of a S/G FD. At the moment, the interraction with the cache is open-coded and couldn't be reused easily. Add two new functions - dpaa2_eth_sgt_get and dpaa2_eth_sgt_recycle - which help with code reusability. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: allocate a fragment already alignedIoana Ciornei1-6/+5
Instead of allocating memory and then manually aligning it to the desired value use napi_alloc_frag_align() directly to streamline the process. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dpaa2-eth: rearrange variable declaration in __dpaa2_eth_txIoana Ciornei1-4/+4
In the next patches we'll be moving things arroung in the mentioned function and also add some new variable declarations. Before all this, cleanup the variable declaration order. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge branch 'octeontx2-af-priority-flow-control'David S. Miller16-145/+836
Hariprasad Kelam says: ==================== Priority flow control support for RVU netdev In network congestion, instead of pausing all traffic on link PFC allows user to selectively pause traffic according to its class. This series of patches add support of PFC for RVU netdev drivers. Patch1 adds support to disable pause frames by default as with PFC user can enable either PFC or 802.3 pause frames. Patch2&3 adds resource management support for flow control and configures necessary registers for PFC. Patch4 adds dcb ops registration for netdev drivers. V2 changes: Fix compilation error by exporting required symbols 'otx2_config_pause_frm' ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09octeontx2-pf: PFC config support with DCBxHariprasad Kelam7-8/+271
Data centric bridging designed to eliminate packet loss due to queue overflow by adding enhancements to ethernet network such as proprity flow control etc. This patch adds support for management of Priority flow control(PFC) on Octeontx2 and CN10K interfaces. To enable PFC for all priorities dcb pfc set dev eth0 prio-pfc all:on/off To enable PFC on selected priorites dcb pfc set dev eth0 prio-pfc 0:on/off 1:on/off ..7:on/off With the ntuple commands user can map Priority to receive queues. On queue overflow NIX will assert backpressure such that PFC pause frames are genarated with mapped priority. To map priority 7 to Queue 1 ethtool -U eth0 flow-type ether dst xx:xx:xx:xx:xx:xx vlan 0xe00a m 0x1fff queue 1 Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09octeontx2-af: Flow control resource managementHariprasad Kelam9-48/+276
CN10K MAC block (RPM) and Octeontx2 MAC block (CGX) both supports PFC flow control and 802.3X flow control pause frames. Each MAC block supports max 4 LMACS and AF driver assigns same (MAC,LMAC) to PF and its VFs. As PF and its share same (MAC,LMAC) pair we need resource management to address below scenarios 1. Maintain PFC and 8023X pause frames mutually exclusive. 2. Reject disable flow control request if other PF or Vfs enabled it. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09octeontx2-af: Priority flow control configuration supportSunil Kumar Kori7-15/+251
Prirority based flow control (802.1Qbb) mechanism is similar to ethernet pause frames (802.3x) instead pausing all traffic on a link, PFC allows user to selectively pause traffic according to its class. Oceteontx2 MAC block (CGX) and CN10K Mac block (RPM) both supports PFC. As upper layer mbox handler is same for both the MACs, this patch configures PFC by calling apporopritate callbacks. Signed-off-by: Sunil Kumar Kori <skori@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09octeontx2-af: Don't enable Pause frames by defaultHariprasad Kelam6-80/+44
Current implementation is such that 802.3x pause frames are enabled by default. As CGX and RPM blocks support PFC (priority flow control) also, instead of driver enabling one between them enable them upon request from PF or its VFs. Also add support to disable pause frames in driver unbind. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09MIPS: DTS: CI20: fix how ddc power is enabledH. Nikolaus Schaller1-13/+2
Originally we proposed a new hdmi-5v-supply regulator reference for CI20 device tree but that was superseded by a better idea to use the already defined "ddc-en-gpios" property of the "hdmi-connector". Since "MIPS: DTS: CI20: Add DT nodes for HDMI setup" has already been applied to v5.17-rc1, we add this on top. Fixes: ae1b8d2c2de9 ("MIPS: DTS: CI20: Add DT nodes for HDMI setup") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Reviewed-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2022-02-09net: amd-xgbe: disable interrupts during pci removalRaju Rangoju1-0/+3
Hardware interrupts are enabled during the pci probe, however, they are not disabled during pci removal. Disable all hardware interrupts during pci removal to avoid any issues. Fixes: e75377404726 ("amd-xgbe: Update PCI support to use new IRQ functions") Suggested-by: Selwin Sebastian <Selwin.Sebastian@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09tipc: rate limit warning for received illegal binding updateJon Maloy1-1/+1
It would be easy to craft a message containing an illegal binding table update operation. This is handled correctly by the code, but the corresponding warning printout is not rate limited as is should be. We fix this now. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: mdio: aspeed: Add missing MODULE_DEVICE_TABLEJoel Stanley1-0/+1
Fix loading of the driver when built as a module. Fixes: f160e99462c6 ("net: phy: Add mdio-aspeed") Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09veth: fix races around rq->rx_notify_maskedEric Dumazet1-5/+8
veth being NETIF_F_LLTX enabled, we need to be more careful whenever we read/write rq->rx_notify_masked. BUG: KCSAN: data-race in veth_xmit / veth_xmit write to 0xffff888133d9a9f8 of 1 bytes by task 23552 on cpu 0: __veth_xdp_flush drivers/net/veth.c:269 [inline] veth_xmit+0x307/0x470 drivers/net/veth.c:350 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3473 dev_hard_start_xmit net/core/dev.c:3489 [inline] __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53 NF_HOOK include/linux/netfilter.h:307 [inline] br_forward_finish net/bridge/br_forward.c:66 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242 br_dev_xmit+0x8b6/0x960 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3473 dev_hard_start_xmit net/core/dev.c:3489 [inline] __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149 neigh_hh_output include/net/neighbour.h:525 [inline] neigh_output include/net/neighbour.h:539 [inline] ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:451 [inline] ip_local_out net/ipv4/ip_output.c:126 [inline] ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888133d9a9f8 of 1 bytes by task 23563 on cpu 1: __veth_xdp_flush drivers/net/veth.c:268 [inline] veth_xmit+0x2d6/0x470 drivers/net/veth.c:350 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3473 dev_hard_start_xmit net/core/dev.c:3489 [inline] __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53 NF_HOOK include/linux/netfilter.h:307 [inline] br_forward_finish net/bridge/br_forward.c:66 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242 br_dev_xmit+0x8b6/0x960 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one+0x105/0x2f0 net/core/dev.c:3473 dev_hard_start_xmit net/core/dev.c:3489 [inline] __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149 neigh_hh_output include/net/neighbour.h:525 [inline] neigh_output include/net/neighbour.h:539 [inline] ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:451 [inline] ip_local_out net/ipv4/ip_output.c:126 [inline] ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00 -> 0x01 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 23563 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00064-gc36c04c2e132 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 948d4f214fde ("veth: Add driver XDP") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge tag 'linux-can-fixes-for-5.17-20220209' of ↵David S. Miller1-7/+22
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-02-09 this is a pull request of 2 patches for net/master. Oliver Hartkopp contributes 2 fixes for the CAN ISOTP protocol. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge branch 'MCTP-tag-control-interface'David S. Miller7-68/+489
Jeremy Kerr says: ==================== MCTP tag control interface This series implements a small interface for userspace-controlled message tag allocation for the MCTP protocol. Rather than leaving the kernel to allocate per-message tag values, userspace can explicitly allocate (and release) message tags through two new ioctls: SIOCMCTPALLOCTAG and SIOCMCTPDROPTAG. In order to do this, we first introduce some minor changes to the tag handling, including a couple of new tests for the route input paths. As always, any comments/queries/etc are most welcome. v2: - make mctp_lookup_prealloc_tag static - minor checkpatch formatting fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag controlMatt Johnston6-56/+329
This change adds a couple of new ioctls for mctp sockets: SIOCMCTPALLOCTAG and SIOCMCTPDROPTAG. These ioctls provide facilities for explicit allocation / release of tags, overriding the automatic allocate-on-send/release-on-reply and timeout behaviours. This allows userspace more control over messages that may not fit a simple request/response model. In order to indicate a pre-allocated tag to the sendmsg() syscall, we introduce a new flag to the struct sockaddr_mctp.smctp_tag value: MCTP_TAG_PREALLOC. Additional changes from Jeremy Kerr <jk@codeconstruct.com.au>. Contains a fix that was: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: Allow keys matching any local addressJeremy Kerr2-2/+10
Currently, we require an exact match on an incoming packet's dest address, and the key's local_addr field. In a future change, we may want to set up a key before packets are routed, meaning we have no local address to match on. This change allows key lookups to match on local_addr = MCTP_ADDR_ANY. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: Add helper for address match checkingJeremy Kerr2-5/+8
Currently, we have a couple of paths that check that an EID matches, or the match value is MCTP_ADDR_ANY. Rather than open coding this, add a little helper. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: tests: Add key state testsJeremy Kerr1-0/+137
This change adds a few more tests to check the key/tag lookups on route input. We add a specific entry to the keys lists, route a packet with specific header values, and check for key match/mismatch. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: tests: Rename FL_T macro to FL_TOJeremy Kerr1-6/+6
This is a definition for the tag-owner flag, which has TO as a standard abbreviation. We'll want to add a helper for the actual tag value in a future change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nextDavid S. Miller5-8/+42
-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2022-02-08 Joe Damato says: This patch set makes several updates to the i40e driver stats collection and reporting code to help users of i40e get a better sense of how the driver is performing and interacting with the rest of the kernel. These patches include some new stats (like waived and busy) which were inspired by other drivers that track stats using the same nomenclature. The new stats and an existing stat, rx_reuse, are now accessible with ethtool to make harvesting this data more convenient for users. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09ip6_tunnel: fix possible NULL deref in ip6_tnl_xmitEric Dumazet1-1/+4
Make sure to test that skb has a dst attached to it. general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f] CPU: 0 PID: 32650 Comm: syz-executor.4 Not tainted 5.17.0-rc2-next-20220204-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:ip6_tnl_xmit+0x2140/0x35f0 net/ipv6/ip6_tunnel.c:1127 Code: 4d 85 f6 0f 85 c5 04 00 00 e8 9c b0 66 f9 48 83 e3 fe 48 b8 00 00 00 00 00 fc ff df 48 8d bb 88 00 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 07 7f 05 e8 11 25 b2 f9 44 0f b6 b3 88 00 00 RSP: 0018:ffffc900141b7310 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000c77a000 RDX: 0000000000000011 RSI: ffffffff8811f854 RDI: 0000000000000088 RBP: ffffc900141b7480 R08: 0000000000000000 R09: 0000000000000008 R10: ffffffff8811f846 R11: 0000000000000008 R12: ffffc900141b7548 R13: ffff8880297c6000 R14: 0000000000000000 R15: ffff8880351c8dc0 FS: 00007f9827ba2700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b31322000 CR3: 0000000033a70000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1386 [inline] ip6_tnl_start_xmit+0x71e/0x1830 net/ipv6/ip6_tunnel.c:1435 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one net/core/dev.c:3473 [inline] dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489 __dev_queue_xmit+0x2a24/0x3760 net/core/dev.c:4116 packet_snd net/packet/af_packet.c:3057 [inline] packet_sendmsg+0x2265/0x5460 net/packet/af_packet.c:3084 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 sock_write_iter+0x289/0x3c0 net/socket.c:1061 call_write_iter include/linux/fs.h:2075 [inline] do_iter_readv_writev+0x47a/0x750 fs/read_write.c:726 do_iter_write+0x188/0x710 fs/read_write.c:852 vfs_writev+0x1aa/0x630 fs/read_write.c:925 do_writev+0x27f/0x300 fs/read_write.c:968 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f9828c2d059 Fixes: c1f55c5e0482 ("ip6_tunnel: allow routing IPv4 traffic in NBMA mode") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Qing Deng <i@moy.cat> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09ax25: fix NPD bug in ax25_disconnectDuoming Zhou1-1/+1
The ax25_disconnect() in ax25_kill_by_device() is not protected by any locks, thus there is a race condition between ax25_disconnect() and ax25_destroy_socket(). when ax25->sk is assigned as NULL by ax25_destroy_socket(), a NULL pointer dereference bug will occur if site (1) or (2) dereferences ax25->sk. ax25_kill_by_device() | ax25_release() ax25_disconnect() | ax25_destroy_socket() ... | if(ax25->sk != NULL) | ... ... | ax25->sk = NULL; bh_lock_sock(ax25->sk); //(1) | ... ... | bh_unlock_sock(ax25->sk); //(2)| This patch moves ax25_disconnect() into lock_sock(), which can synchronize with ax25_destroy_socket() in ax25_release(). Fail log: =============================================================== BUG: kernel NULL pointer dereference, address: 0000000000000088 ... RIP: 0010:_raw_spin_lock+0x7e/0xd0 ... Call Trace: ax25_disconnect+0xf6/0x220 ax25_device_event+0x187/0x250 raw_notifier_call_chain+0x5e/0x70 dev_close_many+0x17d/0x230 rollback_registered_many+0x1f1/0x950 unregister_netdevice_queue+0x133/0x200 unregister_netdev+0x13/0x20 ... Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Netvsc: Call hv_unmap_memory() in the netvsc_device_remove()Tianyu Lan1-8/+16
netvsc_device_remove() calls vunmap() inside which should not be called in the interrupt context. Current code calls hv_unmap_memory() in the free_netvsc_device() which is rcu callback and maybe called in the interrupt context. This will trigger BUG_ON(in_interrupt()) in the vunmap(). Fix it via moving hv_unmap_memory() to netvsc_device_ remove(). Fixes: 846da38de0e8 ("net: netvsc: Add Isolation VM support for netvsc driver") Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge branch '1GbE' of ↵David S. Miller3-10/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2022-02-07 Corinna Vinschen says: Fix the kernel warning "Missing unregister, handled but fix driver" when running, e.g., $ ethtool -G eth0 rx 1024 on igc. Remove memset hack from igb and align igb code to igc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge branch 'net-fix-skb-unclone-issues'David S. Miller1-1/+13
Antoine Tenart says: ==================== net: fix issues when uncloning an skb dst+metadata This fixes two issues when uncloning an skb dst+metadata in tun_dst_unclone; this was initially reported by Vlad Buslov[1]. Because of the memory leak fixed by patch 2, the issue in patch 1 never happened in practice. tun_dst_unclone is called from two different places, one in geneve/vxlan to handle PMTU and one in net/openvswitch/actions.c where it is used to retrieve tunnel information. While both Vlad and I tested the former, we could not for the latter. I did spend quite some time trying to, but that code path is not easy to trigger. Code inspection shows this should be fine, the tunnel information (dst+metadata) is uncloned and the skb it is referenced from is only consumed after all accesses to the tunnel information are done: do_execute_actions output_userspace dev_fill_metadata_dst <- dst+metadata is uncloned ovs_dp_upcall queue_userspace_packet ovs_nla_put_tunnel_info <- metadata (tunnel info) is accessed consume_skb <- dst+metadata is freed Thanks! Antoine [1] https://lore.kernel.org/all/ygnhh79yluw2.fsf@nvidia.com/T/#m2f814614a4f5424cea66bbff7297f692b59b69a0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: fix a memleak when uncloning an skb dst and its metadataAntoine Tenart1-1/+0
When uncloning an skb dst and its associated metadata, a new dst+metadata is allocated and later replaces the old one in the skb. This is helpful to have a non-shared dst+metadata attached to a specific skb. The issue is the uncloned dst+metadata is initialized with a refcount of 1, which is increased to 2 before attaching it to the skb. When tun_dst_unclone returns, the dst+metadata is only referenced from a single place (the skb) while its refcount is 2. Its refcount will never drop to 0 (when the skb is consumed), leading to a memory leak. Fix this by removing the call to dst_hold in tun_dst_unclone, as the dst+metadata refcount is already 1. Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.") Cc: Pravin B Shelar <pshelar@ovn.org> Reported-by: Vlad Buslov <vladbu@nvidia.com> Tested-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: do not keep the dst cache when uncloning an skb dst and its metadataAntoine Tenart1-0/+13
When uncloning an skb dst and its associated metadata a new dst+metadata is allocated and the tunnel information from the old metadata is copied over there. The issue is the tunnel metadata has references to cached dst, which are copied along the way. When a dst+metadata refcount drops to 0 the metadata is freed including the cached dst entries. As they are also referenced in the initial dst+metadata, this ends up in UaFs. In practice the above did not happen because of another issue, the dst+metadata was never freed because its refcount never dropped to 0 (this will be fixed in a subsequent patch). Fix this by initializing the dst cache after copying the tunnel information from the old metadata to also unshare the dst cache. Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: Vlad Buslov <vladbu@nvidia.com> Tested-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dt-bindings: net: renesas,etheravb: Document RZ/G2UL SoCBiju Das1-1/+2
Document Gigabit Ethernet IP found on RZ/G2UL SoC. Gigabit Ethernet Interface is identical to one found on the RZ/G2L SoC. No driver changes are required as generic compatible string "renesas,rzg2l-gbeth" will be used as a fallback. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09dt-bindings: net: renesas,etheravb: Document RZ/V2L SoCBiju Das1-1/+2
Document Gigabit Ethernet IP found on RZ/V2L SoC. Gigabit Ethernet Interface is identical to one found on the RZ/G2L SoC. No driver changes are required as generic compatible string "renesas,rzg2l-gbeth" will be used as a fallback. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09netfilter: ctnetlink: use dump structure instead of raw argsFlorian Westphal1-12/+24
netlink_dump structure has a union of 'long args[6]' and a context buffer as scratch space. Convert ctnetlink to use a structure, its easier to read than the raw 'args' usage which comes with no type checks and no readable names. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-09nfqueue: enable to set skb->priorityNicolas Dichtel1-0/+8
This is a follow up of the previous patch that enables to get skb->priority. It's now posssible to set it also. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>