kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2022-05-10	ath11k: remove redundant assignment to variables vht_mcs and he_mcs	Colin Ian King	1	-2/+2
	The variables vht_mcs and he_mcs are being initialized in the start of for-loops however they are re-assigned new values in the loop and not used outside the loop. The initializations are redundant and can be removed. Cleans up clang scan warnings: warning: Although the value stored to 'vht_mcs' is used in the enclosing expression, the value is never actually read from 'vht_mcs' [deadcode.DeadStores] warning: Although the value stored to 'he_mcs' is used in the enclosing expression, the value is never actually read from 'he_mcs' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220507184155.26939-1-colin.i.king@gmail.com
2022-05-10	ath11k: Reuse the available memory after firmware reload	Anilkumar Kolli	3	-4/+23
	Ath11k allocates memory when firmware requests memory in QMI. Coldboot calibration and firmware recovery uses firmware reload. On firmware reload, firmware sends memory request again. If Ath11k allocates memory on first firmware boot, reuse the available memory. Also check if the segment type and size is same on the next firmware boot. Reuse if segment type/size is same as previous firmware boot else free the segment and allocate the segment with size/type. Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.6.0.1-00752-QCAHKSWPL_SILICONZ-1 Signed-off-by: Anilkumar Kolli <quic_akolli@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220506141448.10340-1-quic_akolli@quicinc.com
2022-05-10	wil6210: remove 'freq' debugfs	Johannes Berg	1	-14/+0
	This is completely racy, remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220506095451.9852305a92c8.I05155824a1b9059eb59beccde81786dc69de354a@changeid
2022-05-10	ath11k: Designating channel frequency when sending management frames	Baochen Qiang	3	-1/+24
	In case of Passpoint, the WLAN interface may be requested to remain on a specific channel and then to send some management frames on that channel. Now chanfreq of wmi_mgmt_send_cmd is set as 0, as a result firmware may choose a default but wrong channel. Fix it by assigning chanfreq field with the designated channel. This change only applies to WCN6855 and QCA6390, other chips are not affected. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220506013614.1580274-4-quic_bqiang@quicinc.com
2022-05-10	ath11k: Don't check arvif->is_started before sending management frames	Baochen Qiang	1	-2/+3
	Commit 66307ca04057 ("ath11k: fix mgmt_tx_wmi cmd sent to FW for deleted vdev") wants both of below two conditions are true before sending management frames: 1: ar->allocated_vdev_map & (1LL << arvif->vdev_id) 2: arvif->is_started Actually the second one is not necessary because with the first one we can make sure the vdev is present. Also use ar->conf_mutex to synchronize vdev delete and mgmt. TX. This issue is found in case of Passpoint scenario where ath11k needs to send action frames before vdev is started. Fix it by removing the second condition. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Fixes: 66307ca04057 ("ath11k: fix mgmt_tx_wmi cmd sent to FW for deleted vdev") Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220506013614.1580274-3-quic_bqiang@quicinc.com
2022-05-10	ath11k: Implement remain-on-channel support	Baochen Qiang	3	-0/+120
	Add remain on channel support, it is needed in several scenarios such as Passpoint etc. Currently this is supported by QCA6390, WCN6855, IPQ8074, IPQ6018 and QCN9074. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220506013614.1580274-2-quic_bqiang@quicinc.com
2022-05-10	ath11k: Handle keepalive during WoWLAN suspend and resume	Baochen Qiang	5	-0/+156
	With WoWLAN enabled and after sleeping for a rather long time, we are seeing that with some APs, it is not able to wake up the STA though the correct wake up pattern has been configured. This is because the host doesn't send keepalive command to firmware, thus firmware will not send any packet to the AP and after a specific time the AP kicks out the STA. Fix this issue by enabling keepalive before going to suspend and disabling it after resume back. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220506012540.1579604-1-quic_bqiang@quicinc.com
2022-05-10	MAINTAINERS: Add James and Mike as Arm64 performance events reviewers	Mathieu Poirier	1	-1/+2
	James, Mike and Leo have been doing all the reviews and development work for the Coresight perf tools for a couple of years now. As such remove my name and add James and Mike as official reviewers (Leo is already listed as such). Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: James Clark <james.clark@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10	ALSA: wavefront: Proper check of get_user() error	Takashi Iwai	1	-1/+2
	The antient ISA wavefront driver reads its sample patch data (uploaded over an ioctl) via __get_user() with no good reason; likely just for some performance optimizations in the past. Let's change this to the standard get_user() and the error check for handling the fault case properly. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220510103626.16635-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-05-10	udf: Avoid using stale lengthOfImpUse	Jan Kara	1	-4/+4
	udf_write_fi() uses lengthOfImpUse of the entry it is writing to. However this field has not yet been initialized so it either contains completely bogus value or value from last directory entry at that place. In either case this is wrong and can lead to filesystem corruption or kernel crashes. Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> CC: stable@vger.kernel.org Fixes: 979a6e28dd96 ("udf: Get rid of 0-length arrays in struct fileIdentDesc") Signed-off-by: Jan Kara <jack@suse.cz>
2022-05-10	virtio: fix virtio transitional ids	Shunsuke Mie	1	-7/+7
	This commit fixes the transitional PCI device ID. Fixes: d61914ea6ada ("virtio: update virtio id table, add transitional ids") Signed-off-by: Shunsuke Mie <mie@igel.co.jp> Link: https://lore.kernel.org/r/20220510102723.87666-1-mie@igel.co.jp Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-10	arm64: vdso: fix makefile dependency on vdso.so	Joey Gouly	3	-6/+4
	There is currently no dependency for vdso-wrap.S on vdso.so, which means that you can get a build that uses a stale vdso*-wrap.o. In commit a5b8ca97fbf8, the file that includes the vdso.so was moved and renamed from arch/arm64/kernel/vdso/vdso.S to arch/arm64/kernel/vdso-wrap.S, when this happened the Makefile was not updated to force the dependcy on vdso.so. Fixes: a5b8ca97fbf8 ("arm64: do not descend to vdso directories twice") Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220510102721.50811-1-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-05-10	decnet: Use container_of() for struct dn_neigh casts	Kees Cook	3	-5/+6
	Clang's structure layout randomization feature gets upset when it sees struct neighbor (which is randomized) cast to struct dn_neigh: net/decnet/dn_route.c:1123:15: error: casting from randomized structure pointer type 'struct neighbour ' to 'struct dn_neigh ' gateway = ((struct dn_neigh *)neigh)->addr; ^ Update all the open-coded casts to use container_of() to do the conversion instead of depending on strict member ordering. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202205041247.WKBEHGS5-lkp@intel.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Zheng Yongjun <zhengyongjun3@huawei.com> Cc: Bill Wendling <morbo@google.com> Cc: linux-decnet-user@lists.sourceforge.net Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220508102217.2647184-1-keescook@chromium.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	writeback: Avoid skipping inode writeback	Jing Xia	1	-0/+4
	We have run into an issue that a task gets stuck in balance_dirty_pages_ratelimited() when perform I/O stress testing. The reason we observed is that an I_DIRTY_PAGES inode with lots of dirty pages is in b_dirty_time list and standard background writeback cannot writeback the inode. After studing the relevant code, the following scenario may lead to the issue: task1 task2 ----- ----- fuse_flush write_inode_now //in b_dirty_time writeback_single_inode __writeback_single_inode fuse_write_end filemap_dirty_folio __xa_set_mark:PAGECACHE_TAG_DIRTY lock inode->i_lock if mapping tagged PAGECACHE_TAG_DIRTY inode->i_state \|= I_DIRTY_PAGES unlock inode->i_lock __mark_inode_dirty:I_DIRTY_PAGES lock inode->i_lock -was dirty,inode stays in -b_dirty_time unlock inode->i_lock if(!(inode->i_state & I_DIRTY_All)) -not true,so nothing done This patch moves the dirty inode to b_dirty list when the inode currently is not queued in b_io or b_more_io list at the end of writeback_single_inode. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> CC: stable@vger.kernel.org Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") Signed-off-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220510023514.27399-1-jing.xia@unisoc.com
2022-05-10	x25: remove redundant pointer dev	Colin Ian King	1	-2/+1
	Pointer dev is being assigned a value that is never used, the assignment and the variable are redundant and can be removed. Also replace null check with the preferred !ptr idiom. Cleans up clang scan warning: net/x25/x25_proc.c:94:26: warning: Although the value stored to 'dev' is used in the enclosing expression, the value is never actually read from 'dev' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220508214500.60446-1-colin.i.king@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	Merge branch 'this-is-a-patch-series-for-ethernet-driver-of-sunplus-sp7021-soc'	Paolo Abeni	19	-0/+2191
	Wells Lu says: ==================== This is a patch series for Ethernet driver of Sunplus SP7021 SoC. Sunplus SP7021 is an ARM Cortex A7 (4 cores) based SoC. It integrates many peripherals (ex: UART, I2C, SPI, SDIO, eMMC, USB, SD card and etc.) into a single chip. It is designed for industrial control applications. Refer to: https://sunplus.atlassian.net/wiki/spaces/doc/overview https://tibbo.com/store/plus1.html ==================== Link: https://lore.kernel.org/r/1652004800-3212-1-git-send-email-wellslutw@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	net: ethernet: Add driver for Sunplus SP7021	Wells Lu	18	-0/+2043
	Add driver for Sunplus SP7021 SoC. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Wells Lu <wellslutw@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	devicetree: bindings: net: Add bindings doc for Sunplus SP7021.	Wells Lu	2	-0/+148
	Add bindings documentation for Sunplus SP7021 SoC. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Wells Lu <wellslutw@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	dma-buf: call dma_buf_stats_setup after dmabuf is in valid list	Charan Teja Reddy	1	-4/+4
	When dma_buf_stats_setup() fails, it closes the dmabuf file which results into the calling of dma_buf_file_release() where it does list_del(&dmabuf->list_node) with out first adding it to the proper list. This is resulting into panic in the below path: __list_del_entry_valid+0x38/0xac dma_buf_file_release+0x74/0x158 __fput+0xf4/0x428 ____fput+0x14/0x24 task_work_run+0x178/0x24c do_notify_resume+0x194/0x264 work_pending+0xc/0x5f0 Fix it by moving the dma_buf_stats_setup() after dmabuf is added to the list. Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs") Signed-off-by: Charan Teja Reddy <quic_charante@quicinc.com> Tested-by: T.J. Mercier <tjmercier@google.com> Acked-by: T.J. Mercier <tjmercier@google.com> Cc: <stable@vger.kernel.org> # 5.15.x+ Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1652125797-2043-1-git-send-email-quic_charante@quicinc.com
2022-05-10	net: atlantic: always deep reset on pm op, fixing up my null deref regression	Manuel Ullmann	1	-2/+2
	The impact of this regression is the same for resume that I saw on thaw: the kernel hangs and nothing except SysRq rebooting can be done. Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs"), where I disabled deep pm resets in suspend and resume, trying to make sense of the atl_resume_common() deep parameter in the first place. It turns out, that atlantic always has to deep reset on pm operations. Even though I expected that and tested resume, I screwed up by kexec-rebooting into an unpatched kernel, thus missing the breakage. This fixup obsoletes the deep parameter of atl_resume_common, but I leave the cleanup for the maintainers to post to mainline. Suspend and hibernation were successfully tested by the reporters. Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs") Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/ Reported-by: Jordan Leppert <jordanleppert@protonmail.com> Reported-by: Holger Hoffstaette <holger@applied-asynchrony.com> Tested-by: Jordan Leppert <jordanleppert@protonmail.com> Tested-by: Holger Hoffstaette <holger@applied-asynchrony.com> CC: <stable@vger.kernel.org> # 5.10+ Signed-off-by: Manuel Ullmann <labre@posteo.de> Link: https://lore.kernel.org/r/87bkw8dfmp.fsf@posteo.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	ceph: check folio PG_private bit instead of folio->private	Xiubo Li	1	-4/+7
	The pages in the file mapping maybe reclaimed and reused by other subsystems and the page->private maybe used as flags field or something else, if later that pages are used by page caches again the page->private maybe not cleared as expected. Here will check the PG_private bit instead of the folio->private. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/55421 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-10	ceph: fix setting of xattrs on async created inodes	Jeff Layton	1	-3/+13
	Currently when we create a file, we spin up an xattr buffer to send along with the create request. If we end up doing an async create however, then we currently pass down a zero-length xattr buffer. Fix the code to send down the xattr buffer in req->r_pagelist. If the xattrs span more than a page, however give up and don't try to do an async create. Cc: stable@vger.kernel.org URL: https://bugzilla.redhat.com/show_bug.cgi?id=2063929 Fixes: 9a8d03ca2e2c ("ceph: attempt to do async create when possible") Reported-by: John Fortin <fortinj66@gmail.com> Reported-by: Sri Ramanujam <sri@ramanujam.io> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-10	Merge branch ↵	Paolo Abeni	11	-63/+282
	'ptp-support-hardware-clocks-with-additional-free-running-cycle-counter' Gerhard Engleder says: ==================== ptp: Support hardware clocks with additional free running cycle counter ptp vclocks require a clock with free running time for the timecounter. Currently only a physical clock forced to free running is supported. If vclocks are used, then the physical clock cannot be synchronized anymore. The synchronized time is not available in hardware in this case. As a result, timed transmission with TAPRIO hardware support is not possible anymore. If hardware would support a free running time additionally to the physical clock, then the physical clock does not need to be forced to free running. Thus, the physical clocks can still be synchronized while vclocks are in use. The physical clock could be used to synchronize the time domain of the TSN network and trigger TAPRIO. In parallel vclocks can be used to synchronize other time domains. One year ago I thought for two time domains within a TSN network also two physical clocks are required. This would lead to new kernel interfaces for asking for the second clock, ... . But actually for a time triggered system like TSN there can be only one time domain that controls the system itself. All other time domains belong to other layers, but not to the time triggered system itself. So other time domains can be based on a free running counter if similar mechanisms like 2 step synchroisation are used. Synchronisation was tested with two time domains between two directly connected hosts. Each host run two ptp4l instances, the first used the physical clock and the second used the virtual clock. I used my FPGA based network controller as network device. ptp4l was used in combination with the virtual clock support patches from Miroslav Lichvar. v4: - if_index of 0 is invalid (Jonathan Lemon) - set if_index to 0 in the SOF_TIMESTAMPING_RAW_HARDWARE block (Jonathan Lemon) - add helper function for netdev_get_tstamp() call (Jonathan Lemon) - update SKBTX_ANY_TSTAMP (Paolo Abeni) - use separate bits for new tx_flags (Richard Cochran) v3: - optimize ptp_convert_timestamp (Richard Cochran) - call dev_get_by_napi_id() only if needed (Richard Cochran) - use non-negated logical test (Richard Cochran) - add comment for skipped output (Richard Cochran) - add comment for SKBTX_HW_TSTAMP_USE_CYCLES masking (Richard Cochran) v2: - rename ptp_clock cycles to has_cycles (Richard Cochran) - call it free running cycle counter (Richard Cochran) - update struct skb_shared_hwtstamps kdoc (Richard Cochran) - optimize timestamp address/cookie processing path (Richard Cochran, Vinicius Costa Gomes) v1: - complete rework based on suggestions (Richard Cochran) ==================== Link: https://lore.kernel.org/r/20220506200142.3329-1-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	tsnep: Add free running cycle counter support	Gerhard Engleder	3	-7/+63
	The TSN endpoint Ethernet MAC supports a free running counter additionally to its clock. This free running counter can be read and hardware timestamps are supported. As the name implies, this counter cannot be set and its frequency cannot be adjusted. Add free running cycle counter support based on this free running counter to physical clock. This also requires hardware time stamps based on that free running counter. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	ptp: Speed up vclock lookup	Gerhard Engleder	2	-19/+48
	ptp_convert_timestamp() is called in the RX path of network messages. The current implementation takes ~5000ns on 1.2GHz A53. This is too much for the hot path of packet processing. Introduce hash table for fast vclock lookup in ptp_convert_timestamp(). The execution time of ptp_convert_timestamp() is reduced to ~700ns on 1.2GHz A53. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	ptp: Support late timestamp determination	Gerhard Engleder	3	-13/+71
	If a physical clock supports a free running cycle counter, then timestamps shall be based on this time too. For TX it is known in advance before the transmission if a timestamp based on the free running cycle counter is needed. For RX it is impossible to know which timestamp is needed before the packet is received and assigned to a socket. Support late timestamp determination by a network device. Therefore, an address/cookie is stored within the new netdev_data field of struct skb_shared_hwtstamps. This address/cookie is provided to a new network device function called ndo_get_tstamp(), which returns a timestamp based on the normal/adjustable time or based on the free running cycle counter. If function is not supported, then timestamp handling is not changed. This mechanism is intended for RX, but TX use is also possible. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	ptp: Pass hwtstamp to ptp_convert_timestamp()	Gerhard Engleder	3	-8/+6
	ptp_convert_timestamp() converts only the timestamp hwtstamp, which is a field of the argument with the type struct skb_shared_hwtstamps . So a pointer to the hwtstamp field of this structure is sufficient. Rework ptp_convert_timestamp() to use an argument of type ktime_t . This allows to add additional timestamp manipulation stages before the call of ptp_convert_timestamp(). Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	ptp: Request cycles for TX timestamp	Gerhard Engleder	2	-2/+16
	The free running cycle counter of physical clocks called cycles shall be used for hardware timestamps to enable synchronisation. Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to provide a TX timestamp based on cycles if cycles are supported. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	ptp: Add cycles support for virtual clocks	Gerhard Engleder	5	-16/+80
	ptp vclocks require a free running time for their timecounter. Currently only a physical clock forced to free running is supported. If vclocks are used, then the physical clock cannot be synchronized anymore. The synchronized time is not available in hardware in this case. As a result, timed transmission with TAPRIO hardware support is not possible anymore. If hardware would support a free running time additionally to the physical clock, then the physical clock does not need to be forced to free running. Thus, the physical clocks can still be synchronized while vclocks are in use. The physical clock could be used to synchronize the time domain of the TSN network and trigger TAPRIO. In parallel vclocks can be used to synchronize other time domains. Introduce support for a free running cycle counter called cycles to physical clocks. Rework ptp vclocks to use this free running cycle counter. Default implementation is based on time of physical clock. Thus, behavior of ptp vclocks based on physical clocks without free running cycle counter is identical to previous behavior. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	eth: dpaa2-mac: remove a dead-code NULL check on fwnode parent	Jakub Kicinski	1	-3/+0
	Since commit 4e30e98c4b4c ("dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set") @parent can't be NULL after the if. It's either the address of the ->fwnode of @dpmacs or @fwnode in case of ACPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220506200029.852310-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	net/mlx5: Lag, add debugfs to query hardware lag state	Mark Bloch	5	-4/+192
	Lag state has become very complicated with many modes, flags, types and port selections methods and future work will add additional features. Add a debugfs to query the current lag state. A new directory named "lag" will be created under the mlx5 debugfs directory. As the driver has debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag For example: /sys/kernel/debug/mlx5/0000:08:00.0/lag The following files are exposed: - state: Returns "active" or "disabled". If "active" it means hardware lag is active. - members: Returns the BDFs of all the members of lag object. - type: Returns the type of the lag currently configured. Valid only if hardware lag is active. * "roce" - Members are bare metal PFs. * "switchdev" - Members are in switchdev mode. * "multipath" - ECMP offloads. - port_sel_mode: Returns the egress port selection method, valid only if hardware lag is active. * "queue_affinity" - Egress port is selected by the QP/SQ affinity. * "hash" - Egress port is selected by hash done on each packet. Controlled by: xmit_hash_policy of the bond device. - flags: Returns flags that are specific per lag @type. Valid only if hardware lag is active. * "shared_fdb" - "on" or "off", if "on" single FDB is used. - mapping: Returns the mapping which is used to select egress port. Valid only if hardware lag is active. If @port_sel_mode is "hash" returns the active egress ports. The hash result will select only active ports. if @port_sel_mode is "queue_affinity" returns the mapping between the configured port affinity of the QP/SQ and actual egress port. For example: * 1:1 - Mapping means if the configured affinity is port 1 traffic will egress via port 1. * 1:2 - Mapping means if the configured affinity is port 1 traffic will egress via port 2. This can happen if port 1 is down or in active/backup mode and port 1 is backup. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, use buckets in hash mode	Mark Bloch	4	-76/+182
	When in hardware lag and the NIC has more than 2 ports when one port goes down need to distribute the traffic between the remaining active ports. For better spread in such cases instead of using 1-to-1 mapping and only 4 slots in the hash, use many. Each port will have many slots that point to it. When a port goes down go over all the slots that pointed to that port and spread them between the remaining active ports. Once the port comes back restore the default mapping. We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots. Each MLX5_LAG_MAX_HASH_BUCKETS belong to a different port. The native mapping is such that: port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 1, .., 1] which means if a packet is hased into one of this slots it will hit the wire via port 1. port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [2, 2, .., 2] which means if a packet is hased into one of this slots it will hit the wire via port2. and this mapping is the same of the rest of the ports. On a failover, lets say port 2 goes down (port 1, 3, 4 are still up). the new mapping for port 2 will be: port 2: The second MLX5_LAG_MAX_HASH_BUCKETS are: [1, 3, 1, 4, .., 4] which means the mapping was changed from the native mapping to a mapping that consists of only the active ports. With this if a port goes down the traffic will be split between the active ports randomly Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, refactor dmesg print	Mark Bloch	1	-10/+12
	Combine dmesg lag prints into a single function. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Support devices with more than 2 ports	Mark Bloch	3	-3/+5
	Increase the define MLX5_MAX_PORTS to 4 as the driver is ready to support NICs with 4 ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, use actual number of lag ports	Mark Bloch	3	-149/+216
	Refactor the entire lag code to use ldev->ports instead of hard-coded defines (like MLX5_MAX_PORTS) for its operations. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, use hash when in roce lag on 4 ports	Mark Bloch	1	-9/+36
	Downstream patches will add support for lag over 4 ports. In that mode we will only use hash as the uplink selection method. Using hash instead of queue affinity (before this patch) offers key advantages like: - Align ports selection method with the method used by the bond device - Better packets distribution where a single queue can transmit from multiple ports (with queue affinity a queue is bound to a single port regardless of the packet being sent). - In case of failover we traffic is split between multiple ports and not a single one like in queue affinity. Going forward it was decided that queue affinity will be deprecated as using hash provides a better user experience which means on 4 ports HCAs hash will always be used. Future work will add hash support for 2 ports HCAs as well. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, support single FDB only on 2 ports	Mark Bloch	1	-0/+4
	E-Switch currently doesn't support more than 2 E-Switch managers being aggregated under a single hardware lag. Have specific checks to disallow creating lag when the code doesn't support it. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, store number of ports inside lag object	Mark Bloch	2	-0/+2
	Store the number of lag ports inside the lag object. Lag object is a single shared object managing the lag state of multiple mlx5 devices on the same physical HCA. Downstream patches will allow hardware lag to be created over devices with more than 2 ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, filter non compatible devices	Mark Bloch	3	-14/+47
	When search for a peer lag device we can filter based on that device's capabilities. Downstream patch will be less strict when filtering compatible devices and remove the limitation where we require exact MLX5_MAX_PORTS and change it to a range. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, use lag lock	Mark Bloch	4	-65/+35
	Use a lag specific lock instead of depending on external locks to synchronise the lag creation/destruction. With this, taking E-Switch mode lock is no longer needed for syncing lag logic. Cleanup any dead code that is left over and don't export functions that aren't used outside the E-Switch core code. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, move E-Switch prerequisite check into lag code	Mark Bloch	3	-16/+9
	There is no need to expose E-Switch function for something that can be checked with already present API inside lag code. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: devcom only supports 2 ports	Mark Bloch	2	-7/+11
	Devcom API is intended to be used between 2 devices only add this implied assumption into the code and check when it's no true. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Lag, expose number of lag ports	Mark Bloch	6	-2/+11
	Downstream patches will add support for hardware lag with more than 2 ports. Add a way for users to query the number of lag ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Increase FW pre-init timeout for health recovery	Gavin Li	6	-13/+20
	Currently, health recovery will reload driver to recover it from fatal errors. During the driver's load process, it would wait for FW to set the pre-init bit for up to 120 seconds, beyond this threshold it would abort the load process. In some cases, such as a FW upgrade on the DPU, this timeout period is insufficient, and the user has no way to recover the host device. To solve this issue, introduce a new FW pre-init timeout for health recovery, which is set to 2 hours. The timeout for devlink reload and probe will use the original one because they are user triggered flows, and therefore should not have a significantly long timeout, during which the user command would hang. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	net/mlx5: Add exit route when waiting for FW	Gavin Li	2	-1/+5
	Currently, removing a device needs to get the driver interface lock before doing any cleanup. If the driver is waiting in a loop for FW init, there is no way to cancel the wait, instead the device cleanup waits for the loop to conclude and release the lock. To allow immediate response to remove device commands, check the TEARDOWN flag while waiting for FW init, and exit the loop if it has been set. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10	Merge branch 'nfp-support-corigine-pcie-vendor-id'	Jakub Kicinski	5	-21/+57
	Simon Horman says: ==================== nfp: support Corigine PCIE vendor ID Historically the nfp driver has supported NFP chips with Netronome's PCIE vendor ID. This patch extends the driver to also support NFP chips, which at this point are assumed to be otherwise identical from a software perspective, that have Corigine's PCIE vendor ID (0x1da8). This patchset begins by cleaning up strings to make them: * Vendor neutral for the NFP chip * Relate to Corigine for the driver itself It then adds support to the driver for the Corigine's PCIE vendor ID ==================== Link: https://lore.kernel.org/r/20220508173816.476357-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	nfp: support Corigine PCIE vendor ID	Yu Xiao	4	-16/+50
	Historically the nfp driver has supported NFP chips with Netronome's PCIE vendor ID. This patch extends the driver to also support NFP chips, which at this point are assumed to be otherwise identical from a software perspective, that have Corigine's PCIE vendor ID (0x1da8). Also, Rename the macro definitions PCI_DEVICE_ID_NERTONEOME_NFPXXXX to PCI_DEVICE_ID_NFPXXXX, as they are now used in conjunction with two PCIE vendor IDs. Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	nfp: vendor neutral strings for chip and Corigne in strings for driver	Yu Xiao	3	-5/+7
	Historically the nfp driver has supported NFP chips with Netronome's PCIE vendor ID. In preparation for extending the to also support NFP chips that have Corigine's PCIE vendor ID (0x1da8) make printk statements relating to the chip vendor neutral. An alternate approach is to set the string based on the PCI vendor ID. In our judgement this proved to cumbersome so we have taken this simpler approach. Update strings relating to the driver to use Corigine, who have taken over maintenance of the driver. Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	Merge tag 'batadv-net-pullrequest-20220508' of ↵	Jakub Kicinski	1	-0/+11
	git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Don't skb_split skbuffs with frag_list, by Sven Eckelmann * tag 'batadv-net-pullrequest-20220508' of git://git.open-mesh.org/linux-merge: batman-adv: Don't skb_split skbuffs with frag_list ==================== Link: https://lore.kernel.org/r/20220508132110.20451-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	net: dsa: flush switchdev workqueue on bridge join error path	Vladimir Oltean	1	-0/+1
	There is a race between switchdev_bridge_port_offload() and the dsa_port_switchdev_sync_attrs() call right below it. When switchdev_bridge_port_offload() finishes, FDB entries have been replayed by the bridge, but are scheduled for deferred execution later. However dsa_port_switchdev_sync_attrs -> dsa_port_can_apply_vlan_filtering() may impose restrictions on the vlan_filtering attribute and refuse offloading. When this happens, the delayed FDB entries will dereference dp->bridge, which is a NULL pointer because we have stopped the process of offloading this bridge. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Workqueue: dsa_ordered dsa_slave_switchdev_event_work pc : dsa_port_bridge_host_fdb_del+0x64/0x100 lr : dsa_slave_switchdev_event_work+0x130/0x1bc Call trace: dsa_port_bridge_host_fdb_del+0x64/0x100 dsa_slave_switchdev_event_work+0x130/0x1bc process_one_work+0x294/0x670 worker_thread+0x80/0x460 ---[ end trace 0000000000000000 ]--- Error: dsa_core: Must first remove VLAN uppers having VIDs also present in bridge. Fix the bug by doing what we do on the normal bridge leave path as well, which is to wait until the deferred FDB entries complete executing, then exit. The placement of dsa_flush_workqueue() after switchdev_bridge_port_unoffload() guarantees that both the FDB additions and deletions on rollback are waited for. Fixes: d7d0d423dbaa ("net: dsa: flush switchdev workqueue when leaving the bridge") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220507134550.1849834-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>