kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2022-03-04	Bluetooth: mgmt: Remove unneeded variable	Minghao Chi	1	-5/+2
	Return value from mgmt_cmd_complete() directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-03-04	Bluetooth: hci_sync: fix undefined return of hci_disconnect_all_sync()	Tom Rix	1	-1/+1
	clang static analysis reports this problem hci_sync.c:4428:2: warning: Undefined or garbage value returned to caller return err; ^~~~~~~~~~ If there are no connections this function is a noop but err is never set and a false error could be reported. Return 0 as other hci_* functions do. Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-03-04	Bluetooth: mgmt: Replace zero-length array with flexible-array member	Changcheng Deng	1	-1/+1
	There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use "flexible array members" for these cases. The older style of one-element or zero-length arrays should no longer be used. Reference: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-03-04	btrfs: fallback to blocking mode when doing async dio over multiple extents	Filipe Manana	1	-0/+28
	Some users recently reported that MariaDB was getting a read corruption when using io_uring on top of btrfs. This started to happen in 5.16, after commit 51bd9563b6783d ("btrfs: fix deadlock due to page faults during direct IO reads and writes"). That changed btrfs to use the new iomap flag IOMAP_DIO_PARTIAL and to disable page faults before calling iomap_dio_rw(). This was necessary to fix deadlocks when the iovector corresponds to a memory mapped file region. That type of scenario is exercised by test case generic/647 from fstests. For this MariaDB scenario, we attempt to read 16K from file offset X using IOCB_NOWAIT and io_uring. In that range we have 4 extents, each with a size of 4K, and what happens is the following: 1) btrfs_direct_read() disables page faults and calls iomap_dio_rw(); 2) iomap creates a struct iomap_dio object, its reference count is initialized to 1 and its ->size field is initialized to 0; 3) iomap calls btrfs_dio_iomap_begin() with file offset X, which finds the first 4K extent, and setups an iomap for this extent consisting of a single page; 4) At iomap_dio_bio_iter(), we are able to access the first page of the buffer (struct iov_iter) with bio_iov_iter_get_pages() without triggering a page fault; 5) iomap submits a bio for this 4K extent (iomap_dio_submit_bio() -> btrfs_submit_direct()) and increments the refcount on the struct iomap_dio object to 2; The ->size field of the struct iomap_dio object is incremented to 4K; 6) iomap calls btrfs_iomap_begin() again, this time with a file offset of X + 4K. There we setup an iomap for the next extent that also has a size of 4K; 7) Then at iomap_dio_bio_iter() we call bio_iov_iter_get_pages(), which tries to access the next page (2nd page) of the buffer. This triggers a page fault and returns -EFAULT; 8) At __iomap_dio_rw() we see the -EFAULT, but we reset the error to 0 because we passed the flag IOMAP_DIO_PARTIAL to iomap and the struct iomap_dio object has a ->size value of 4K (we submitted a bio for an extent already). The 'wait_for_completion' variable is not set to true, because our iocb has IOCB_NOWAIT set; 9) At the bottom of __iomap_dio_rw(), we decrement the reference count of the struct iomap_dio object from 2 to 1. Because we were not the only ones holding a reference on it and 'wait_for_completion' is set to false, -EIOCBQUEUED is returned to btrfs_direct_read(), which just returns it up the callchain, up to io_uring; 10) The bio submitted for the first extent (step 5) completes and its bio endio function, iomap_dio_bio_end_io(), decrements the last reference on the struct iomap_dio object, resulting in calling iomap_dio_complete_work() -> iomap_dio_complete(). 11) At iomap_dio_complete() we adjust the iocb->ki_pos from X to X + 4K and return 4K (the amount of io done) to iomap_dio_complete_work(); 12) iomap_dio_complete_work() calls the iocb completion callback, iocb->ki_complete() with a second argument value of 4K (total io done) and the iocb with the adjust ki_pos of X + 4K. This results in completing the read request for io_uring, leaving it with a result of 4K bytes read, and only the first page of the buffer filled in, while the remaining 3 pages, corresponding to the other 3 extents, were not filled; 13) For the application, the result is unexpected because if we ask to read N bytes, it expects to get N bytes read as long as those N bytes don't cross the EOF (i_size). MariaDB reports this as an error, as it's not expecting a short read, since it knows it's asking for read operations fully within the i_size boundary. This is typical in many applications, but it may also be questionable if they should react to such short reads by issuing more read calls to get the remaining data. Nevertheless, the short read happened due to a change in btrfs regarding how it deals with page faults while in the middle of a read operation, and there's no reason why btrfs can't have the previous behaviour of returning the whole data that was requested by the application. The problem can also be triggered with the following simple program: /* Get O_DIRECT / #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <errno.h> #include <string.h> #include <liburing.h> int main(int argc, char argv[]) { char foo_path; struct io_uring ring; struct io_uring_sqe sqe; struct io_uring_cqe cqe; struct iovec iovec; int fd; long pagesize; void write_buf; void read_buf; ssize_t ret; int i; if (argc != 2) { fprintf(stderr, "Use: %s <directory>\n", argv[0]); return 1; } foo_path = malloc(strlen(argv[1]) + 5); if (!foo_path) { fprintf(stderr, "Failed to allocate memory for file path\n"); return 1; } strcpy(foo_path, argv[1]); strcat(foo_path, "/foo"); / * Create file foo with 2 extents, each with a size matching * the page size. Then allocate a buffer to read both extents * with io_uring, using O_DIRECT and IOCB_NOWAIT. Before doing * the read with io_uring, access the first page of the buffer * to fault it in, so that during the read we only trigger a * page fault when accessing the second page of the buffer. / fd = open(foo_path, O_CREAT \| O_TRUNC \| O_WRONLY \| O_DIRECT, 0666); if (fd == -1) { fprintf(stderr, "Failed to create file 'foo': %s (errno %d)", strerror(errno), errno); return 1; } pagesize = sysconf(_SC_PAGE_SIZE); ret = posix_memalign(&write_buf, pagesize, 2 pagesize); if (ret) { fprintf(stderr, "Failed to allocate write buffer\n"); return 1; } memset(write_buf, 0xab, pagesize); memset(write_buf + pagesize, 0xcd, pagesize); /* Create 2 extents, each with a size matching page size. / for (i = 0; i < 2; i++) { ret = pwrite(fd, write_buf + i pagesize, pagesize, i * pagesize); if (ret != pagesize) { fprintf(stderr, "Failed to write to file, ret = %ld errno %d (%s)\n", ret, errno, strerror(errno)); return 1; } ret = fsync(fd); if (ret != 0) { fprintf(stderr, "Failed to fsync file\n"); return 1; } } close(fd); fd = open(foo_path, O_RDONLY \| O_DIRECT); if (fd == -1) { fprintf(stderr, "Failed to open file 'foo': %s (errno %d)", strerror(errno), errno); return 1; } ret = posix_memalign(&read_buf, pagesize, 2 * pagesize); if (ret) { fprintf(stderr, "Failed to allocate read buffer\n"); return 1; } /* * Fault in only the first page of the read buffer. * We want to trigger a page fault for the 2nd page of the * read buffer during the read operation with io_uring * (O_DIRECT and IOCB_NOWAIT). / memset(read_buf, 0, 1); ret = io_uring_queue_init(1, &ring, 0); if (ret != 0) { fprintf(stderr, "Failed to create io_uring queue\n"); return 1; } sqe = io_uring_get_sqe(&ring); if (!sqe) { fprintf(stderr, "Failed to get io_uring sqe\n"); return 1; } iovec.iov_base = read_buf; iovec.iov_len = 2 pagesize; io_uring_prep_readv(sqe, fd, &iovec, 1, 0); ret = io_uring_submit_and_wait(&ring, 1); if (ret != 1) { fprintf(stderr, "Failed at io_uring_submit_and_wait()\n"); return 1; } ret = io_uring_wait_cqe(&ring, &cqe); if (ret < 0) { fprintf(stderr, "Failed at io_uring_wait_cqe()\n"); return 1; } printf("io_uring read result for file foo:\n\n"); printf(" cqe->res == %d (expected %d)\n", cqe->res, 2 * pagesize); printf(" memcmp(read_buf, write_buf) == %d (expected 0)\n", memcmp(read_buf, write_buf, 2 * pagesize)); io_uring_cqe_seen(&ring, cqe); io_uring_queue_exit(&ring); return 0; } When running it on an unpatched kernel: $ gcc io_uring_test.c -luring $ mkfs.btrfs -f /dev/sda $ mount /dev/sda /mnt/sda $ ./a.out /mnt/sda io_uring read result for file foo: cqe->res == 4096 (expected 8192) memcmp(read_buf, write_buf) == -205 (expected 0) After this patch, the read always returns 8192 bytes, with the buffer filled with the correct data. Although that reproducer always triggers the bug in my test vms, it's possible that it will not be so reliable on other environments, as that can happen if the bio for the first extent completes and decrements the reference on the struct iomap_dio object before we do the atomic_dec_and_test() on the reference at __iomap_dio_rw(). Fix this in btrfs by having btrfs_dio_iomap_begin() return -EAGAIN whenever we try to satisfy a non blocking IO request (IOMAP_NOWAIT flag set) over a range that spans multiple extents (or a mix of extents and holes). This avoids returning success to the caller when we only did partial IO, which is not optimal for writes and for reads it's actually incorrect, as the caller doesn't expect to get less bytes read than it has requested (unless EOF is crossed), as previously mentioned. This is also the type of behaviour that xfs follows (xfs_direct_write_iomap_begin()), even though it doesn't use IOMAP_DIO_PARTIAL. A test case for fstests will follow soon. Link: https://lore.kernel.org/linux-btrfs/CABVffEM0eEWho+206m470rtM0d9J8ue85TtR-A_oVTuGLWFicA@mail.gmail.com/ Link: https://lore.kernel.org/linux-btrfs/CAHF2GV6U32gmqSjLe=XKgfcZAmLCiH26cJ2OnHGp5x=VAH4OHQ@mail.gmail.com/ CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-04	virtio_console: break out of buf poll on remove	Michael S. Tsirkin	1	-0/+7
	A common pattern for device reset is currently: vdev->config->reset(vdev); .. cleanup .. reset prevents new interrupts from arriving and waits for interrupt handlers to finish. However if - as is common - the handler queues a work request which is flushed during the cleanup stage, we have code adding buffers / trying to get buffers while device is reset. Not good. This was reproduced by running modprobe virtio_console modprobe -r virtio_console in a loop. Fix this up by calling virtio_break_device + flush before reset. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1786239 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04	virtio: document virtio_reset_device	Michael S. Tsirkin	1	-0/+16
	Looks like most callers get driver/device removal wrong. Document what's expected of callers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04	virtio: acknowledge all features before access	Michael S. Tsirkin	2	-18/+24
	The feature negotiation was designed in a way that makes it possible for devices to know which config fields will be accessed by drivers. This is broken since commit 404123c2db79 ("virtio: allow drivers to validate features") with fallout in at least block and net. We have a partial work-around in commit 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") which at least lets devices find out which format should config space have, but this is a partial fix: guests should not access config space without acknowledging features since otherwise we'll never be able to change the config space format. To fix, split finalize_features from virtio_finalize_features and call finalize_features with all feature bits before validation, and then - if validation changed any bits - once again after. Since virtio_finalize_features no longer writes out features rename it to virtio_features_ok - since that is what it does: checks that features are ok with the device. As a side effect, this also reduces the amount of hypervisor accesses - we now only acknowledge features once unless we are clearing any features when validating (which is uncommon). IRC I think that this was more or less always the intent in the spec but unfortunately the way the spec is worded does not say this explicitly, I plan to address this at the spec level, too. Acked-by: Jason Wang <jasowang@redhat.com> Cc: stable@vger.kernel.org Fixes: 404123c2db79 ("virtio: allow drivers to validate features") Fixes: 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") Cc: "Halil Pasic" <pasic@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04	virtio: unexport virtio_finalize_features	Michael S. Tsirkin	2	-3/+1
	virtio_finalize_features is only used internally within virtio. No reason to export it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-04	tipc: fix kernel panic when enabling bearer	Tung Nguyen	1	-5/+7
	When enabling a bearer on a node, a kernel panic is observed: [ 4.498085] RIP: 0010:tipc_mon_prep+0x4e/0x130 [tipc] ... [ 4.520030] Call Trace: [ 4.520689] <IRQ> [ 4.521236] tipc_link_build_proto_msg+0x375/0x750 [tipc] [ 4.522654] tipc_link_build_state_msg+0x48/0xc0 [tipc] [ 4.524034] __tipc_node_link_up+0xd7/0x290 [tipc] [ 4.525292] tipc_rcv+0x5da/0x730 [tipc] [ 4.526346] ? __netif_receive_skb_core+0xb7/0xfc0 [ 4.527601] tipc_l2_rcv_msg+0x5e/0x90 [tipc] [ 4.528737] __netif_receive_skb_list_core+0x20b/0x260 [ 4.530068] netif_receive_skb_list_internal+0x1bf/0x2e0 [ 4.531450] ? dev_gro_receive+0x4c2/0x680 [ 4.532512] napi_complete_done+0x6f/0x180 [ 4.533570] virtnet_poll+0x29c/0x42e [virtio_net] ... The node in question is receiving activate messages in another thread after changing bearer status to allow message sending/ receiving in current thread: thread 1 \| thread 2 -------- \| -------- \| tipc_enable_bearer() \| test_and_set_bit_lock() \| tipc_bearer_xmit_skb() \| \| tipc_l2_rcv_msg() \| tipc_rcv() \| __tipc_node_link_up() \| tipc_link_build_state_msg() \| tipc_link_build_proto_msg() \| tipc_mon_prep() \| { \| ... \| // null-pointer dereference \| u16 gen = mon->dom_gen; \| ... \| } // Not being executed yet \| tipc_mon_create() \| { \| ... \| // allocate \| mon = kzalloc(); \| ... \| } \| Monitoring pointer in thread 2 is dereferenced before monitoring data is allocated in thread 1. This causes kernel panic. This commit fixes it by allocating the monitoring data before enabling the bearer to receive messages. Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Reported-by: Shuang Li <shuali@redhat.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: ethernet: sun: Remove redundant code	Jiapeng Chong	1	-16/+0
	Since the starting value in the for loop is greater than or equal to 1, the restriction is CAS_FLAG_REG_PLUS is in the file cassini.h is defined as 0x1 by macro, and the for loop and if condition is not satisfied, so the code here is redundant. Clean up the following smatch warning: drivers/net/ethernet/sun/cassini.c:3513 cas_start_dma() warn: we never enter this loop. drivers/net/ethernet/sun/cassini.c:1239 cas_init_rx_dma() warn: we never enter this loop. drivers/net/ethernet/sun/cassini.c:1247 cas_init_rx_dma() warn: we never enter this loop. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	Merge branch 'nfp-AF_XDP-zero-copy'	David S. Miller	6	-54/+855
	Simon Horman says: ==================== Add AF_XDP zero-copy support for NFP Niklas Söderlund says: This series adds AF_XDP zero-copy support for the NFP driver. The series is based on previous work done by Jakub Kicinski. Patch 1/5 and 2/5 prepares the driver for AF_XDP support by refactoring functions that will act differently once AF_XDP is active or not making the driver easier to read and by preparing some functions to be reused outside the local file scope. Patch 3/5 and 4/5 prepares the driver for dealing the UMEM while finally patch 5/5 adds AF_XDP support. Based on work by Jakub Kicinski. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	nfp: xsk: add AF_XDP zero-copy Rx and Tx support	Niklas Söderlund	6	-28/+756
	This patch adds zero-copy Rx and Tx support for AF_XDP sockets. It do so by adding a separate NAPI poll function that is attached to a each channel when the XSK socket is attached with XDP_SETUP_XSK_POOL, and restored when the XSK socket is terminated, this is done per channel. Support for XDP_TX is implemented and the XDP buffer can safely be moved from the Rx to the Tx queue and correctly freed and returned to the XSK pool once it's transmitted. Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS will allocate a new buffer and copy the zero-copy frame prior passing it to the kernel stack. This patch is based on previous work by Jakub Kicinski. Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	nfp: xsk: add configuration check for XSK socket chunk size	Niklas Söderlund	1	-4/+38
	In preparation for adding AF_XDP support add a configuration check to make sure the buffer size can not be set to a larger value then the XSK socket chunk size. Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	nfp: xsk: add an array of xsk buffer pools to each data path	Niklas Söderlund	2	-2/+21
	Each data path needs an array of xsk pools to track if an xsk socket is in use. Add this array and make sure it's handled correctly when the data path is duplicated. Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	nfp: wrap napi add/del logic	Jakub Kicinski	1	-16/+22
	There will be more NAPI register logic once AF_XDP support is added, wrap our already conditional napi add/del into helpers. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	nfp: expose common functions to be used for AF_XDP	Niklas Söderlund	2	-8/+22
	There are some common functionality that can be reused in the upcoming AF_XDP support. Expose those functions in the header. While at it mark some arguments of nfp_net_rx_csum() as const. Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	Merge branch 'sparx5-ptp'	David S. Miller	11	-11/+1221
	Horatiu Vultur says: ==================== net: sparx5: Add PTP Hardware Clock support Add support for PTP Hardware Clock (PHC) for sparx5. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: sparx5: Implement get_ts_info	Horatiu Vultur	1	-0/+34
	Implement the function get_ts_info in ethtool_ops which is needed to get the HW capabilities for timestamping. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: sparx5: Add support for ptp interrupts	Horatiu Vultur	3	-0/+134
	When doing 2-step timestamping the HW will generate an interrupt when it managed to timestamp a frame. It is the SW responsibility to read it from the FIFO. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: sparx5: Update extraction/injection for timestamping	Horatiu Vultur	5	-1/+248
	Update both the extraction and injection to do timestamping of the frames. The extraction is always doing the timestamping while for injection is doing the timestamping only if it is configured. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: sparx5: Implement SIOCSHWTSTAMP and SIOCGHWTSTAMP	Horatiu Vultur	3	-0/+102
	Implement the ioctl callbacks SIOCSHWTSTAMP and SIOCGHWTSTAMP to allow to configure the ports to enable/disable timestamping for TX. The RX timestamping is always enabled. The HW is capable to run both 1-step timestamping and 2-step timestamping. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: sparx5: Add support for ptp clocks	Horatiu Vultur	4	-1/+356
	The sparx5 has 3 PHC. Enable each of them, for now all the timestamping is happening on the first PHC. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: sparx5: Add registers that are used by ptp functionality	Horatiu Vultur	2	-2/+334
	Add the registers that will be used to configure the PHC in the HW. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	dts: sparx5: Enable ptp interrupt	Horatiu Vultur	1	-2/+3
	Add support for ptp interrupt. This interrupt is used when using 2-step timestamping. For each timestamp that is added in a queue, an interrupt is generated. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	dt-bindings: net: sparx5: Extend with the ptp interrupt	Horatiu Vultur	1	-0/+2
	Extend dt-bindings for sparx5 with ptp interrupt. This is generated when doing 2-step timestamping and the timestamp can be read from the FIFO. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: sparx5: Move ifh from port to local variable	Horatiu Vultur	3	-5/+8
	Currently the ifh is not changed, it is fixed for each frame for each port that is sent out. Move this on the stack because this ifh needs to be change based on the frames that are send out. This is needed for PTP frames. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	Merge branch 'lan937x-t1-phy-driver'	David S. Miller	1	-62/+325
	Arun Ramadoss says: ==================== Add support for LAN937x T1 Phy Driver LAN937x is a Multi-port 100Base-T1 Switch and it internally uses LAN87xx T1 Phy. This series of patch update the initialization routine for the LAN87xx phy and also add LAN937x part support. Added the T1 Phy master-slave configuration through ethtool. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: phy: added ethtool master-slave configuration support	Arun Ramadoss	1	-0/+90
	To configure the T1 phy as master or slave using the ethtool -s <dev> master-slave <forced-master/forced-slave>, the config_aneg and read status functions are added. Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: phy: added the LAN937x phy support	Arun Ramadoss	1	-1/+54
	LAN937x T1 switch is based on LAN87xx Phy, so reusing the init script of the LAN87xx. There is a workaround in accessing the DSP bank register for LAN937x Phy. Whenever there is a bank switch to DSP registers, then we need a one dummy read access before proceeding to the actual register access. Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: phy: updated the initialization routine for LAN87xx	Arun Ramadoss	1	-42/+175
	The new initialization sequence is the improvement to the existing init routine. Init routine does soft reset, run init script and set Hw_init. Added the new access_smi_poll_timeout() for polling smi bank write. Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: phy: removed empty lines in LAN87XX	Arun Ramadoss	1	-4/+0
	Removed the empty lines in struct phy_drivers. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: phy: used the PHY_ID_MATCH_MODEL macro for LAN87XX	Arun Ramadoss	1	-3/+4
	Used the PHY_ID_MATCH_MODEL MACRO for describing the phy_id and phy_id_mask. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: phy: used genphy_soft_reset for phy reset in LAN87xx	Arun Ramadoss	1	-12/+2
	Replaced the current code of resetting of LAN87xx phy to genphy_soft_reset library function. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	Merge branch 'lan8814-1588-support'	David S. Miller	2	-34/+1097
	Divya Koppera says: ==================== Add support for 1588 in LAN8814 The following patch series contains: - Fix for concurrent register access, which provides atomic access to extended page register reads/writes. - Provides dt-bindings related to latency and timestamping that are required for LAN8814 phy. - 1588 hardware timestamping support in LAN8814 phy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: phy: micrel: 1588 support for LAN8814 phy	Divya Koppera	1	-22/+1066
	Add support for 1588 in LAN8814 phy driver. It supports 1-step and 2-step timestamping. Co-developed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	dt-bindings: net: micrel: Configure latency values and timestamping check ↵	Divya Koppera	1	-0/+17
	for LAN8814 phy Supports configuring latency values and also adds check for phy timestamping feature. Signed-off-by: Divya Koppera<Divya.Koppera@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: phy: micrel: Fix concurrent register access	Divya Koppera	1	-14/+16
	Make Extended page register accessing atomic, to overcome unexpected output from register reads/writes. Fixes: 7c2dcfa295b1 ("net: phy: micrel: Add support for LAN8804 PHY") Signed-off-by: Divya Koppera<Divya.Koppera@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	Merge branch 'skb-drop-reasons'	David S. Miller	4	-13/+50
	Menglong Dong says: ==================== net: dev: add skb drop reasons to net/core/dev.c In the commit c504e5c2f964 ("net: skb: introduce kfree_skb_reason()"), we added the support of reporting the reasons of skb drops to kfree_skb tracepoint. And in this series patches, reasons for skb drops are added to the link layer, which means that 'net/core/dev.c' is our target. Following functions are processed: sch_handle_egress() __dev_xmit_skb() enqueue_to_backlog() do_xdp_generic() sch_handle_ingress() __netif_receive_skb_core() and following new drop reasons are added (what they mean can be see in the document of them): SKB_DROP_REASON_QDISC_EGRESS SKB_DROP_REASON_QDISC_DROP SKB_DROP_REASON_CPU_BACKLOG SKB_DROP_REASON_XDP SKB_DROP_REASON_QDISC_INGRESS SKB_DROP_REASON_PTYPE_ABSENT In order to add skb drop reasons to kfree_skb_list(), the function kfree_skb_list_reason() is introduced in the 2th patch, which will be used in __dev_xmit_skb() in the 3th patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: dev: use kfree_skb_reason() for __netif_receive_skb_core()	Menglong Dong	3	-3/+11
	Add reason for skb drops to __netif_receive_skb_core() when packet_type not found to handle the skb. For this purpose, the drop reason SKB_DROP_REASON_PTYPE_ABSENT is introduced. Take ether packets for example, this case mainly happens when L3 protocol is not supported. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: dev: use kfree_skb_reason() for sch_handle_ingress()	Menglong Dong	3	-1/+3
	Replace kfree_skb() used in sch_handle_ingress() with kfree_skb_reason(). Following drop reasons are introduced: SKB_DROP_REASON_TC_INGRESS Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: dev: use kfree_skb_reason() for do_xdp_generic()	Menglong Dong	3	-1/+3
	Replace kfree_skb() used in do_xdp_generic() with kfree_skb_reason(). The drop reason SKB_DROP_REASON_XDP is introduced for this case. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: dev: use kfree_skb_reason() for enqueue_to_backlog()	Menglong Dong	3	-1/+11
	Replace kfree_skb() used in enqueue_to_backlog() with kfree_skb_reason(). The skb rop reason SKB_DROP_REASON_CPU_BACKLOG is introduced for the case of failing to enqueue the skb to the per CPU backlog queue. The further reason can be backlog queue full or RPS flow limition, and I think we needn't to make further distinctions. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: dev: add skb drop reasons to __dev_xmit_skb()	Menglong Dong	3	-2/+8
	Add reasons for skb drops to __dev_xmit_skb() by replacing kfree_skb_list() with kfree_skb_list_reason(). The drop reason of SKB_DROP_REASON_QDISC_DROP is introduced for qdisc enqueue fails. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: skb: introduce the function kfree_skb_list_reason()	Menglong Dong	2	-4/+11
	To report reasons of skb drops, introduce the function kfree_skb_list_reason() and make kfree_skb_list() an inline call to it. This function will be used in the next commit in __dev_xmit_skb(). Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: dev: use kfree_skb_reason() for sch_handle_egress()	Menglong Dong	3	-1/+3
	Replace kfree_skb() used in sch_handle_egress() with kfree_skb_reason(). The drop reason SKB_DROP_REASON_TC_EGRESS is introduced. Considering the code path of tc egerss, we make it distinct with the drop reason of SKB_DROP_REASON_QDISC_DROP in the next commit. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: macb: Fix lost RX packet wakeup race in NAPI receive	Robert Hancock	1	-1/+24
	There is an oddity in the way the RSR register flags propagate to the ISR register (and the actual interrupt output) on this hardware: it appears that RSR register bits only result in ISR being asserted if the interrupt was actually enabled at the time, so enabling interrupts with RSR bits already set doesn't trigger an interrupt to be raised. There was already a partial fix for this race in the macb_poll function where it checked for RSR bits being set and re-triggered NAPI receive. However, there was a still a race window between checking RSR and actually enabling interrupts, where a lost wakeup could happen. It's necessary to check again after enabling interrupts to see if RSR was set just prior to the interrupt being enabled, and re-trigger receive in that case. This issue was noticed in a point-to-point UDP request-response protocol which periodically saw timeouts or abnormally high response times due to received packets not being processed in a timely fashion. In many applications, more packets arriving, including TCP retransmissions, would cause the original packet to be processed, thus masking the issue. Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is done") Cc: stable@vger.kernel.org Co-developed-by: Scott McNutt <scott.mcnutt@siriusxm.com> Signed-off-by: Scott McNutt <scott.mcnutt@siriusxm.com> Signed-off-by: Robert Hancock <robert.hancock@calian.com> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	Merge branch 'netif_rx'	David S. Miller	20	-23/+23
	Sebastian Andrzej Siewior says: ==================== net: Convert user to netif_rx(). This is the first batch of converting netif_rx_ni() caller to netif_rx(). The change making this possible is net-next and netif_rx_ni() is a wrapper around netif_rx(). This is a clean up in order to remove netif_rx_ni(). Cc: Andrew Lunn <andrew@lunn.ch> Cc: bridge@lists.linux-foundation.org Cc: Chris Zankel <chris@zankel.net> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kurt Kanzenbach <kurt@linutronix.de> Cc: linux-doc@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: Łukasz Stelmach <l.stelmach@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mike Travis <mike.travis@hpe.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Steve Wahl <steve.wahl@hpe.com> Cc: UNGLinuxDriver@microchip.com Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: dev: Use netif_rx().	Sebastian Andrzej Siewior	1	-3/+3
	Since commit baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.") the function netif_rx() can be used in preemptible/thread context as well as in interrupt context. Use netif_rx(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: bridge: Use netif_rx().	Sebastian Andrzej Siewior	1	-2/+2
	Since commit baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.") the function netif_rx() can be used in preemptible/thread context as well as in interrupt context. Use netif_rx(). Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-04	net: macvlan: Use netif_rx().	Sebastian Andrzej Siewior	1	-1/+1
	Since commit baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.") the function netif_rx() can be used in preemptible/thread context as well as in interrupt context. Use netif_rx(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>