path: root/include
2023-08-02  ata,scsi: do not issue START STOP UNIT on resume  (Damien Le Moal; 1 file, -0/+1)

During system resume, ata_port_pm_resume() triggers ata EH to 1) resume the controller, 2) reset and rescan the ports, and 3) revalidate devices. This EH execution is started asynchronously from ata_port_pm_resume(), which means that when sd_resume() is executed, none or only part of the above processing may have been done. However, sd_resume() issues a START STOP UNIT command to wake up the drive from sleep mode. This command is translated to ATA with ata_scsi_start_stop_xlat() and issued to the device. Depending on the state of execution of the EH process and of the revalidation triggered by ata_port_pm_resume(), two things may happen:

1) The START STOP UNIT command fails if it is received before the controller has been reenabled at the beginning of the EH execution. This is visible with error messages like:

    ata10.00: device reported invalid CHS sector 0
    sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
    sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
    sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
    sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
    sd 9:0:0:0: PM: failed to resume async: error -5

2) The START STOP UNIT command is received while the EH process is ongoing, which means that it is stopped and must wait for EH completion, at which point the command is rather useless as the drive is already fully spun up. This case also causes a significant delay in sd_resume(), which users observe as a delayed completion of the entire system resume.

Given that ATA devices will be woken up by libata activity on resume, sd_resume() has no need to issue a START STOP UNIT command, which solves the above problems. Do not issue this command by introducing the new scsi_device flag no_start_on_resume and setting this flag to 1 in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP UNIT command only if this flag is not set.

Reported-by: Paul Ausbeck <paula@soe.ucsc.edu>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880
Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Tanner Watkins <dalzot@gmail.com>
Tested-by: Paul Ausbeck <paula@soe.ucsc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
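A minimal sketch of the resulting check: the no_start_on_resume flag name is from the commit, while the simplified sd_resume() body and the sd_start_stop_device() helper call are assumptions for illustration.

    /* include/scsi/scsi_device.h: new flag, set by ata_scsi_dev_config() */
    	unsigned no_start_on_resume:1;

    /* drivers/scsi/sd.c: only spin the drive up if the transport won't */
    static int sd_resume(struct device *dev)
    {
    	struct scsi_disk *sdkp = dev_get_drvdata(dev);
    	int ret = 0;

    	if (!sdkp)	/* device not yet probed */
    		return 0;

    	if (!sdkp->device->no_start_on_resume)	/* libata wakes ATA disks itself */
    		ret = sd_start_stop_device(sdkp, 1);	/* START STOP UNIT, start */

    	return ret;
    }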
2023-08-02  virtio_net: support per queue interrupt coalesce command  (Gavin Li; 1 file, -0/+14)

Add an interrupt_coalesce config in send_queue and receive_queue to cache the user config, and send the per-virtqueue interrupt moderation config to the underlying device in order to get more efficient interrupt moderation and guest VM CPU utilization. Additionally, address all the VQs when updating the global configuration, as the individual VQ configurations can now diverge from the global one.

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230731070656.96411-3-gavinl@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
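The cached per-queue state could look like the sketch below; send_queue and receive_queue are named by the commit, but the coalesce struct and field names here are illustrative assumptions.

    /* Cached per-virtqueue moderation parameters (usecs/packets pair,
     * mirroring the virtio coalescing control command). */
    struct virtnet_interrupt_coalesce {
    	u32 max_packets;
    	u32 max_usecs;
    };

    struct send_queue {
    	struct virtqueue *vq;
    	/* ... */
    	struct virtnet_interrupt_coalesce intr_coal;	/* cached user config */
    };

    struct receive_queue {
    	struct virtqueue *vq;
    	/* ... */
    	struct virtnet_interrupt_coalesce intr_coal;
    };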
2023-08-02  Merge tag 'xfs-async-dio.6-2023-08-01' of git://git.kernel.dk/linux into iomap-6.6-merge  (Darrick J. Wong; 1 file, -2/+33)

Improve iomap/xfs async dio write performance.

iomap always punts async dio write completions to a workqueue, which has a cost in terms of efficiency (now you need an unrelated worker to process it) and latency (now you're bouncing a completion through an async worker, which is a classic slowdown scenario). io_uring handles IRQ completions via task_work, and for writes that don't need to do extra IO at completion time, we can safely complete them inline from that.

This patchset adds IOCB_DIO_CALLER_COMP, which an IO issuer can set to inform the completion side that any extra work that needs doing for that completion can be punted to a safe task context. The iomap dio completion will happen in hard/soft irq context, and we need a saner context to process these completions. IOCB_DIO_CALLER_COMP can be set in struct kiocb->ki_flags by the issuer. If the completion side of the iocb handling understands this flag, it can choose to set a kiocb->dio_complete() handler and just call ki_complete from IRQ context. The issuer must then ensure that this callback is processed from a task. io_uring punts IRQ completions to task_work already, so it's trivial to wire it up to run more of the completion before posting a CQE. This is good for up to a 37% improvement in throughput/latency for low queue depth IO; patch 5 has the details.

If we need to do real work at completion time, iomap will clear the IOMAP_DIO_CALLER_COMP flag.

This work came about when Andres tested low queue depth dio writes for postgres, compared them to sync dio writes, and showed that the async processing slows us down a lot.

* tag 'xfs-async-dio.6-2023-08-01' of git://git.kernel.dk/linux:
  iomap: support IOCB_DIO_CALLER_COMP
  io_uring/rw: add write support for IOCB_DIO_CALLER_COMP
  fs: add IOCB flags related to passing back dio completions
  iomap: add IOMAP_DIO_INLINE_COMP
  iomap: only set iocb->private for polled bio
  iomap: treat a write through cache the same as FUA
  iomap: use an unsigned type for IOMAP_DIO_* defines
  iomap: cleanup up iomap_dio_bio_end_io()

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-08-02  fs: add IOCB flags related to passing back dio completions  (Jens Axboe; 1 file, -2/+33)

Async dio completions generally happen from hard/soft IRQ context, which means that users like iomap may need to defer some of the completion handling to a workqueue. This is less efficient than having the original issuer handle it, like we do for sync IO, and it adds latency to the completions.

Add IOCB_DIO_CALLER_COMP, which the issuer can set if it is able to safely punt these completions to a safe context. If the dio handler is aware of this flag, it assigns a callback handler in kiocb->dio_complete and associated data in kiocb->private. The issuer will then call this handler with that data from task context.

No functional changes in this patch.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
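The shape of the hook could be as follows; the flag and member names are from the commit, while the bit value and exact struct layout are illustrative assumptions:

    /* include/linux/fs.h (sketch) */
    #define IOCB_DIO_CALLER_COMP	(1 << 22)	/* bit value illustrative */

    struct kiocb {
    	void (*ki_complete)(struct kiocb *iocb, long ret);
    	void *private;		/* data handed to dio_complete() */
    	int ki_flags;
    	/* ... */
    	/*
    	 * May be set by the dio side only if the issuer passed
    	 * IOCB_DIO_CALLER_COMP; the issuer must then invoke it from
    	 * task context before posting the completion.
    	 */
    	ssize_t (*dio_complete)(void *data);
    };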
2023-08-02  inet6: Remove unused function declaration udpv6_connect()  (Yue Haibing; 1 file, -2/+0)

This has never been implemented since the beginning of git history.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230731140437.37056-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-01  swiotlb: search the software IO TLB only if the device makes use of it  (Petr Tesarik; 2 files, -1/+8)

Skip searching the software IO TLB if a device has never used it, making sure these devices are not affected by the introduction of multiple IO TLB memory pools.

An additional memory barrier is required to ensure that the new value of the flag is visible to other CPUs after mapping a new bounce buffer. For efficiency, the flag check should be inlined, and then the memory barrier must be moved to is_swiotlb_buffer(). However, it can replace the existing barrier in swiotlb_find_pool(), because all callers use is_swiotlb_buffer() first to verify that the buffer address belongs to the software IO TLB.

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
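A minimal sketch of the inlined fast path, assuming a per-device dma_uses_io_tlb flag as described; the defpool field name and exact barrier placement are illustrative:

    static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
    {
    	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

    	if (!mem)
    		return false;

    	if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {
    		/* Fast exit for devices that never bounced; pairs with the
    		 * write barrier issued when a device first maps a bounce
    		 * buffer and sets dma_uses_io_tlb. */
    		if (!READ_ONCE(dev->dma_uses_io_tlb))
    			return false;
    		smp_rmb();	/* order the flag read before the pool walk */
    		return swiotlb_find_pool(dev, paddr);
    	}

    	return paddr >= mem->defpool.start && paddr < mem->defpool.end;
    }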
2023-08-01  swiotlb: allocate a new memory pool when existing pools are full  (Petr Tesarik; 1 file, -0/+8)

When swiotlb_find_slots() cannot find suitable slots, schedule the allocation of a new memory pool. It is not possible to allocate the pool immediately, because this code may run in interrupt context, which is not suitable for large memory allocations. This means that the memory pool will be available too late for the currently requested mapping, but the stress on the software IO TLB allocator is likely to continue, and subsequent allocations will benefit from the additional pool eventually.

Keep all memory pools for an allocator in an RCU list to avoid locking on the read side. For modifications, add a new spinlock to struct io_tlb_mem.

The spinlock also protects updates to the total number of slabs (nslabs in struct io_tlb_mem), but not reads of the value. Readers may therefore encounter a stale value, but this is not an issue:

- swiotlb_tbl_map_single() and is_swiotlb_active() only check for a non-zero value. This is ensured by the existence of the default memory pool, allocated at boot.

- The exact value is used only for non-critical purposes (debugfs, kernel messages).

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-08-01  swiotlb: determine potential physical address limit  (Petr Tesarik; 1 file, -0/+2)

The value returned by default_swiotlb_limit() should be constant, because it is used to decide whether DMA can be used. To allow allocating memory pools on the fly, use the maximum possible physical address rather than the highest address used by the default pool.

For swiotlb_init_remap(), this is either an arch-specific limit used by memblock_alloc_low(), or the highest directly mapped physical address if the initialization flags include SWIOTLB_ANY. For swiotlb_init_late(), the highest address is determined by the GFP flags.

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-08-01  swiotlb: if swiotlb is full, fall back to a transient memory pool  (Petr Tesarik; 3 files, -1/+36)

Try to allocate a transient memory pool if no suitable slots can be found and the respective SWIOTLB is allowed to grow. The transient pool is just big enough for this one bounce buffer. It is inserted into a per-device list of transient memory pools, and it is freed again when the bounce buffer is unmapped.

Transient memory pools are kept in an RCU list. A memory barrier is required after adding a new entry, because any address within a transient buffer must be immediately recognized as belonging to the SWIOTLB, even if it is passed to another CPU.

Deletion does not require any synchronization beyond RCU ordering guarantees. After a buffer is unmapped, its physical addresses may no longer be passed to the DMA API, so the memory range of the corresponding stale entry in the RCU list never matches. If the memory range gets allocated again, then it happens only after an RCU quiescent state.

Since bounce buffers can now be allocated from different pools, add a parameter to swiotlb_alloc_pool() to let the caller know which memory pool is used. Add swiotlb_find_pool() to find the memory pool corresponding to an address. This function is now also used by is_swiotlb_buffer(), because a simple boundary check is no longer sufficient.

The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(), simplified and enhanced to use coherent memory pools if needed.

Note that this is not the most efficient way to provide a bounce buffer, but when a DMA buffer can't be mapped, something may (and will) actually break. At that point it is better to make an allocation, even if it may be an expensive operation.

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
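A sketch of the swiotlb_find_pool() lookup over the two RCU lists the commit describes (the allocator-wide pool list and the per-device transient list); list heads and range fields are named illustratively:

    struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr)
    {
    	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
    	struct io_tlb_pool *pool;

    	rcu_read_lock();
    	/* pools shared by the whole allocator */
    	list_for_each_entry_rcu(pool, &mem->pools, node)
    		if (paddr >= pool->start && paddr < pool->end)
    			goto out;
    	/* per-device transient pools, freed at unmap time */
    	list_for_each_entry_rcu(pool, &dev->dma_io_tlb_pools, node)
    		if (paddr >= pool->start && paddr < pool->end)
    			goto out;
    	pool = NULL;
    out:
    	rcu_read_unlock();
    	return pool;
    }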
2023-08-01  swiotlb: add a flag whether SWIOTLB is allowed to grow  (Petr Tesarik; 1 file, -0/+4)

Add a config option (CONFIG_SWIOTLB_DYNAMIC) to enable or disable dynamic allocation of additional bounce buffers. If this option is set, mark the default SWIOTLB as able to grow and restricted DMA pools as unable.

However, if the address of the default memory pool is explicitly queried, make the default SWIOTLB also unable to grow. This is currently used to set up PCI BAR movable regions on some Octeon MIPS boards which may not be able to use a SWIOTLB pool elsewhere in physical memory. See octeon_pci_setup() for more details.

If a remap function is specified, it must also be called on any dynamically allocated pools, but there are some issues:

- The remap function may block, so it should not be called from an atomic context.
- There is no corresponding unremap() function if the memory pool is freed.
- The only in-tree implementation (xen_swiotlb_fixup) requires that the number of slots in the memory pool is a multiple of SWIOTLB_SEGSIZE.

Keep it simple for now and disable growing the SWIOTLB if a remap function was specified.

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-08-01  swiotlb: separate memory pool data from other allocator data  (Petr Tesarik; 2 files, -18/+29)

Carve out memory pool specific fields from struct io_tlb_mem. The original struct now contains shared data for the whole allocator, while the new struct io_tlb_pool contains data that is specific to one memory pool of (potentially) many.

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-08-01  swiotlb: add documentation and rename swiotlb_do_find_slots()  (Petr Tesarik; 1 file, -4/+11)

Add some kernel-doc comments and move the existing documentation of struct io_tlb_slot to its correct location. The latter was forgotten in commit 942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").

Use the opportunity to give swiotlb_do_find_slots() a more descriptive name and make it clear how it differs from swiotlb_find_slots().

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-08-01  swiotlb: make io_tlb_default_mem local to swiotlb.c  (Petr Tesarik; 1 file, -1/+24)

SWIOTLB implementation details should not be exposed to the rest of the kernel. This will allow changes to the implementation without modifying non-swiotlb code. To avoid breaking existing users, provide helper functions for the few required fields.

As a bonus, using a helper function to initialize struct device makes it possible to get rid of an #ifdef in driver core.

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
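The device-init helper mentioned above could look like the sketch below, assuming driver core calls it instead of touching io_tlb_default_mem directly; the helper name and call site are illustrative:

    /* kernel/dma/swiotlb.c: the symbol is now file-local */
    static struct io_tlb_mem io_tlb_default_mem;

    /* called from device_initialize(), replacing the old #ifdef block */
    void swiotlb_dev_init(struct device *dev)
    {
    	dev->dma_io_tlb_mem = &io_tlb_default_mem;
    }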
2023-08-01  powercap: intel_rapl: Fix a sparse warning in TPMI interface  (Zhang Rui; 1 file, -4/+10)

Depending on the interface used, the RAPL registers can be either MSR indexes or memory mapped IO addresses. The current RAPL common code uses u64 to save both MSR and memory mapped IO registers. With this, when handling a register address with an __iomem annotation, sparse triggers a warning like below:

  sparse warnings: (new ones prefixed by >>)
  >> drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected unsigned long long [usertype] *tpmi_rapl_regs @@ got void [noderef] __iomem * @@
     drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: expected unsigned long long [usertype] *tpmi_rapl_regs
     drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: got void [noderef] __iomem *

Fix the problem by using a union to save the registers instead.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307031405.dy3druuy-lkp@intel.com/
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
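A sketch of the union-based fix; the type and member names are illustrative, but the idea is exactly what the changelog states: one storage slot that can legally hold either register flavor, so no cast drops the __iomem annotation:

    union rapl_reg {
    	void __iomem *mmio;	/* TPMI: memory mapped IO address */
    	u64 msr;		/* MSR interface: register index */
    	u64 val;		/* raw storage */
    };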
2023-08-01  xfrm: don't skip free of empty state in acquire policy  (Leon Romanovsky; 1 file, -0/+1)

In the destruction flow, the assignment of NULL to xso->dev caused the xfrm_dev_state_free() call, made from the xfrm_state_put(to_put) routine, to be skipped. Instead of an open-coded variant of xfrm_dev_state_delete() and xfrm_dev_state_free(), let's use them directly.

Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-08-01  net/sched: wrap open coded Qdisc class filter counter  (Pedro Tammela; 1 file, -0/+26)

The 'filter_cnt' counter is used to control a Qdisc class lifetime. Each filter referencing this class by its id will eventually increment/decrement this counter in its respective 'add/update/delete' routines. As these operations are always serialized under the rtnl lock, we don't need an atomic type like 'refcount_t'.

It also means that we lose the overflow/underflow checks already present in refcount_t, which are valuable to hunt down bugs where the unsigned counter wraps around, as they aid automated tools like syzkaller to scream in such situations.

Wrap the open coded increment/decrement into helper functions and add overflow checks to the operations.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
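The helpers could take the shape below (names modeled on the commit subject, exact identifiers in the tree may differ); callers hold the rtnl lock, so plain arithmetic is safe and check_add_overflow()/check_sub_overflow() supply the wrap detection that refcount_t would have provided:

    static inline void qdisc_class_get(struct Qdisc_class_common *cl)
    {
    	unsigned int res;

    	if (check_add_overflow(cl->filter_cnt, 1, &res))
    		WARN(1, "Qdisc class counter overflow");
    	cl->filter_cnt = res;
    }

    static inline void qdisc_class_put(struct Qdisc_class_common *cl)
    {
    	unsigned int res;

    	if (check_sub_overflow(cl->filter_cnt, 1, &res))
    		WARN(1, "Qdisc class counter underflow");
    	cl->filter_cnt = res;
    }

    static inline bool qdisc_class_in_use(const struct Qdisc_class_common *cl)
    {
    	return cl->filter_cnt > 0;
    }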
2023-08-01  serial: core: Fix serial core port id to not use port->line  (Tony Lindgren; 1 file, -0/+1)

The serial core port id should be a serial core controller specific port instance, which is not always the port->line index. For example, the 8250 driver maps a number of legacy ports, and when a hardware specific device driver takes over, we typically have one driver instance for each port. Let's instead add port->port_id to keep track of serial ports mapped to each serial core controller instance.

Currently this is only a cosmetic issue for the serial core port device names. The issue can be noticed by looking at /sys/bus/serial-base/devices, for example. Let's fix the issue to avoid port addressing problems later on.

Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20230725054216.45696-3-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-01  serial: core: Controller id cannot be negative  (Tony Lindgren; 1 file, -1/+1)

The controller id cannot be negative. Let's fix the ctrl_id in preparation for adding port_id to fix the device name.

Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20230725054216.45696-2-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-01  tcx: Fix splat during dev unregister  (Martin KaFai Lau; 1 file, -0/+16)

During unregister_netdevice_many_notify(), the ordering of our concerned function calls is like this:

  unregister_netdevice_many_notify
    dev_shutdown
      qdisc_put
        clsact_destroy
    tcx_uninstall

The syzbot reproducer triggered a case where the qdisc refcnt is not zero during dev_shutdown(). tcx_uninstall() will then WARN_ON_ONCE(tcx_entry(entry)->miniq_active) because the miniq is still active and the entry should not be freed. The latter assumed that qdisc destruction happens before tcx teardown.

This fix is to avoid tcx_uninstall() doing tcx_entry_free() when the miniq is still alive, and to let clsact_destroy() do the free later, so that we do not assume any specific ordering for either of them. If still active, tcx_uninstall() does clear the entry when flushing out the prog/link. clsact_destroy() will then notice the "!tcx_entry_is_active()" and do the tcx_entry_free() eventually.

Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com
Reported-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/222255fe07cb58f15ee662e7ee78328af5b438e4.1690549248.git.daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-01  tcp: Remove unused function declarations  (Yue Haibing; 1 file, -3/+0)

commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") left behind the tcp_bpf_get_proto() declaration. The tcp_v4_tw_remember_stamp() function was removed in ccb7c410ddc0 ("timewait_sock: Create and use getpeer op."). Since commit 686989700cab ("tcp: simplify tcp_mark_skb_lost"), the tcp_skb_mark_lost_uncond_verify() declaration is no longer used.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230729122644.10648-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-01  devlink: Remove unused extern declaration devlink_port_region_destroy()  (Yue Haibing; 1 file, -2/+0)

devlink_port_region_destroy() has never been implemented since commit 544e7c33ec2f ("net: devlink: Add support for port regions").

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230728132113.32888-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31  dma-contiguous: support per-numa CMA for all architectures  (Yajun Deng; 1 file, -6/+0)

In commit b7176c261cdb ("dma-contiguous: provide the ability to reserve per-numa CMA"), Barry added DMA_PERNUMA_CMA for ARM64. But this feature is architecture independent, so support per-numa CMA for all architectures, and enable it by default if NUMA is enabled.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-07-31  dma-mapping: move arch_dma_set_mask() declaration to header  (Arnd Bergmann; 1 file, -0/+6)

This function has a __weak definition and an override that is only used on freescale powerpc chips. The powerpc definition however does not see the declaration that is in a .c file:

  arch/powerpc/kernel/dma-mask.c:7:6: error: no previous prototype for 'arch_dma_set_mask' [-Werror=missing-prototypes]

Move it into the linux/dma-map-ops.h header where the other arch_dma_* functions are declared.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-07-31  xen/pci: add flag for PCI passthrough being possible  (Juergen Gross; 1 file, -0/+6)

When running as a Xen PV guest, passed-through PCI devices only have a chance to work if the Xen supplied memory map has some PCI space reserved.

Add a flag xen_pv_pci_possible which will be set in early boot in case the memory map has at least one area with the type E820_TYPE_RESERVED.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-07-31  soc: qcom: geni-se: Add SPI Device mode support for GENI based QuPv3  (Praveen Talari; 1 file, -0/+9)

Add device mode supported registers and masks.

Signed-off-by: Praveen Talari <quic_ptalari@quicinc.com>
Reviewed-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com>
Link: https://lore.kernel.org/r/20230714042203.14251-2-quic_ptalari@quicinc.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2023-07-31  net: flow_dissector: Use 64bits for used_keys  (Ratheesh Kannoth; 1 file, -2/+3)

As the 32 bits of dissector->used_keys are exhausted, increase the size to 64 bits. This is the base change for the ESP/AH flow dissector patch. Please find the patch and discussions at https://lore.kernel.org/netdev/ZMDNjD46BvZ5zp5I@corigine.com/T/#t

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
Tested-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
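In essence, the bitmap field widens and bit tests move to 64-bit macros; a sketch under the assumption that the helper mirrors the existing dissector_uses_key() shape:

    struct flow_dissector {
    	unsigned long long used_keys;	/* one bit per FLOW_DISSECTOR_KEY_* id, was u32 */
    	unsigned short int offset[FLOW_DISSECTOR_KEY_MAX];
    };

    static inline bool dissector_uses_key(const struct flow_dissector *d,
    				      enum flow_dissector_key_id key_id)
    {
    	/* BIT_ULL() instead of BIT(), so ids above 31 work */
    	return d->used_keys & BIT_ULL(key_id);
    }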
2023-07-31  spi: Merge up fixes from Linus' tree  (Mark Brown; 14 files, -48/+103)

Gets us pine64plus back if nothing else.
2023-07-31  regulator: Merge up fixes from Linus' tree  (Mark Brown; 14 files, -48/+103)

Gets us pine64plus back if nothing else.
2023-07-31  regmap: Merge up fixes from Linus' tree  (Mark Brown; 14 files, -48/+103)

Gets us pine64plus back if nothing else.
2023-07-30  Merge tag '6.5-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6  (Linus Torvalds; 1 file, -1/+1)

Pull smb client fixes from Steve French:

"Four small SMB3 client fixes:

  - two reconnect fixes (to address the case where a non-default iocharset gets incorrectly overridden at reconnect with the default charset)
  - fix for an NTLMSSP_AUTH request setting a flag incorrectly
  - add a missing check for an invalid tlink (tree connection) in ioctl"

* tag '6.5-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: add missing return value check for cifs_sb_tlink
  smb3: do not set NTLMSSP_VERSION flag for negotiate not auth request
  cifs: fix charset issue in reconnection
  fs/nls: make load_nls() take a const parameter
2023-07-30  Merge tag 'trace-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace  (Linus Torvalds; 1 file, -4/+0)

Pull tracing fixes from Steven Rostedt:

- Fix to /sys/kernel/tracing/per_cpu/cpu*/stats read and entries. If a resize shrinks the buffer, it clears the read count to notify readers that they need to reset. But the read count is also used for accounting, and this causes the numbers to be off. Instead, create a separate variable to use to notify readers to reset.

- Fix the ref counts of the "soft disable" mode. The wrong value was used for testing if soft disable mode should be enabled or disabled; instead, just change the logic to do the enable and disable in place when SOFT_MODE is set or cleared.

- Several kernel-doc fixes

- Removal of unused external declarations

* tag 'trace-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix warning in trace_buffered_event_disable()
  ftrace: Remove unused extern declarations
  tracing: Fix kernel-doc warnings in trace_seq.c
  tracing: Fix kernel-doc warnings in trace_events_trigger.c
  tracing/synthetic: Fix kernel-doc warnings in trace_events_synth.c
  ring-buffer: Fix kernel-doc warnings in ring_buffer.c
  ring-buffer: Fix wrong stat of cpu_buffer->read
2023-07-29  net: annotate data-races around sk->sk_mark  (Eric Dumazet; 3 files, -6/+7)

sk->sk_mark is often read while another thread could change the value.

Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29  net: gro: fix misuse of CB in udp socket lookup  (Richard Gobert; 1 file, -0/+43)

This patch fixes a misuse of IP{6}CB(skb) in GRO when calling `udp6_lib_lookup2` while handling udp tunnels. `udp6_lib_lookup2` fetches the device from the CB. The fix changes it to fetch the device from `skb->dev`. The l3mdev case requires special attention since it has a master and a slave device.

Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
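A hypothetical helper illustrating the described fix (the helper name and exact l3mdev handling are assumptions, not the patch's literal code): derive the lookup device from skb->dev rather than from IP6CB(skb), which GRO does not populate, and resolve the l3mdev master when the receiving device is a slave:

    static inline struct net_device *udp_gro_lookup_dev(const struct sk_buff *skb)
    {
    	struct net_device *dev = skb->dev;

    	/* GRO runs under RCU, so the rcu upper-dev lookup is safe */
    	if (unlikely(netif_is_l3_slave(dev)))
    		dev = netdev_master_upper_dev_get_rcu(dev);
    	return dev;
    }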
2023-07-29  bonding: 3ad: Remove unused declaration bond_3ad_update_lacp_active()  (YueHaibing; 1 file, -1/+0)

This has been unused since commit 3a755cd8b7c6 ("bonding: add new option lacp_active").

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230726143816.15280-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-29  Merge tag 'mm-hotfixes-stable-2023-07-28-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  (Linus Torvalds; 3 files, -8/+59)

Pull hotfixes from Andrew Morton:

"11 hotfixes. Five are cc:stable and the remainder address post-6.4 issues or aren't considered serious enough to justify backporting"

* tag 'mm-hotfixes-stable-2023-07-28-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/memory-failure: fix hardware poison check in unpoison_memory()
  proc/vmcore: fix signedness bug in read_from_oldmem()
  mailmap: update remaining active codeaurora.org email addresses
  mm: lock VMA in dup_anon_vma() before setting ->anon_vma
  mm: fix memory ordering for mm_lock_seq and vm_lock_seq
  scripts/spelling.txt: remove 'thead' as a typo
  mm/pagewalk: fix EFI_PGT_DUMP of espfix area
  shmem: minor fixes to splice-read implementation
  tmpfs: fix Documentation of noswap and huge mount options
  Revert "um: Use swap() to make code cleaner"
  mm/damon/core-test: initialise context before test in damon_test_set_attrs()
2023-07-29  Merge tag 'thermal-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm  (Linus Torvalds; 1 file, -3/+3)

Pull thermal control fixes from Rafael Wysocki:

"Constify thermal_zone_device_register() parameters, which was omitted by mistake, and fix a double free on thermal zone unregistration in the generic DT thermal driver (Ahmad Fatoum)"

* tag 'thermal-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: of: fix double-free on unregistration
  thermal: core: constify params in thermal_zone_device_register
2023-07-29  Merge tag 'pm-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm  (Linus Torvalds; 1 file, -10/+0)

Pull power management fixes from Rafael Wysocki:

"Fix the arming of wakeup IRQs in the generic wakeup IRQ code (wakeirq), drop unused functions from it and fix up a driver using it and trying to work around the IRQ arming issue in a questionable way (Johan Hovold)"

* tag 'pm-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  serial: qcom-geni: drop bogus runtime pm state update
  PM: sleep: wakeirq: drop unused enable helpers
  PM: sleep: wakeirq: fix wake irq arming
2023-07-29  ftrace: Remove unused extern declarations  (YueHaibing; 1 file, -4/+0)

commit 6a9c981b1e96 ("ftrace: Remove unused function ftrace_arch_read_dyn_info()") left behind the ftrace_arch_read_dyn_info() extern declaration, and commit 1d74f2a0f64b ("ftrace: remove ftrace_ip_converted()") left behind the ftrace_ip_converted() declaration.

Link: https://lore.kernel.org/linux-trace-kernel/20230725134808.9716-1-yuehaibing@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-29  netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link  (Daniel Xu; 1 file, -0/+5)

This commit adds support for enabling IP defrag using the pre-existing netfilter defrag support. Basically all the flag does is bump a refcnt while the link is active. Checks are also added to ensure the prog requesting defrag support is run _after_ the netfilter defrag hooks.

We also take care to avoid any issues w.r.t. module unloading: while defrag is active on a link, the module is prevented from unloading.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-29  netfilter: defrag: Add glue hooks for enabling/disabling defrag  (Daniel Xu; 1 file, -0/+10)

We want to be able to enable/disable IP packet defrag from core bpf/netfilter code. In other words, execute code from core that could possibly be built as a module. To help avoid symbol resolution errors, use glue hooks that the modules will register callbacks with during module init.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/f6a8824052441b72afe5285acedbd634bd3384c1.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
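A sketch of the glue-hook pattern (member and symbol names are assumptions modeled on the description): core code calls through function pointers that the defrag modules fill in at init time, avoiding a hard symbol dependency on possibly-modular code:

    struct nf_defrag_hook {
    	struct module *owner;			/* pins the module while defrag is active */
    	int (*enable)(struct net *net);
    	void (*disable)(struct net *net);
    };

    /* registered by nf_defrag_ipv4 / nf_defrag_ipv6 at module init,
     * dereferenced under RCU by the core bpf/netfilter link code */
    extern const struct nf_defrag_hook __rcu *nf_defrag_v4_hook;
    extern const struct nf_defrag_hook __rcu *nf_defrag_v6_hook;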
2023-07-29  Merge branch 'in-kernel-support-for-the-tls-alert-protocol'  (Jakub Kicinski; 4 files, -4/+233)

Chuck Lever says:

====================
In-kernel support for the TLS Alert protocol

IMO the kernel doesn't need user space (ie, tlshd) to handle the TLS Alert protocol. Instead, a set of small helper functions can be used to handle sending and receiving TLS Alerts for in-kernel TLS consumers.
====================

Merged on top of a tag in case it's needed in the NFS tree.

Link: https://lore.kernel.org/r/169047923706.5241.1181144206068116926.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-29  net/handshake: Trace events for TLS Alert helpers  (Chuck Lever; 1 file, -0/+160)

Add observability for the new TLS Alert infrastructure.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047947409.5241.14548832149596892717.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-29  net/handshake: Add helpers for parsing incoming TLS Alerts  (Chuck Lever; 1 file, -0/+4)

Kernel TLS consumers can replace common TLS Alert parsing code with these helpers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047942074.5241.13791647439480672048.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-29  net/handshake: Add API for sending TLS Closure alerts  (Chuck Lever; 1 file, -0/+1)

This helper sends an alert only if a TLS session was established.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047936730.5241.618595693821012638.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-29  net/tls: Add TLS Alert definitions  (Chuck Lever; 1 file, -0/+42)

I'm about to add support for kernel handshake API consumers to send TLS Alerts, so introduce the needed protocol definitions in the new header tls_prot.h.

This presages support for Closure alerts. Also, support for alerts is a pre-requisite for handling session re-keying, where one peer will signal the need for a re-key by sending a TLS Alert.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047934064.5241.8377890858495063518.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
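The kind of constants such a header carries could look like this sketch; the values are taken from RFC 8446, while the enum/identifier names and the exact set the patch includes are assumptions:

    /* include/net/tls_prot.h (sketch) */
    enum {
    	TLS_RECORD_TYPE_ALERT		= 21,
    	TLS_RECORD_TYPE_HANDSHAKE	= 22,
    };

    enum {
    	TLS_ALERT_LEVEL_WARNING		= 1,
    	TLS_ALERT_LEVEL_FATAL		= 2,
    };

    enum {
    	TLS_ALERT_DESC_CLOSE_NOTIFY		= 0,
    	TLS_ALERT_DESC_UNEXPECTED_MESSAGE	= 10,
    	TLS_ALERT_DESC_BAD_RECORD_MAC		= 20,
    	TLS_ALERT_DESC_HANDSHAKE_FAILURE	= 40,
    	TLS_ALERT_DESC_CERTIFICATE_EXPIRED	= 45,
    	TLS_ALERT_DESC_INTERNAL_ERROR		= 80,
    };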
2023-07-29  net/tls: Move TLS protocol elements to a separate header  (Chuck Lever; 2 files, -4/+26)

Kernel TLS consumers will need definitions of various parts of the TLS protocol, but often do not need the function declarations and other infrastructure provided in <net/tls.h>. Break out the existing standardized protocol elements into a separate header, and make room for a few more elements in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047931374.5241.7713175865185969309.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28  Merge tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux  (Jakub Kicinski; 1 file, -13/+14)

Saeed Mahameed says:

====================
mlx5-updates-2023-07-24

1) Generalize devcom implementation to be independent of number of ports or device's GUID.
2) Save memory on command interface statistics.
3) General code cleanups
====================

* tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix
  net/mlx5: Make mlx5_eswitch_load/unload_vport() static
  net/mlx5: Make mlx5_esw_offloads_rep_load/unload() static
  net/mlx5: Remove pointless devlink_rate checks
  net/mlx5: Don't check vport->enabled in port ops
  net/mlx5e: Make flow classification filters static
  net/mlx5e: Remove duplicate code for user flow
  net/mlx5: Allocate command stats with xarray
  net/mlx5: split mlx5_cmd_init() to probe and reload routines
  net/mlx5: Remove redundant cmdif revision check
  net/mlx5: Re-organize mlx5_cmd struct
  net/mlx5e: E-Switch, Allow devcom initialization on more vports
  net/mlx5e: E-Switch, Register devcom device with switch id key
  net/mlx5: Devcom, Infrastructure changes
  net/mlx5: Use shared code for checking lag is supported

Link: https://lore.kernel.org/r/20230727183914.69229-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28  net: change accept_ra_min_rtr_lft to affect all RA lifetimes  (Patrick Rohr; 2 files, -2/+2)

accept_ra_min_rtr_lft only considered the lifetime of the default route and discarded entire RAs accordingly. This change renames accept_ra_min_rtr_lft to accept_ra_min_lft and applies the value to individual RA sections; in particular, router lifetime, PIO preferred lifetime, and RIO lifetime. If any of those lifetimes is lower than the configured value, the specific RA section is ignored.

In order for the sysctl to be useful to Android, it should really apply to all lifetimes in the RA, since that is what determines the minimum frequency at which RAs must be processed by the kernel. Android uses hardware offloads to drop RAs for a fraction of the minimum of all lifetimes present in the RA (some networks have very frequent RAs (5s) with high lifetimes (2h)). Despite this, we have encountered networks that set the router lifetime to 30s, which results in very frequent CPU wakeups. Instead of disabling IPv6 (and dropping the IPv6 ethertype in the WiFi firmware) entirely on such networks, it seems better to ignore the misconfigured routers while still processing RAs from other IPv6 routers on the same network (i.e. to support IoT applications).

The previous implementation dropped the entire RA based on router lifetime. This turned out to be hard to expand to the other lifetimes present in the RA in a consistent manner; dropping the entire RA based on RIO/PIO lifetimes would essentially require parsing the whole thing twice.

Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
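A minimal sketch of the per-section gate, assuming the renamed sysctl lives in the per-interface config as accept_ra_min_lft (the helper name and zero-lifetime handling are illustrative):

    /* A lifetime of 0 means "remove", which must still be honored;
     * anything else below the threshold makes that RA section ignored. */
    static bool ra_lft_ok(const struct inet6_dev *in6_dev, u32 lifetime)
    {
    	return lifetime == 0 ||
    	       lifetime >= in6_dev->cnf.accept_ra_min_lft;
    }

    /* applied independently to:
     *  - the RA header's router lifetime (default route)
     *  - each PIO's preferred lifetime (prefix/address config)
     *  - each RIO's route lifetime (more-specific routes) */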
2023-07-28  net: convert some netlink netdev iterators to depend on the xarray  (Jakub Kicinski; 1 file, -0/+3)

Reap the benefits of easier iteration thanks to the xarray. Convert just the genetlink ones, as those are easier to test.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230726185530.2247698-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28  net: store netdevs in an xarray  (Jakub Kicinski; 1 file, -1/+3)

Iterating over the netdev hash table for netlink dumps is hard. Dumps are done in "chunks", so we need to save the position after each chunk, so we know where to restart from. Because netdevs are stored in a hash table, we remember which bucket we were in and how many devices we dumped.

Since we don't hold any locks across the "chunks", devices may come and go while we're dumping. If that happens we may miss a device (if the device is deleted from the bucket we were in). We indicate to user space that this may have happened by setting NLM_F_DUMP_INTR. User space is supposed to dump again (I think) if it sees that. Somehow I doubt most user space gets this right..

To illustrate, let's look at an example:

  System state:
    start:  # [A, B, C]
    del: B  # [A, C]

With the hash table we may dump [A, B], missing C completely even though it existed both before and after the "del B".

Add an xarray and use it to allocate ifindexes. This way we can iterate ifindexes in order, without the worry that we'll skip one. We may still generate a dump of a state which "never existed", for example for a set of values and sequence of ops:

  System state:
    start:  # [A, B]
    add: C  # [A, C, B]
    del: B  # [A, C]

we may generate a dump of [A], if C got an index between A and B. The system has never been in such a state. But I'm 90% sure that's perfectly fine; the important part is that we can't _miss_ devices which exist before and after. User space which wants to mirror the kernel's state subscribes to notifications and does periodic dumps, so it will know that C exists from the notification about its creation or from the next dump (the next dump is _guaranteed_ to include C, if it doesn't get removed).

To avoid any perf regressions, keep the hash table for now. Most net namespaces have very few devices, and microbenchmarking 1M lookups on Skylake I get the following results (not counting loopback in the number of devs):

  #devs |  hash |   xa | delta
      2 |  18.3 | 20.1 | + 9.8%
     16 |  18.3 | 20.1 | + 9.5%
     64 |  18.3 | 26.3 | +43.8%
    128 |  20.4 | 26.3 | +28.6%
    256 |  20.0 | 26.4 | +32.1%
   1024 |  26.6 | 26.7 | + 0.2%
   8192 | 541.3 | 33.5 | -93.8%

No surprises since the hash table has 256 entries. The microbenchmark scans indexes in order; if the pattern is more random, xa starts to win at 512 devices already. But that's a lot of devices, in practice.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230726185530.2247698-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
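A sketch of ifindex allocation backed by the xarray (function and field names are assumptions; the hash table remains alongside, per the commit): a requested index is reserved with xa_insert(), and an automatic one is handed out cyclically so indexes stay monotonic for dump iteration:

    static int dev_index_reserve(struct net *net, u32 ifindex)
    {
    	int err;

    	if (ifindex)
    		/* caller asked for a specific index: reserve it, or fail
    		 * with -EBUSY if it is already taken */
    		err = xa_insert(&net->dev_by_index, ifindex,
    				NULL, GFP_KERNEL);
    	else
    		/* allocate the next free index, wrapping within 31 bits */
    		err = xa_alloc_cyclic(&net->dev_by_index, &ifindex, NULL,
    				      xa_limit_31b, &net->ifindex,
    				      GFP_KERNEL);
    	return err < 0 ? err : ifindex;
    }

Dump code can then walk the namespace with xa_for_each_start(&net->dev_by_index, index, dev, start), resuming from a saved index without the bucket-counting bookkeeping the hash table needed.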