summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2024-02-04iio: commom: st_sensors: ensure proper DMA alignmentNuno Sa1-2/+2
Aligning the buffer to the L1 cache is not sufficient in some platforms as they might have larger cacheline sizes for caches after L1 and thus, we can't guarantee DMA safety. That was the whole reason to introduce IIO_DMA_MINALIGN in [1]. Do the same for st_sensors common buffer. While at it, moved the odr_lock before buffer_data as we definitely don't want any other data to share a cacheline with the buffer. [1]: https://lore.kernel.org/linux-iio/20220508175712.647246-2-jic23@kernel.org/ Fixes: e031d5f558f1 ("iio:st_sensors: remove buffer allocation at each buffer enable") Signed-off-by: Nuno Sa <nuno.sa@analog.com> Cc: <Stable@vger.kernel.org> Link: https://lore.kernel.org/r/20240131-dev_dma_safety_stm-v2-1-580c07fae51b@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-02-04Merge tag 'dmaengine-fix-6.8' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "Core: - fix return value of is_slave_direction() for D2D dma Driver fixes for: - Documentaion fixes to resolve warnings for at_hdmac driver - bunch of fsl driver fixes for memory leaks, and useless kfree - TI edma and k3 fixes for packet error and null pointer checks" * tag 'dmaengine-fix-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: at_hdmac: add missing kernel-doc style description dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV dmaengine: fsl-qdma: Remove a useless devm_kfree() dmaengine: fsl-qdma: Fix a memory leak related to the queue command DMA dmaengine: fsl-qdma: Fix a memory leak related to the status queue DMA dmaengine: ti: k3-udma: Report short packet errors dmaengine: ti: edma: Add some null pointer checks to the edma_probe dmaengine: fsl-dpaa2-qdma: Fix the size of dma pools dmaengine: at_hdmac: fix some kernel-doc warnings
2024-02-03cxl/cper: Fix errant CPER prints for CXL eventsIra Weiny1-0/+23
Jonathan reports that CXL CPER events dump an extra generic error message. {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 {1}[Hardware Error]: event severity: recoverable {1}[Hardware Error]: Error 0, type: recoverable {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6 {1}[Hardware Error]: section length: 0x90 {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................ {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................ ... CXL events were rerouted though the CXL subsystem for additional processing. However, when that work was done it was missed that cper_estatus_print_section() continued with a generic error message which is confusing. Teach CPER print code to ignore printing details of some section types. Assign the CXL event GUIDs to this set to prevent confusing unknown prints. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-02-02Merge tag 'pci-v6.8-fixes-1' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Fix a potential deadlock that was reintroduced by an ASPM revert merged for v6.8 (Johan Hovold) - Add Manivannan Sadhasivam as PCI Endpoint maintainer (Lorenzo Pieralisi) * tag 'pci-v6.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: MAINTAINERS: Add Manivannan Sadhasivam as PCI Endpoint maintainer PCI/ASPM: Fix deadlock when enabling ASPM
2024-02-02Merge tag 'block-6.8-2024-02-01' of git://git.kernel.dk/linuxLinus Torvalds1-8/+2
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Remove duplicated enums (Guixen) - Use appropriate controller state accessors (Keith) - Retryable authentication (Hannes) - Add missing module descriptions (Chaitanya) - Fibre-channel fixes for blktests (Daniel) - Various type correctness updates (Caleb) - Improve fabrics connection debugging prints (Nitin) - Passthrough command verbose error logging (Adam) - Fix for where we set IO priority in the bio for drivers that use fops->submit_bio() to queue IO, like md/dm etc. * tag 'block-6.8-2024-02-01' of git://git.kernel.dk/linux: (32 commits) block: Fix where bio IO priority gets set nvme: allow passthru cmd error logging nvme-fc: show hostnqn when connecting to fc target nvme-rdma: show hostnqn when connecting to rdma target nvme-tcp: show hostnqn when connecting to tcp target nvmet-fc: use RCU list iterator for assoc_list nvmet-fc: take ref count on tgtport before delete assoc nvmet-fc: avoid deadlock on delete association path nvmet-fc: abort command when there is no binding nvmet-fc: do not tack refs on tgtports from assoc nvmet-fc: remove null hostport pointer check nvmet-fc: hold reference on hostport match nvmet-fc: free queue and assoc directly nvmet-fc: defer cleanup using RCU properly nvmet-fc: release reference on target port nvmet-fcloop: swap the list_add_tail arguments nvme-fc: do not wait in vain when unloading module nvme-fc: log human-readable opcode on timeout nvme: split out fabrics version of nvme_opcode_str() nvme: take const cmd pointer in read-only helpers ...
2024-02-02pidfd: implement PIDFD_THREAD flag for pidfd_open()Oleg Nesterov1-1/+2
With this flag: - pidfd_open() doesn't require that the target task must be a thread-group leader - pidfd_poll() succeeds when the task exits and becomes a zombie (iow, passes exit_notify()), even if it is a leader and thread-group is not empty. This means that the behaviour of pidfd_poll(PIDFD_THREAD, pid-of-group-leader) is not well defined if it races with exec() from its sub-thread; pidfd_poll() can succeed or not depending on whether pidfd_task_exited() is called before or after exchange_tids(). Perhaps we can improve this behaviour later, pidfd_poll() can probably take sig->group_exec_task into account. But this doesn't really differ from the case when the leader exits before other threads (so pidfd_poll() succeeds) and then another thread execs and pidfd_poll() will block again. thread_group_exited() is no longer used, perhaps it can die. Co-developed-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20240131132602.GA23641@redhat.com Tested-by: Tycho Andersen <tandersen@netflix.com> Reviewed-by: Tycho Andersen <tandersen@netflix.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-02pidfd: cleanup the usage of __pidfd_prepare's flagsOleg Nesterov1-1/+0
- make pidfd_create() static. - Don't pass O_RDWR | O_CLOEXEC to __pidfd_prepare() in copy_process(), __pidfd_prepare() adds these flags unconditionally. - Kill the flags check in __pidfd_prepare(). sys_pidfd_open() checks the flags itself, all other users of pidfd_prepare() pass flags = 0. If we need a sanity check for those other in kernel users then WARN_ON_ONCE(flags & ~PIDFD_NONBLOCK) makes more sense. - Don't pass O_RDWR to get_unused_fd_flags(), it ignores everything except O_CLOEXEC. - Don't pass O_CLOEXEC to anon_inode_getfile(), it ignores everything except O_ACCMODE | O_NONBLOCK. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20240125161734.GA778@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-02filelock: fl_pid field should be signed intJeff Layton1-1/+1
This field has been unsigned for a very long time, but most users of the struct file_lock and the file locking internals themselves treat it as a signed value. Change it to be pid_t (which is a signed int). Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-1-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-01Merge tag 'net-6.8-rc3' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. As Paolo promised we continue to hammer out issues in our selftests. This is not the end but probably the peak. Current release - regressions: - smc: fix incorrect SMC-D link group matching logic Current release - new code bugs: - eth: bnxt: silence WARN() when device skips a timestamp, it happens Previous releases - regressions: - ipmr: fix null-deref when forwarding mcast packets - conntrack: evaluate window negotiation only for packets in the REPLY direction, otherwise SYN retransmissions trigger incorrect window scale negotiation - ipset: fix performance regression in swap operation Previous releases - always broken: - tcp: add sanity checks to types of pages getting into the rx zerocopy path, we only support basic NIC -> user, no page cache pages etc. - ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv() - nt_tables: more input sanitization changes - dsa: mt7530: fix 10M/100M speed on MediaTek MT7988 switch - bridge: mcast: fix loss of snooping after long uptime, jiffies do wrap on 32bit - xen-netback: properly sync TX responses, protect with locking - phy: mediatek-ge-soc: sync calibration values with MediaTek SDK, increase connection stability - eth: pds: fixes for various teardown, and reset races Misc: - hsr: silence WARN() if we can't alloc supervision frame, it happens" * tag 'net-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits) doc/netlink/specs: Add missing attr in rt_link spec idpf: avoid compiler padding in virtchnl2_ptype struct selftests: mptcp: join: stop transfer when check is done (part 2) selftests: mptcp: join: stop transfer when check is done (part 1) selftests: mptcp: allow changing subtests prefix selftests: mptcp: decrease BW in simult flows selftests: mptcp: increase timeout to 30 min selftests: mptcp: add missing kconfig for NF Mangle selftests: mptcp: add missing kconfig for NF Filter in v6 selftests: mptcp: add missing kconfig for NF Filter mptcp: fix data re-injection from stale subflow selftests: net: enable some more knobs selftests: net: add missing config for NF_TARGET_TTL selftests: forwarding: List helper scripts in TEST_FILES Makefile variable selftests: net: List helper scripts in TEST_FILES Makefile variable selftests: net: Remove executable bits from library scripts selftests: bonding: Check initial state selftests: team: Add missing config options hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove xen-netback: properly sync TX responses ...
2024-02-01Merge tag 'hid-for-linus-2024020101' of ↵Linus Torvalds1-11/+0
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Benjamin Tissoires: - cleanups in the error path in hid-steam (Dan Carpenter) - fixes for Wacom tablets selftests that sneaked in while the CI was taking a break during the year end holidays (Benjamin Tissoires) - null pointer check in nvidia-shield (Kunwu Chan) - memory leak fix in hidraw (Su Hui) - another null pointer fix in i2c-hid-of (Johan Hovold) - another memory leak fix in HID-BPF this time, as well as a double fdget() fix reported by Dan Carpenter (Benjamin Tissoires) - fix for Cirque touchpad when they go on suspend (Kai-Heng Feng) - new device ID in hid-logitech-hidpp: "Logitech G Pro X SuperLight 2" (Jiri Kosina) * tag 'hid-for-linus-2024020101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: bpf: use __bpf_kfunc instead of noinline HID: bpf: actually free hdev memory after attaching a HID-BPF program HID: bpf: remove double fdget() HID: i2c-hid-of: fix NULL-deref on failed power up HID: hidraw: fix a problem of memory leak in hidraw_release() HID: i2c-hid: Skip SET_POWER SLEEP for Cirque touchpad on system suspend HID: nvidia-shield: Add missing null pointer checks to LED initialization HID: logitech-hidpp: add support for Logitech G Pro X Superlight 2 selftests/hid: wacom: fix confidence tests HID: hid-steam: Fix cleanup in probe() HID: hid-steam: remove pointless error message
2024-02-01Merge tag 'lsm-pr-20240131' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm fixes from Paul Moore: "Two small patches to fix some problems relating to LSM hook return values and how the individual LSMs interact" * tag 'lsm-pr-20240131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: fix default return value of the socket_getpeersec_*() hooks lsm: fix the logic in security_inode_getsecctx()
2024-02-01Merge tag 'nf-24-01-31' of ↵Jakub Kicinski1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) TCP conntrack now only evaluates window negotiation for packets in the REPLY direction, from Ryan Schaefer. Otherwise SYN retransmissions trigger incorrect window scale negotiation. From Ryan Schaefer. 2) Restrict tunnel objects to NFPROTO_NETDEV which is where it makes sense to use this object type. 3) Fix conntrack pick up from the middle of SCTP_CID_SHUTDOWN_ACK packets. From Xin Long. 4) Another attempt from Jozsef Kadlecsik to address the slow down of the swap command in ipset. 5) Replace a BUG_ON by WARN_ON_ONCE in nf_log, and consolidate check for the case that the logger is NULL from the read side lock section. 6) Address lack of sanitization for custom expectations. Restrict layer 3 and 4 families to what it is supported by userspace. * tag 'nf-24-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger netfilter: ipset: fix performance regression in swap operation netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV netfilter: conntrack: correct window scaling with retransmitted SYN ==================== Link: https://lore.kernel.org/r/20240131225943.7536-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01iomap: pass the length of the dirty region to ->map_blocksChristoph Hellwig1-1/+1
Let the file system know how much dirty data exists at the passed in offset. This allows file systems to allocate the right amount of space that actually is written back if they can't eagerly convert (e.g. because they don't support unwritten extents). Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-15-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-01iomap: map multiple blocks at a timeChristoph Hellwig1-0/+7
The ->map_blocks interface returns a valid range for writeback, but we still call back into it for every block, which is a bit inefficient. Change iomap_writepage_map to use the valid range in the map until the end of the folio or the dirty range inside the folio instead of calling back into every block. Note that the range is not used over folio boundaries as we need to be able to check the mapping sequence count under the folio lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-14-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-01iomap: don't chain biosChristoph Hellwig1-2/+6
Back in the days when a single bio could only be filled to the hardware limits, and we scheduled a work item for each bio completion, chaining multiple bios for a single ioend made a lot of sense to reduce the number of completions. But these days bios can be filled until we reach the number of vectors or total size limit, which means we can always fit at least 1 megabyte worth of data in the worst case, but usually a lot more due to large folios. The only thing bio chaining is buying us now is to reduce the size of the allocation from an ioend with an embedded bio into a plain bio, which is a 52 bytes differences on 64-bit systems. This is not worth the added complexity, so remove the bio chaining and only use the bio embedded into the ioend. This will help to simplify further changes to the iomap writeback code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-10-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-01iomap: move the io_folios field out of struct iomap_ioendChristoph Hellwig1-1/+1
The io_folios member in struct iomap_ioend counts the number of folios added to an ioend. It is only used at submission time and can thus be moved to iomap_writepage_ctx instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-4-hch@lst.de Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-01nvme: take const cmd pointer in read-only helpersCaleb Sander1-2/+2
nvme_is_fabrics() and nvme_is_write() only read struct nvme_command, so take it by const pointer. This allows callers to pass a const pointer and communicates that these functions don't modify the command. Signed-off-by: Caleb Sander <csander@purestorage.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-02-01netfilter: ipset: fix performance regression in swap operationJozsef Kadlecsik1-0/+4
The patch "netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test", commit 28628fa9 fixes a race condition. But the synchronize_rcu() added to the swap function unnecessarily slows it down: it can safely be moved to destroy and use call_rcu() instead. Eric Dumazet pointed out that simply calling the destroy functions as rcu callback does not work: sets with timeout use garbage collectors which need cancelling at destroy which can wait. Therefore the destroy functions are split into two: cancelling garbage collectors safely at executing the command received by netlink and moving the remaining part only into the rcu callback. Link: https://lore.kernel.org/lkml/C0829B10-EAA6-4809-874E-E1E9C05A8D84@automattic.com/ Fixes: 28628fa952fe ("netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test") Reported-by: Ale Crismani <ale.crismani@automattic.com> Reported-by: David Wang <00107082@163.com> Tested-by: David Wang <00107082@163.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-31PCI/ASPM: Fix deadlock when enabling ASPMJohan Hovold1-0/+5
A last minute revert in 6.7-final introduced a potential deadlock when enabling ASPM during probe of Qualcomm PCIe controllers as reported by lockdep: ============================================ WARNING: possible recursive locking detected 6.7.0 #40 Not tainted -------------------------------------------- kworker/u16:5/90 is trying to acquire lock: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc but task is already holding lock: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(pci_bus_sem); lock(pci_bus_sem); *** DEADLOCK *** Call trace: print_deadlock_bug+0x25c/0x348 __lock_acquire+0x10a4/0x2064 lock_acquire+0x1e8/0x318 down_read+0x60/0x184 pcie_aspm_pm_state_change+0x58/0xdc pci_set_full_power_state+0xa8/0x114 pci_set_power_state+0xc4/0x120 qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom] pci_walk_bus+0x64/0xbc qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom] The deadlock can easily be reproduced on machines like the Lenovo ThinkPad X13s by adding a delay to increase the race window during asynchronous probe where another thread can take a write lock. Add a new pci_set_power_state_locked() and associated helper functions that can be called with the PCI bus semaphore held to avoid taking the read lock twice. Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org> # 6.7
2024-01-31HID: bpf: use __bpf_kfunc instead of noinlineBenjamin Tissoires1-11/+0
Follow the docs at Documentation/bpf/kfuncs.rst: - declare the function with `__bpf_kfunc` - disables missing prototype warnings, which allows to remove them from include/linux/hid-bpf.h Removing the prototypes is not an issue because we currently have to redeclare them when writing the BPF program. They will eventually be generated by bpftool directly AFAIU. Link: https://lore.kernel.org/r/20240124-b4-hid-bpf-fixes-v2-3-052520b1e5e6@kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2024-01-31IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not ↵Mark Zhang1-1/+1
supported debugfs entries for RRoCE general CC parameters must be exposed only when they are supported, otherwise when accessing them there may be a syndrome error in kernel log, for example: $ cat /sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp cat: '/sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp': Invalid argument $ dmesg mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1253): QUERY_CONG_PARAMS(0x824) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x325a82), err(-22) Fixes: 66fb1d5df6ac ("IB/mlx5: Extend debug control for CC parameters") Reviewed-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/e7ade70bad52b7468bdb1de4d41d5fad70c8b71c.1706433934.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-01-31RDMA/mlx5: Fix fortify source warning while accessing Eth segmentLeon Romanovsky1-1/+4
------------[ cut here ]------------ memcpy: detected field-spanning write (size 56) of single field "eseg->inline_hdr.start" at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 (size 2) WARNING: CPU: 0 PID: 293779 at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] Modules linked in: 8021q garp mrp stp llc rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) mlx5_core(OE) pci_hyperv_intf mlxdevm(OE) mlx_compat(OE) tls mlxfw(OE) psample nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink mst_pciconf(OE) knem(OE) vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd irqbypass cuse nfsv3 nfs fscache netfs xfrm_user xfrm_algo ipmi_devintf ipmi_msghandler binfmt_misc crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 snd_pcsp aesni_intel crypto_simd cryptd snd_pcm snd_timer joydev snd soundcore input_leds serio_raw evbug nfsd auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc drm efi_pstore ip_tables x_tables autofs4 psmouse virtio_net net_failover failover floppy [last unloaded: mlx_compat(OE)] CPU: 0 PID: 293779 Comm: ssh Tainted: G OE 6.2.0-32-generic #32~22.04.1-Ubuntu Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] Code: 0c 01 00 a8 01 75 25 48 8b 75 a0 b9 02 00 00 00 48 c7 c2 10 5b fd c0 48 c7 c7 80 5b fd c0 c6 05 57 0c 03 00 01 e8 95 4d 93 da <0f> 0b 44 8b 4d b0 4c 8b 45 c8 48 8b 4d c0 e9 49 fb ff ff 41 0f b7 RSP: 0018:ffffb5b48478b570 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb5b48478b628 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffb5b48478b5e8 R13: ffff963a3c609b5e R14: ffff9639c3fbd800 R15: ffffb5b480475a80 FS: 00007fc03b444c80(0000) GS:ffff963a3dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556f46bdf000 CR3: 0000000006ac6003 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? show_regs+0x72/0x90 ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] ? __warn+0x8d/0x160 ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] ? report_bug+0x1bb/0x1d0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x19/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib] mlx5_ib_post_send_nodrain+0xb/0x20 [mlx5_ib] ipoib_send+0x2ec/0x770 [ib_ipoib] ipoib_start_xmit+0x5a0/0x770 [ib_ipoib] dev_hard_start_xmit+0x8e/0x1e0 ? validate_xmit_skb_list+0x4d/0x80 sch_direct_xmit+0x116/0x3a0 __dev_xmit_skb+0x1fd/0x580 __dev_queue_xmit+0x284/0x6b0 ? _raw_spin_unlock_irq+0xe/0x50 ? __flush_work.isra.0+0x20d/0x370 ? push_pseudo_header+0x17/0x40 [ib_ipoib] neigh_connected_output+0xcd/0x110 ip_finish_output2+0x179/0x480 ? __smp_call_single_queue+0x61/0xa0 __ip_finish_output+0xc3/0x190 ip_finish_output+0x2e/0xf0 ip_output+0x78/0x110 ? __pfx_ip_finish_output+0x10/0x10 ip_local_out+0x64/0x70 __ip_queue_xmit+0x18a/0x460 ip_queue_xmit+0x15/0x30 __tcp_transmit_skb+0x914/0x9c0 tcp_write_xmit+0x334/0x8d0 tcp_push_one+0x3c/0x60 tcp_sendmsg_locked+0x2e1/0xac0 tcp_sendmsg+0x2d/0x50 inet_sendmsg+0x43/0x90 sock_sendmsg+0x68/0x80 sock_write_iter+0x93/0x100 vfs_write+0x326/0x3c0 ksys_write+0xbd/0xf0 ? do_syscall_64+0x69/0x90 __x64_sys_write+0x19/0x30 do_syscall_64+0x59/0x90 ? do_user_addr_fault+0x1d0/0x640 ? exit_to_user_mode_prepare+0x3b/0xd0 ? irqentry_exit_to_user_mode+0x9/0x20 ? irqentry_exit+0x43/0x50 ? exc_page_fault+0x92/0x1b0 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7fc03ad14a37 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffdf8697fe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000008024 RCX: 00007fc03ad14a37 RDX: 0000000000008024 RSI: 0000556f46bd8270 RDI: 0000000000000003 RBP: 0000556f46bb1800 R08: 0000000000007fe3 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 0000556f46bc66b0 R14: 000000000000000a R15: 0000556f46bb2f50 </TASK> ---[ end trace 0000000000000000 ]--- Link: https://lore.kernel.org/r/8228ad34bd1a25047586270f7b1fb4ddcd046282.1706433934.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-01-31lsm: fix default return value of the socket_getpeersec_*() hooksOndrej Mosnacek1-2/+2
For these hooks the true "neutral" value is -EOPNOTSUPP, which is currently what is returned when no LSM provides this hook and what LSMs return when there is no security context set on the socket. Correct the value in <linux/lsm_hooks.h> and adjust the dispatch functions in security/security.c to avoid issues when the BPF LSM is enabled. Cc: stable@vger.kernel.org Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-01-30dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEVFrank Li1-1/+2
is_slave_direction() should return true when direction is DMA_DEV_TO_DEV. Fixes: 49920bc66984 ("dmaengine: add new enum dma_transfer_direction") Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240123172842.3764529-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-01-30Merge tag 'mm-hotfixes-stable-2024-01-28-23-21' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "22 hotfixes. 11 are cc:stable and the remainder address post-6.7 issues or aren't considered appropriate for backporting" * tag 'mm-hotfixes-stable-2024-01-28-23-21' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mm: thp_get_unmapped_area must honour topdown preference mm: huge_memory: don't force huge page alignment on 32 bit userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory scs: add CONFIG_MMU dependency for vfree_atomic() mm/memory: fix folio_set_dirty() vs. folio_mark_dirty() in zap_pte_range() mm/huge_memory: fix folio_set_dirty() vs. folio_mark_dirty() selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag selftests: mm: fix map_hugetlb failure on 64K page size systems MAINTAINERS: supplement of zswap maintainers update stackdepot: make fast paths lock-less again stackdepot: add stats counters exported via debugfs mm, kmsan: fix infinite recursion due to RCU critical section mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again selftests/mm: switch to bash from sh MAINTAINERS: add man-pages git trees mm: memcontrol: don't throttle dying tasks on memory.high mm: mmap: map MAP_STACK to VM_NOHUGEPAGE uprobes: use pagesize-aligned virtual address when replacing pages selftests/mm: mremap_test: fix build warning ...
2024-01-29workqueue: Implement system-wide nr_active enforcement for unbound workqueuesTejun Heo1-3/+32
A pool_workqueue (pwq) represents the connection between a workqueue and a worker_pool. One of the roles that a pwq plays is enforcement of the max_active concurrency limit. Before 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues"), there was one pwq per each CPU for per-cpu workqueues and per each NUMA node for unbound workqueues, which was a natural result of per-cpu workqueues being served by per-cpu pools and unbound by per-NUMA pools. In terms of max_active enforcement, this was, while not perfect, workable. For per-cpu workqueues, it was fine. For unbound, it wasn't great in that NUMA machines would get max_active that's multiplied by the number of nodes but didn't cause huge problems because NUMA machines are relatively rare and the node count is usually pretty low. However, cache layouts are more complex now and sharing a worker pool across a whole node didn't really work well for unbound workqueues. Thus, a series of commits culminating on 8639ecebc9b1 ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") implemented more flexible affinity mechanism for unbound workqueues which enables using e.g. last-level-cache aligned pools. In the process, 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") made unbound workqueues use per-cpu pwqs like per-cpu workqueues. While the change was necessary to enable more flexible affinity scopes, this came with the side effect of blowing up the effective max_active for unbound workqueues. Before, the effective max_active for unbound workqueues was multiplied by the number of nodes. After, by the number of CPUs. 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") claims that this should generally be okay. It is okay for users which self-regulates concurrency level which are the vast majority; however, there are enough use cases which actually depend on max_active to prevent the level of concurrency from going bonkers including several IO handling workqueues that can issue a work item for each in-flight IO. With targeted benchmarks, the misbehavior can easily be exposed as reported in http://lkml.kernel.org/r/dbu6wiwu3sdhmhikb2w6lns7b27gbobfavhjj57kwi2quafgwl@htjcc5oikcr3. Unfortunately, there is no way to express what these use cases need using per-cpu max_active. A CPU may issue most of in-flight IOs, so we don't want to set max_active too low but as soon as we increase max_active a bit, we can end up with unreasonable number of in-flight work items when many CPUs issue IOs at the same time. ie. The acceptable lowest max_active is higher than the acceptable highest max_active. Ideally, max_active for an unbound workqueue should be system-wide so that the users can regulate the total level of concurrency regardless of node and cache layout. The reasons workqueue hasn't implemented that yet are: - One max_active enforcement decouples from pool boundaires, chaining execution after a work item finishes requires inter-pool operations which would require lock dancing, which is nasty. - Sharing a single nr_active count across the whole system can be pretty expensive on NUMA machines. - Per-pwq enforcement had been more or less okay while we were using per-node pools. It looks like we no longer can avoid decoupling max_active enforcement from pool boundaries. This patch implements system-wide nr_active mechanism with the following design characteristics: - To avoid sharing a single counter across multiple nodes, the configured max_active is split across nodes according to the proportion of each workqueue's online effective CPUs per node. e.g. A node with twice more online effective CPUs will get twice higher portion of max_active. - Workqueue used to be able to process a chain of interdependent work items which is as long as max_active. We can't do this anymore as max_active is distributed across the nodes. Instead, a new parameter min_active is introduced which determines the minimum level of concurrency within a node regardless of how max_active distribution comes out to be. It is set to the smaller of max_active and WQ_DFL_MIN_ACTIVE which is 8. This can lead to higher effective max_weight than configured and also deadlocks if a workqueue was depending on being able to handle chains of interdependent work items that are longer than 8. I believe these should be fine given that the number of CPUs in each NUMA node is usually higher than 8 and work item chain longer than 8 is pretty unlikely. However, if these assumptions turn out to be wrong, we'll need to add an interface to adjust min_active. - Each unbound wq has an array of struct wq_node_nr_active which tracks per-node nr_active. When its pwq wants to run a work item, it has to obtain the matching node's nr_active. If over the node's max_active, the pwq is queued on wq_node_nr_active->pending_pwqs. As work items finish, the completion path round-robins the pending pwqs activating the first inactive work item of each, which involves some pool lock dancing and kicking other pools. It's not the simplest code but doesn't look too bad. v4: - wq_adjust_max_active() updated to invoke wq_update_node_max_active(). - wq_adjust_max_active() is now protected by wq->mutex instead of wq_pool_mutex. v3: - wq_node_max_active() used to calculate per-node max_active on the fly based on system-wide CPU online states. Lai pointed out that this can lead to skewed distributions for workqueues with restricted cpumasks. Update the max_active distribution to use per-workqueue effective online CPU counts instead of system-wide and cache the calculation results in node_nr_active->max. v2: - wq->min/max_active now uses WRITE/READ_ONCE() as suggested by Lai. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Naohiro Aota <Naohiro.Aota@wdc.com> Link: http://lkml.kernel.org/r/dbu6wiwu3sdhmhikb2w6lns7b27gbobfavhjj57kwi2quafgwl@htjcc5oikcr3 Fixes: 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
2024-01-28Merge tag 'x86_urgent_for_v6.8_rc2' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure 32-bit syscall registers are properly sign-extended - Add detection for AMD's Zen5 generation CPUs and Intel's Clearwater Forest CPU model number - Make a stub function export non-GPL because it is part of the paravirt alternatives and that can be used by non-GPL code * tag 'x86_urgent_for_v6.8_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Add more models to X86_FEATURE_ZEN5 x86/entry/ia32: Ensure s32 is sign extended to s64 x86/cpu: Add model number for Intel Clearwater Forest processor x86/CPU/AMD: Add X86_FEATURE_ZEN5 x86/paravirt: Make BUG_func() usable by non-GPL modules
2024-01-27Merge tag 'ata-6.8-rc2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata updates from Niklas Cassel: - Fix an incorrect link_power_management_policy sysfs attribute value. We were previously using the same attribute value for two different LPM policies (me) - Add a ASMedia ASM1166 quirk. The SATA host controller always reports that it has 32 ports, even though it only has six ports. Add a quirk that overrides the value reported by the controller (Conrad) - Add a ASMedia ASM1061 quirk. The SATA host controller completely ignores the upper 21 bits of the DMA address. This causes IOMMU error events when a (valid) DMA address actually has any of the upper 21 bits set. Add a quirk that limits the dma_mask to 43-bits (Lennert) * tag 'ata-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers ahci: asm1166: correct count of reported ports ata: libata-sata: improve sysfs description for ATA_LPM_UNKNOWN
2024-01-27workqueue: Break up enum definitions and give names to the typesTejun Heo1-17/+24
workqueue is collecting different sorts of enums into a single unnamed enum type which can increase confusion around enum width. Also, unnamed enums can't be accessed from BPF. Let's break up enum definitions according to their purposes and give them type names. Signed-off-by: Tejun Heo <tj@kernel.org>
2024-01-26Merge tag 'spi-fix-v6.8-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "As well as a few device IDs and the usual scattering of driver specific fixes this contains a couple of core things. One is a missed case in error handling, the other patch is a change from me raising the number of chip selects allowed by the newly added multi chip select support patches to resolve problems seen on several systems that exceeded the limit. This is not a real solution to the issue but rather just a change to avoid disruption to users, one of the options I am considering is just sending a revert of those changes if we can't come up with something sensible" * tag 'spi-fix-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: fix finalize message on error return spi: cs42l43: Handle error from devm_pm_runtime_enable spi: Raise limit on number of chip selects spi: hisi-sfc-v3xx: Return IRQ_NONE if no interrupts were detected spi: spi-cadence: Reverse the order of interleaved write and read operations spi: spi-imx: Use dev_err_probe for failed DMA channel requests spi: bcm-qspi: fix SFDP BFPT read by usig mspi read spi: intel-pci: Add support for Arrow Lake SPI serial flash spi: intel-pci: Remove Meteor Lake-S SoC PCI ID from the list
2024-01-26bitmap: Define a cleanup function for bitmapsBartosz Golaszewski1-0/+3
Add support for autopointers for bitmaps allocated with bitmap_alloc() et al. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20240122124243.44002-2-brgl@bgdev.pl
2024-01-26mm, kmsan: fix infinite recursion due to RCU critical sectionMarco Elver1-3/+3
Alexander Potapenko writes in [1]: "For every memory access in the code instrumented by KMSAN we call kmsan_get_metadata() to obtain the metadata for the memory being accessed. For virtual memory the metadata pointers are stored in the corresponding `struct page`, therefore we need to call virt_to_page() to get them. According to the comment in arch/x86/include/asm/page.h, virt_to_page(kaddr) returns a valid pointer iff virt_addr_valid(kaddr) is true, so KMSAN needs to call virt_addr_valid() as well. To avoid recursion, kmsan_get_metadata() must not call instrumented code, therefore ./arch/x86/include/asm/kmsan.h forks parts of arch/x86/mm/physaddr.c to check whether a virtual address is valid or not. But the introduction of rcu_read_lock() to pfn_valid() added instrumented RCU API calls to virt_to_page_or_null(), which is called by kmsan_get_metadata(), so there is an infinite recursion now. I do not think it is correct to stop that recursion by doing kmsan_enter_runtime()/kmsan_exit_runtime() in kmsan_get_metadata(): that would prevent instrumented functions called from within the runtime from tracking the shadow values, which might introduce false positives." Fix the issue by switching pfn_valid() to the _sched() variant of rcu_read_lock/unlock(), which does not require calling into RCU. Given the critical section in pfn_valid() is very small, this is a reasonable trade-off (with preemptible RCU). KMSAN further needs to be careful to suppress calls into the scheduler, which would be another source of recursion. This can be done by wrapping the call to pfn_valid() into preempt_disable/enable_no_resched(). The downside is that this sacrifices breaking scheduling guarantees; however, a kernel compiled with KMSAN has already given up any performance guarantees due to being heavily instrumented. Note, KMSAN code already disables tracing via Makefile, and since mmzone.h is included, it is not necessary to use the notrace variant, which is generally preferred in all other cases. Link: https://lkml.kernel.org/r/20240115184430.2710652-1-glider@google.com [1] Link: https://lkml.kernel.org/r/20240118110022.2538350-1-elver@google.com Fixes: 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing memory_section->usage") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Alexander Potapenko <glider@google.com> Reported-by: syzbot+93a9e8a3dea8d6085e12@syzkaller.appspotmail.com Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Cc: Charan Teja Kalla <quic_charante@quicinc.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-26mm: mmap: map MAP_STACK to VM_NOHUGEPAGEYang Shi1-0/+1
commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") incured regression for stress-ng pthread benchmark [1]. It is because THP get allocated to pthread's stack area much more possible than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not. The MAP_STACK flag is used to mark the stack area, but it is a no-op on Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating THP for such stack area. With this change the stack area looks like: fffd18e10000-fffd19610000 rw-p 00000000 00:00 0 Size: 8192 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 12 kB Pss: 12 kB Pss_Dirty: 12 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 12 kB Referenced: 12 kB Anonymous: 12 kB KSM: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd wr mr mw me ac nh The "nh" flag is set. [1] https://lore.kernel.org/linux-mm/202312192310.56367035-oliver.sang@intel.com/ Link: https://lkml.kernel.org/r/20231221065943.2803551-2-shy828301@gmail.com Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christopher Lameter <cl@linux.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: <stable@vger.kerenl.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-25Merge tag 'net-6.8-rc2' of ↵Linus Torvalds5-10/+11
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf, netfilter and WiFi. Jakub is doing a lot of work to include the self-tests in our CI, as a result a significant amount of self-tests related fixes is flowing in (and will likely continue in the next few weeks). Current release - regressions: - bpf: fix a kernel crash for the riscv 64 JIT - bnxt_en: fix memory leak in bnxt_hwrm_get_rings() - revert "net: macsec: use skb_ensure_writable_head_tail to expand the skb" Previous releases - regressions: - core: fix removing a namespace with conflicting altnames - tc/flower: fix chain template offload memory leak - tcp: - make sure init the accept_queue's spinlocks once - fix autocork on CPUs with weak memory model - udp: fix busy polling - mlx5e: - fix out-of-bound read in port timestamping - fix peer flow lists corruption - iwlwifi: fix a memory corruption Previous releases - always broken: - netfilter: - nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain - nft_limit: reject configurations that cause integer overflow - bpf: fix bpf_xdp_adjust_tail() with XSK zero-copy mbuf, avoiding a NULL pointer dereference upon shrinking - llc: make llc_ui_sendmsg() more robust against bonding changes - smc: fix illegal rmb_desc access in SMC-D connection dump - dpll: fix pin dump crash for rebound module - bnxt_en: fix possible crash after creating sw mqprio TCs - hv_netvsc: calculate correct ring size when PAGE_SIZE is not 4kB Misc: - several self-tests fixes for better integration with the netdev CI - added several missing modules descriptions" * tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits) tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring tsnep: Remove FCS for XDP data path net: fec: fix the unhandled context fault from smmu selftests: bonding: do not test arp/ns target with mode balance-alb/tlb fjes: fix memleaks in fjes_hw_setup i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue i40e: set xdp_rxq_info::frag_size xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers ice: remove redundant xdp_rxq_info registration i40e: handle multi-buffer packets that are shrunk by xdp prog ice: work on pre-XDP prog frag count xsk: fix usage of multi-buffer BPF helpers for ZC XDP xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags xsk: recycle buffer in case Rx queue was full net: fill in MODULE_DESCRIPTION()s for rvu_mbox net: fill in MODULE_DESCRIPTION()s for litex net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio net: fill in MODULE_DESCRIPTION()s for fec ...
2024-01-25fs: make the i_size_read/write helpers be smp_load_acquire/store_release()Baokun Li1-2/+8
In [Link] Linus mentions that acquire/release makes it clear which _particular_ memory accesses are the ordered ones, and it's unlikely to make any performance difference, so it's much better to pair up the release->acquire ordering than have a "wmb->rmb" ordering. ========================================================= update pagecache folio_mark_uptodate(folio) smp_wmb() set_bit PG_uptodate === ↑↑↑ STLR ↑↑↑ === smp_store_release(&inode->i_size, i_size) folio_test_uptodate(folio) test_bit PG_uptodate smp_rmb() === ↓↓↓ LDAR ↓↓↓ === smp_load_acquire(&inode->i_size) copy_page_to_iter() ========================================================= Calling smp_store_release() in i_size_write() ensures that the data in the page and the PG_uptodate bit are updated before the isize is updated, and calling smp_load_acquire() in i_size_read ensures that it will not read a newer isize than the data in the page. Therefore, this avoids buffered read-write inconsistencies caused by Load-Load reordering. Link: https://lore.kernel.org/r/CAHk-=wifOnmeJq+sn+2s-P46zw0SFEbw9BSCGgp2c5fYPtRPGw@mail.gmail.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240124142857.4146716-2-libaokun1@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-24exec: Distinguish in_execve from in_execKees Cook1-1/+1
Just to help distinguish the fs->in_exec flag from the current->in_execve flag, add comments in check_unsafe_exec() and copy_fs() for more context. Also note that in_execve is only used by TOMOYO now. Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-01-24spi: Raise limit on number of chip selectsMark Brown1-1/+1
As reported by Guenter the limit we've got on the number of chip selects is set too low for some systems, raise the limit. We should really remove the hard coded limit but this is needed as a fix so let's do the simple thing and raise the limit for now. Fixes: 4d8ff6b0991d ("spi: Add multi-cs memories support in SPI core") Reported-by: Guenter Roeck <linux@roeck-us.net> Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/r/20240124-spi-multi-cs-max-v2-1-df6fc5ab1abc@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-24genirq: Remove unneeded forward declarationDawei Li1-1/+1
The protoype of irq_flow_handler_t is independent of irq_data, so remove unneeded forward declaration. Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240122085716.2999875-4-dawei.li@shingroup.cn
2024-01-24irqchip/gic(v3): Replace gic_irq() with irqd_to_hwirq()Dawei Li1-1/+1
GIC & GIC-v3 share same gic_irq() implementations, both of which serve exact same purpose as irqd_to_hwirq(). irqd_to_hwirq() is a generic and top level API of the interrupt subsystem, it's independent of any chip implementation. Replace gic_irq() with irqd_to_hwirq() and convert struct irq_data::hwirq to irq_hw_number_t explicitly. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240122085716.2999875-3-dawei.li@shingroup.cn
2024-01-24x86/entry/ia32: Ensure s32 is sign extended to s64Richard Palethorpe1-0/+1
Presently ia32 registers stored in ptregs are unconditionally cast to unsigned int by the ia32 stub. They are then cast to long when passed to __se_sys*, but will not be sign extended. This takes the sign of the syscall argument into account in the ia32 stub. It still casts to unsigned int to avoid implementation specific behavior. However then casts to int or unsigned int as necessary. So that the following cast to long sign extends the value. This fixes the io_pgetevents02 LTP test when compiled with -m32. Presently the systemcall io_pgetevents_time64() unexpectedly accepts -1 for the maximum number of events. It doesn't appear other systemcalls with signed arguments are effected because they all have compat variants defined and wired up. Fixes: ebeb8c82ffaf ("syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com> Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240110130122.3836513-1-nik.borisov@suse.com Link: https://lore.kernel.org/ltp/20210921130127.24131-1-rpalethorpe@suse.com/
2024-01-24net/mlx5: Bridge, fix multicast packets sent to uplinkMoshe Shemesh2-1/+2
To enable multicast packets which are offloaded in bridge multicast offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should be set. Add this bit to FTE for the bridge multicast offload rules. Fixes: 18c2916cee12 ("net/mlx5: Bridge, snoop igmp/mld packets") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24net/mlx5: Fix query of sd_group fieldTariq Toukan2-3/+8
The sd_group field moved in the HW spec from the MPIR register to the vport context. Align the query accordingly. Fixes: f5e956329960 ("net/mlx5: Expose Management PCIe Index Register (MPIR)") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24net/mlx5e: Use the correct lag ports number when creating TISesSaeed Mahameed1-0/+1
The cited commit moved the code of mlx5e_create_tises() and changed the loop to create TISes over MLX5_MAX_PORTS constant value, instead of getting the correct lag ports supported by the device, which can cause FW errors on devices with less than MLX5_MAX_PORTS ports. Change that back to mlx5e_get_num_lag_ports(mdev). Also IPoIB interfaces create there own TISes, they don't use the eth TISes, pass a flag to indicate that. This fixes the following errors that might appear in kernel log: mlx5_cmd_out_err:808:(pid 650): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x595b5d), err(-22) mlx5e_create_mdev_resources:174:(pid 650): alloc tises failed, -22 Fixes: b25bd37c859f ("net/mlx5: Move TISes from priv to mdev HW resources") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-23Compiler Attributes: counted_by: fixup clang URLSergey Senozhatsky1-1/+1
The URL in question 404 now, fix it up (and switch to github). Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/b7babeb9c5b14af9189f0d6225673e6e9a8f4ad3.1704855496.git.senozhatsky@chromium.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-23Compiler Attributes: counted_by: bump min gcc versionSergey Senozhatsky1-1/+1
GCC is expected to implement this feature in version 15, so bump the version. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/e1c27b64ae7abe2ebe647be11b71cf1bca84f677.1704855495.git.senozhatsky@chromium.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-23Merge tag 'exportfs-6.9' of ↵Christian Brauner1-0/+11
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux Merge exportfs fixes from Chuck Lever: * tag 'exportfs-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux: fs: Create a generic is_dot_dotdot() utility exportfs: fix the fallback implementation of the get_name export operation Link: https://lore.kernel.org/r/BDC2AEB4-7085-4A7C-8DE8-A659FE1DBA6A@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-23nvmet: unify aer type enumGuixin Liu1-6/+0
The host and target use two definition of aer type, unify them into a single one. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-01-23fs: Create a generic is_dot_dotdot() utilityChuck Lever1-0/+11
De-duplicate the same functionality in several places by hoisting the is_dot_dotdot() utility function into linux/fs.h. Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-23ata: libata-sata: improve sysfs description for ATA_LPM_UNKNOWNNiklas Cassel1-1/+1
Currently, both ATA_LPM_UNKNOWN (0) and ATA_LPM_MAX_POWER (1) displays as "max_performance" in sysfs. This is quite misleading as they are not the same. For ATA_LPM_UNKNOWN, ata_eh_set_lpm() will not be called at all, leaving the configuration in unknown state. For ATA_LPM_MAX_POWER, ata_eh_set_lpm() is called, and setting the policy to ATA_LPM_MAX_POWER. This also matches the description of the SATA_MOBILE_LPM_POLICY Kconfig: 0 => Keep firmware settings 1 => Maximum performance Thus, update the sysfs description for ATA_LPM_UNKNOWN to match reality. While at it, update libata.h to mention that the ascii descriptions are in libata-sata.c and not in libata-scsi.c. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-01-22iio: adc: ad_sigma_delta: ensure proper DMA alignmentNuno Sa1-1/+3
Aligning the buffer to the L1 cache is not sufficient in some platforms as they might have larger cacheline sizes for caches after L1 and thus, we can't guarantee DMA safety. That was the whole reason to introduce IIO_DMA_MINALIGN in [1]. Do the same for the sigma_delta ADCs. [1]: https://lore.kernel.org/linux-iio/20220508175712.647246-2-jic23@kernel.org/ Fixes: 0fb6ee8d0b5e ("iio: ad_sigma_delta: Don't put SPI transfer buffer on the stack") Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240117-dev_sigma_delta_no_irq_flags-v1-1-db39261592cf@analog.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>