summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-01-07hfs/hfsplus: avoid WARN_ON() for sanity check, use proper error handlingLinus Torvalds1-5/+10
Commit 55d1cbbbb29e ("hfs/hfsplus: use WARN_ON for sanity check") fixed a build warning by turning a comment into a WARN_ON(), but it turns out that syzbot then complains because it can trigger said warning with a corrupted hfs image. The warning actually does warn about a bad situation, but we are much better off just handling it as the error it is. So rather than warn about us doing bad things, stop doing the bad things and return -EIO. While at it, also fix a memory leak that was introduced by an earlier fix for a similar syzbot warning situation, and add a check for one case that historically wasn't handled at all (ie neither comment nor subsequent WARN_ON). Reported-by: syzbot+7bb7cd3595533513a9e7@syzkaller.appspotmail.com Fixes: 55d1cbbbb29e ("hfs/hfsplus: use WARN_ON for sanity check") Fixes: 8d824e69d9f3 ("hfs: fix OOB Read in __hfs_brec_find") Link: https://lore.kernel.org/lkml/000000000000dbce4e05f170f289@google.com/ Tested-by: Michael Schmitz <schmitzmic@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-07Merge tag 'block-2023-01-06' of git://git.kernel.dk/linuxLinus Torvalds22-18/+3487
Pull block fixes from Jens Axboe: "The big change here is obviously the revert of the pktcdvd driver removal. Outside of that, just minor tweaks. In detail: - Re-instate the pktcdvd driver, which necessitates adding back bio_copy_data_iter() and the fops->devnode() hook for now (me) - Fix for splitting of a bio marked as NOWAIT, causing either nowait reads or writes to error with EAGAIN even if parts of the IO completed (me) - Fix for ublk, punting management commands to io-wq as they can all easily block for extended periods of time (Ming) - Removal of SRCU dependency for the block layer (Paul)" * tag 'block-2023-01-06' of git://git.kernel.dk/linux: block: Remove "select SRCU" Revert "pktcdvd: remove driver." Revert "block: remove devnode callback from struct block_device_operations" Revert "block: bio_copy_data_iter" ublk: honor IO_URING_F_NONBLOCK for handling control command block: don't allow splitting of a REQ_NOWAIT bio block: handle bio_split_to_limits() NULL return
2023-01-07Merge tag 'io_uring-2023-01-06' of git://git.kernel.dk/linuxLinus Torvalds4-8/+30
Pull io_uring fixes from Jens Axboe: "A few minor fixes that should go into the 6.2 release: - Fix for a memory leak in io-wq worker creation, if we ultimately end up canceling the worker creation before it gets created (me) - lockdep annotations for the CQ locking (Pavel) - A regression fix for CQ timeout handling (Pavel) - Ring pinning around deferred task_work fix (Pavel) - A trivial member move in struct io_ring_ctx, saving us some memory (me)" * tag 'io_uring-2023-01-06' of git://git.kernel.dk/linux: io_uring: fix CQ waiting timeout handling io_uring: move 'poll_multi_queue' bool in io_ring_ctx io_uring: lockdep annotate CQ locking io_uring: pin context while queueing deferred tw io_uring/io-wq: free worker if task_work creation is canceled
2023-01-06Merge tag 'tif-notify-signal-2023-01-06' of git://git.kernel.dk/linuxLinus Torvalds1-6/+7
Pull arm TIF_NOTIFY_SIGNAL fixup from Jens Axboe: "Hui Tang reported a performance regressions with _TIF_WORK_MASK in newer kernels, which he tracked to a change that went into 5.11. After this change, we'll call do_work_pending() more often than we need to, because we're now testing bits 0..15 rather than just 0..7. Shuffle the bits around to avoid this" * tag 'tif-notify-signal-2023-01-06' of git://git.kernel.dk/linux: ARM: renumber bits related to _TIF_WORK_MASK
2023-01-06Merge tag 'ceph-for-6.2-rc3' of https://github.com/ceph/ceph-clientLinus Torvalds4-8/+22
Pull ceph fixes from Ilya Dryomov: "Two file locking fixes from Xiubo" * tag 'ceph-for-6.2-rc3' of https://github.com/ceph/ceph-client: ceph: avoid use-after-free in ceph_fl_release_lock() ceph: switch to vfs_inode_has_locks() to fix file lock bug
2023-01-06Merge tag 'fixes_for_v6.2-rc3' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fixes from Jan Kara: "Two fixups of the UDF changes that went into 6.2-rc1" * tag 'fixes_for_v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: initialize newblock to 0 udf: Fix extension of the last extent in the file
2023-01-06Merge tag 'for-6.2-rc2-tag' of ↵Linus Torvalds9-16/+53
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more regression and regular fixes: - regressions: - fix assertion condition using = instead of == - fix false alert on bad tree level check - fix off-by-one error in delalloc search during lseek - fix compat ro feature check at read-write remount - handle case when read-repair happens with ongoing device replace - updated error messages" * tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix compat_ro checks against remount btrfs: always report error in run_one_delayed_ref() btrfs: handle case when repair happens with dev-replace btrfs: fix off-by-one in delalloc search during lseek btrfs: fix false alert on bad tree level check btrfs: add error message for metadata level mismatch btrfs: fix ASSERT em->len condition in btrfs_get_extent
2023-01-06Merge tag 'riscv-for-linus-6.2-rc3' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - use the correct mask for c.jr/c.jalr when decoding instructions - build fix for get_user() to avoid a sparse warning * tag 'riscv-for-linus-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: uaccess: fix type of 0 variable on error in get_user() riscv, kprobes: Stricter c.jr/c.jalr decoding
2023-01-06Merge tag 'perf-tools-fixes-for-v6.2-1-2023-01-06' of ↵Linus Torvalds13-32/+70
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix segfault when trying to process tracepoints present in a perf.data file and not linked with libtraceevent. - Fix build on uClibc systems by adding missing sys/types.h include, that was being obtained indirectly which stopped being the case when tools/lib/traceevent was removed. - Don't show commands in 'perf help' that depend on linking with libtraceevent when not building with that library, which is now a possibility since we no longer ship a copy in tools/lib/traceevent. - Fix failure in 'perf test' entry testing the combination of 'perf probe' user space function + 'perf record' + 'perf script' where it expects a backtrace leading to glibc's inet_pton() from 'ping' that now happens more than once with glibc 2.35 for IPv6 addreses. - Fix for the inet_pton perf test on s/390 where 'text_to_binary_address' now appears on the backtrace. - Fix build error on riscv due to missing header for 'struct perf_sample'. - Fix 'make -C tools perf_install' install variant by not propagating the 'subdir' to submakes for the 'install_headers' targets. - Fix handling of unsupported cgroup events when using BPF counters in 'perf stat'. - Count all cgroups, not just the last one when using 'perf stat' and combining --for-each-cgroup with --bpf-counters. This makes the output using BPF counters match the output without using it, which was the intention all along, the output should be the same using --bpf-counters or not. - Fix 'perf lock contention' core dump related to not finding the "__sched_text_end" symbol on s/390. - Fix build failure when HEAD is signed: exclude the signature from the version string. - Add missing closedir() calls to in perf_data__open_dir(), plugging a fd leak. * tag 'perf-tools-fixes-for-v6.2-1-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf tools: Fix build on uClibc systems by adding missing sys/types.h include perf stat: Fix handling of --for-each-cgroup with --bpf-counters to match non BPF mode perf stat: Fix handling of unsupported cgroup events when using BPF counters perf test record_probe_libc_inet_pton: Fix test on s/390 where 'text_to_binary_address' now appears on the backtrace perf lock contention: Fix core dump related to not finding the "__sched_text_end" symbol on s/390 perf build: Don't propagate subdir to submakes for install_headers perf test record_probe_libc_inet_pton: Fix failure due to extra inet_pton() backtrace in glibc >= 2.35 perf tools: Fix segfault when trying to process tracepoints in perf.data and not linked with libtraceevent perf tools: Don't include signature in version strings perf help: Use HAVE_LIBTRACEEVENT to filter out unsupported commands perf tools riscv: Fix build error on riscv due to missing header for 'struct perf_sample' perf tools: Fix resources leak in perf_data__open_dir()
2023-01-06Merge tag 'perf-urgent-2023-01-06' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "Intel RAPL updates for new model IDs" * tag 'perf-urgent-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/rapl: Add support for Intel Emerald Rapids perf/x86/rapl: Add support for Intel Meteor Lake perf/x86/rapl: Treat Tigerlake like Icelake
2023-01-06Merge tag 'v6.2-p2' of ↵Linus Torvalds3-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a CFI crash in arm64/sm4 as well as a regression in the caam driver" * tag 'v6.2-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: arm64/sm4 - fix possible crash with CFI enabled crypto: caam - fix CAAM io mem access in blob_gen
2023-01-06Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"Chuck Lever7-16/+15
The premise that "Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags" is false. svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd threads when determining which thread to wake up next. Found via KCSAN. Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-01-06nfsd: fix handling of cached open files in nfsd4_open codepathJeff Layton4-71/+42
Commit fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") added the ability to cache an open fd over a compound. There are a couple of problems with the way this currently works: It's racy, as a newly-created nfsd_file can end up with its PENDING bit cleared while the nf is hashed, and the nf_file pointer is still zeroed out. Other tasks can find it in this state and they expect to see a valid nf_file, and can oops if nf_file is NULL. Also, there is no guarantee that we'll end up creating a new nfsd_file if one is already in the hash. If an extant entry is in the hash with a valid nf_file, nfs4_get_vfs_file will clobber its nf_file pointer with the value of op_file and the old nf_file will leak. Fix both issues by making a new nfsd_file_acquirei_opened variant that takes an optional file pointer. If one is present when this is called, we'll take a new reference to it instead of trying to open the file. If the nfsd_file already has a valid nf_file, we'll just ignore the optional file and pass the nfsd_file back as-is. Also rework the tracepoints a bit to allow for an "opened" variant and don't try to avoid counting acquisitions in the case where we already have a cached open file. Fixes: fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") Cc: Trond Myklebust <trondmy@hammerspace.com> Reported-by: Stanislav Saner <ssaner@redhat.com> Reported-and-Tested-by: Ruben Vestergaard <rubenv@drcmr.dk> Reported-and-Tested-by: Torkil Svensgaard <torkil@drcmr.dk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-01-06s390: define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36Masahiro Yamada1-0/+2
Nathan Chancellor reports that the s390 vmlinux fails to link with GNU ld < 2.36 since commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv"). It happens for defconfig, or more specifically for CONFIG_EXPOLINE=y. $ s390x-linux-gnu-ld --version | head -n1 GNU ld (GNU Binutils for Debian) 2.35.2 $ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- allnoconfig $ ./scripts/config -e CONFIG_EXPOLINE $ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- olddefconfig $ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- `.exit.text' referenced in section `.s390_return_reg' of drivers/base/dd.o: defined in discarded section `.exit.text' of drivers/base/dd.o make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1 make: *** [Makefile:1252: vmlinux] Error 2 arch/s390/kernel/vmlinux.lds.S wants to keep EXIT_TEXT: .exit.text : { EXIT_TEXT } But, at the same time, EXIT_TEXT is thrown away by DISCARD because s390 does not define RUNTIME_DISCARD_EXIT. I still do not understand why the latter wins after 99cb0d917ffa, but defining RUNTIME_DISCARD_EXIT seems correct because the comment line in arch/s390/kernel/vmlinux.lds.S says: /* * .exit.text is discarded at runtime, not link time, * to deal with references from __bug_table */ Nathan also found that binutils commit 21401fc7bf67 ("Duplicate output sections in scripts") cured this issue, so we cannot reproduce it with binutils 2.36+, but it is better to not rely on it. Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv") Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/ Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20230105031306.1455409-1-masahiroy@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-06s390: expicitly align _edata and _end symbols on page boundaryAlexander Gordeev1-0/+2
Symbols _edata and _end in the linker script are the only unaligned expicitly on page boundary. Although _end is aligned implicitly by BSS_SECTION macro that is still inconsistent and could lead to a bug if a tool or function would assume that _edata is as aligned as others. For example, vmem_map_init() function does not align symbols _etext, _einittext etc. Should these symbols be unaligned as well, the size of ranges to update were short on one page. Instead of fixing every occurrence of this kind in the code and external tools just force the alignment on these two symbols. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-06s390/debug: add _ASM_S390_ prefix to header guardNiklas Schnelle1-3/+3
Using DEBUG_H without a prefix is very generic and inconsistent with other header guards in arch/s390/include/asm. In fact it collides with the same name in the ath9k wireless driver though that depends on !S390 via disabled wireless support. Let's just use a consistent header guard name and prevent possible future trouble. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-06usb: dwc3: gadget: Ignore End Transfer delay on teardownThinh Nguyen1-1/+4
If we delay sending End Transfer for Setup TRB to be prepared, we need to check if the End Transfer was in preparation for a driver teardown/soft-disconnect. In those cases, just send the End Transfer command without delay. In the case of soft-disconnect, there's a very small chance the command may not go through immediately. But should it happen, the Setup TRB will be prepared during the polling of the controller halted state, allowing the command to go through then. In the case of disabling endpoint due to reconfiguration (e.g. set_interface(alt-setting) or usb reset), then it's driven by the host. Typically the host wouldn't immediately cancel the control request and send another control transfer to trigger the End Transfer command timeout. Fixes: 4db0fbb60136 ("usb: dwc3: gadget: Don't delay End Transfer on delayed_status") Cc: stable@vger.kernel.org Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/f1617a323e190b9cc408fb8b65456e32b5814113.1670546756.git.Thinh.Nguyen@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-06usb: dwc3: xilinx: include linux/gpio/consumer.hArnd Bergmann1-0/+1
The newly added gpio consumer calls cause a build failure in configurations that fail to include the right header implicitly: drivers/usb/dwc3/dwc3-xilinx.c: In function 'dwc3_xlnx_init_zynqmp': drivers/usb/dwc3/dwc3-xilinx.c:207:22: error: implicit declaration of function 'devm_gpiod_get_optional'; did you mean 'devm_clk_get_optional'? [-Werror=implicit-function-declaration] 207 | reset_gpio = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW); | ^~~~~~~~~~~~~~~~~~~~~~~ | devm_clk_get_optional Fixes: ca05b38252d7 ("usb: dwc3: xilinx: Add gpio-reset support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230103121755.956027-1-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-06udf: initialize newblock to 0Tom Rix1-3/+1
The clang build reports this error fs/udf/inode.c:805:6: error: variable 'newblock' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (*err < 0) ^~~~~~~~ newblock is never set before error handling jump. Initialize newblock to 0 and remove redundant settings. Fixes: d8b39db5fab8 ("udf: Handle error when adding extent to a file") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20221230175341.1629734-1-trix@redhat.com>
2023-01-06udf: Fix extension of the last extent in the fileJan Kara1-1/+1
When extending the last extent in the file within the last block, we wrongly computed the length of the last extent. This is mostly a cosmetical problem since the extent does not contain any data and the length will be fixed up by following operations but still. Fixes: 1f3868f06855 ("udf: Fix extending file within last block") Signed-off-by: Jan Kara <jack@suse.cz>
2023-01-06Merge branch 'devlink-unregister'David S. Miller6-112/+137
Jakub Kicinski says: ==================== devlink: remove the wait-for-references on unregister Move the registration and unregistration of the devlink instances under their instance locks. Don't perform the netdev-style wait for all references when unregistering the instance. Instead the devlink instance refcount will only ensure that the memory of the instance is not freed. All places which acquire access to devlink instances via a reference must check that the instance is still registered under the instance lock. This fixes the problem of the netdev code accessing devlink instances before they are registered. RFC: https://lore.kernel.org/all/20221217011953.152487-1-kuba@kernel.org/ - rewrite the cover letter - rewrite the commit message for patch 1 - un-export and rename devl_is_alive - squash the netdevsim patches ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06netdevsim: move devlink registration under the instance lockJakub Kicinski1-3/+8
To prevent races with netdev code accessing free devlink instances move the registration under the devlink instance lock. Core now waits for the instance to be registered before accessing it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06netdevsim: rename a labelJakub Kicinski1-2/+2
err_dl_unregister should unregister the devlink instance. Looks like renaming it was missed in one of the reshufflings. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: allow registering parameters after the instanceJakub Kicinski1-11/+11
It's most natural to register the instance first and then its subobjects. Now that we can use the instance lock to protect the atomicity of all init - it should also be safe. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: don't require setting features before registrationJakub Kicinski1-2/+0
Requiring devlink_set_features() to be run before devlink is registered is overzealous. devlink_set_features() itself is a leftover from old workarounds which were trying to prevent initiating reload before probe was complete. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: remove the registration guarantee of referencesJakub Kicinski3-38/+30
The objective of exposing the devlink instance locks to drivers was to let them use these locks to prevent user space from accessing the device before it's fully initialized. This is difficult because devlink_unregister() waits for all references to be released, meaning that devlink_unregister() can't itself be called under the instance lock. To avoid this issue devlink_register() was moved after subobject registration a while ago. Unfortunately the netdev paths get a hold of the devlink instances _before_ they are registered. Ideally netdev should wait for devlink init to finish (synchronizing on the instance lock). This can't work because we don't know if the instance will _ever_ be registered (in case of failures it may not). The other option of returning an error until devlink_register() is called is unappealing (user space would get a notification netdev exist but would have to wait arbitrary amount of time before accessing some of its attributes). Weaken the guarantees of the devlink references. Holding a reference will now only guarantee that the memory of the object is around. Another way of looking at it is that the reference now protects the object not its "registered" status. Use devlink instance lock to synchronize unregistration. This implies that releasing of the "main" reference of the devlink instance moves from devlink_unregister() to devlink_free(). Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: always check if the devlink instance is registeredJakub Kicinski4-12/+60
Always check under the instance lock whether the devlink instance is still / already registered. This is a no-op for the most part, as the unregistration path currently waits for all references. On the init path, however, we may temporarily open up a race with netdev code, if netdevs are registered before the devlink instance. This is temporary, the next change fixes it, and this commit has been split out for the ease of review. Note that in case of iterating over sub-objects which have their own lock (regions and line cards) we assume an implicit dependency between those objects existing and devlink unregistration. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: protect devlink->dev by the instance lockJakub Kicinski3-8/+11
devlink->dev is assumed to be always valid as long as any outstanding reference to the devlink instance exists. In prep for weakening of the references take the instance lock. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: update the code in netns move to latest helpersJakub Kicinski1-3/+4
devlink_pernet_pre_exit() is the only obvious place which takes the instance lock without using the devl_ helpers. Update the code and move the error print after releasing the reference (having unlock and put together feels slightly idiomatic). Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: bump the instance index directly when iteratingJakub Kicinski2-35/+13
xa_find_after() is designed to handle multi-index entries correctly. If a xarray has two entries one which spans indexes 0-3 and one at index 4 xa_find_after(0) will return the entry at index 4. Having to juggle the two callbacks, however, is unnecessary in case of the devlink xarray, as there is 1:1 relationship with indexes. Always use xa_find() and increment the index manually. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06tipc: fix unexpected link reset due to discovery messagesTung Nguyen1-4/+8
This unexpected behavior is observed: node 1 | node 2 ------ | ------ link is established | link is established reboot | link is reset up | send discovery message receive discovery message | link is established | link is established send discovery message | | receive discovery message | link is reset (unexpected) | send reset message link is reset | It is due to delayed re-discovery as described in function tipc_node_check_dest(): "this link endpoint has already reset and re-established contact with the peer, before receiving a discovery message from that node." However, commit 598411d70f85 has changed the condition for calling tipc_node_link_down() which was the acceptance of new media address. This commit fixes this by restoring the old and correct behavior. Fixes: 598411d70f85 ("tipc: make resetting of links non-atomic") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06sysctl: expose all net/core sysctls inside netnsMahesh Bandewar1-5/+0
All were not visible to the non-priv users inside netns. However, with 4ecb90090c84 ("sysctl: allow override of /proc/sys/net with CAP_NET_ADMIN"), these vars are protected from getting modified. A proc with capable(CAP_NET_ADMIN) can change the values so not having them visible inside netns is just causing nuisance to process that check certain values (e.g. net.core.somaxconn) and see different behavior in root-netns vs. other-netns Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06rxrpc: Move client call connection to the I/O threadDavid Howells14-530/+297
Move the connection setup of client calls to the I/O thread so that a whole load of locking and barrierage can be eliminated. This necessitates the app thread waiting for connection to complete before it can begin encrypting data. This also completes the fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Move the client conn cache management to the I/O threadDavid Howells6-86/+62
Move the management of the client connection cache to the I/O thread rather than managing it from the namespace as an aggregate across all the local endpoints within the namespace. This will allow a load of locking to be got rid of in a future patch as only the I/O thread will be looking at the this. The downside is that the total number of cached connections on the system can get higher because the limit is now per-local rather than per-netns. We can, however, keep the number of client conns in use across the entire netfs and use that to reduce the expiration time of idle connection. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Remove call->state_lockDavid Howells12-184/+142
All the setters of call->state are now in the I/O thread and thus the state lock is now unnecessary. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Move call state changes from recvmsg to I/O threadDavid Howells5-111/+109
Move the call state changes that are made in rxrpc_recvmsg() to the I/O thread. This means that, thenceforth, only the I/O thread does this and the call state lock can be removed. This requires the Rx phase to be ended when the last packet is received, not when it is processed. Since this now changes the rxrpc call state to SUCCEEDED before we've consumed all the data from it, rxrpc_kernel_check_life() mustn't say the call is dead until the recvmsg queue is empty (unless the call has failed). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Move call state changes from sendmsg to I/O threadDavid Howells3-60/+63
Move all the call state changes that are made in rxrpc_sendmsg() to the I/O thread. This is a step towards removing the call state lock. This requires the switch to the RXRPC_CALL_CLIENT_AWAIT_REPLY and RXRPC_CALL_SERVER_SEND_REPLY states to be done when the last packet is decanted from ->tx_sendmsg to ->tx_buffer in the I/O thread, not when it is added to ->tx_sendmsg by sendmsg(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Wrap accesses to get call state to put the barrier in one placeDavid Howells5-24/+38
Wrap accesses to get the state of a call from outside of the I/O thread in a single place so that the barrier needed to order wrt the error code and abort code is in just that place. Also use a barrier when setting the call state and again when reading the call state such that the auxiliary completion info (error code, abort code) can be read without taking a read lock on the call state lock. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Split out the call state changing functions into their own fileDavid Howells4-89/+108
Split out the functions that change the state of an rxrpc call into their own file. The idea being to remove anything to do with changing the state of a call directly from the rxrpc sendmsg() and recvmsg() paths and have all that done in the I/O thread only, with the ultimate aim of removing the state lock entirely. Moving the code out of sendmsg.c and recvmsg.c makes that easier to manage. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parametersDavid Howells7-75/+76
Use the information now stored in struct rxrpc_call to configure the connection bundle and thence the connection, rather than using the rxrpc_conn_parameters struct. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Offload the completion of service conn security to the I/O threadDavid Howells4-14/+40
Offload the completion of the challenge/response cycle on a service connection to the I/O thread. After the RESPONSE packet has been successfully decrypted and verified by the work queue, offloading the changing of the call states to the I/O thread makes iteration over the conn's channel list simpler. Do this by marking the RESPONSE skbuff and putting it onto the receive queue for the I/O thread to collect. We put it on the front of the queue as we've already received the packet for it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Make the set of connection IDs per local endpointDavid Howells5-38/+35
Make the set of connection IDs per local endpoint so that endpoints don't cause each other's connections to get dismissed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Tidy up abort generation infrastructureDavid Howells17-444/+484
Tidy up the abort generation infrastructure in the following ways: (1) Create an enum and string mapping table to list the reasons an abort might be generated in tracing. (2) Replace the 3-char string with the values from (1) in the places that use that to log the abort source. This gets rid of a memcpy() in the tracepoint. (3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint and use values from (1) to indicate the trace reason. (4) Always make a call to an abort function at the point of the abort rather than stashing the values into variables and using goto to get to a place where it reported. The C optimiser will collapse the calls together as appropriate. The abort functions return a value that can be returned directly if appropriate. Note that this extends into afs also at the points where that generates an abort. To aid with this, the afs sources need to #define RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header because they don't have access to the rxrpc internal structures that some of the tracepoints make use of. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Clean up connection abortDavid Howells8-213/+188
Clean up connection abort, using the connection state_lock to gate access to change that state, and use an rxrpc_call_completion value to indicate the difference between local and remote aborts as these can be pasted directly into the call state. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Implement a mechanism to send an event notification to a connectionDavid Howells6-9/+55
Provide a means by which an event notification can be sent to a connection through such that the I/O thread can pick it up and handle it rather than doing it in a separate workqueue. This is then used to move the deferred final ACK of a call into the I/O thread rather than a separate work queue as part of the drive to do all transmission from the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Only disconnect calls in the I/O threadDavid Howells5-15/+9
Only perform call disconnection in the I/O thread to reduce the locking requirement. This is the first part of a fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Only set/transmit aborts in the I/O threadDavid Howells7-18/+50
Only set the abort call completion state in the I/O thread and only transmit ABORT packets from there. rxrpc_abort_call() can then be made to actually send the packet. Further, ABORT packets should only be sent if the call has been exposed to the network (ie. at least one attempted DATA transmission has occurred for it). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Separate call retransmission from other conn eventsDavid Howells3-30/+8
Call the rxrpc_conn_retransmit_call() directly from rxrpc_input_packet() rather than calling it via connection event handling. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Make the local endpoint hold a ref on a connected callDavid Howells4-13/+23
Make the local endpoint and it's I/O thread hold a reference on a connected call until that call is disconnected. Without this, we're reliant on either the AF_RXRPC socket to hold a ref (which is dropped when the call is released) or a queued work item to hold a ref (the work item is being replaced with the I/O thread). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Stash the network namespace pointer in rxrpc_localDavid Howells5-23/+21
Stash the network namespace pointer in the rxrpc_local struct in addition to a pointer to the rxrpc-specific net namespace info. Use this to remove some places where the socket is passed as a parameter. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org