Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, isofs, udf, and quota updates from Jan Kara:
"A lot of material this time:
- removal of a lot of GFP_NOFS usage from ext2, udf, quota (either it
was legacy or replaced with scoped memalloc_nofs_*() API)
- removal of BUG_ONs in quota code
- conversion of UDF to the new mount API
- tightening quota on disk format verification
- fix some potentially unsafe use of RCU pointers in quota code and
annotate everything properly to make sparse happy
- a few other small quota, ext2, udf, and isofs fixes"
* tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (26 commits)
udf: remove SLAB_MEM_SPREAD flag usage
quota: remove SLAB_MEM_SPREAD flag usage
isofs: remove SLAB_MEM_SPREAD flag usage
ext2: remove SLAB_MEM_SPREAD flag usage
ext2: mark as deprecated
udf: convert to new mount API
udf: convert novrs to an option flag
MAINTAINERS: add missing git address for ext2 entry
quota: Detect loops in quota tree
quota: Properly annotate i_dquot arrays with __rcu
quota: Fix rcu annotations of inode dquot pointers
isofs: handle CDs with bad root inode but good Joliet root directory
udf: Avoid invalid LVID used on mount
quota: Fix potential NULL pointer dereference
quota: Drop GFP_NOFS instances under dquot->dq_lock and dqio_sem
quota: Set nofs allocation context when acquiring dqio_sem
ext2: Remove GFP_NOFS use in ext2_xattr_cache_insert()
ext2: Drop GFP_NOFS use in ext2_get_blocks()
ext2: Drop GFP_NOFS allocation from ext2_init_block_alloc_info()
udf: Remove GFP_NOFS allocation in udf_expand_file_adinicb()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
- fsnotify optimizations to reduce cost of fsnotify when nobody is
watching
- fix longstanding wart that system could not be suspended when some
process was waiting for response to fanotify permission event
- some spelling fixes
* tag 'fsnotify_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: allow freeze when waiting response for permission events
fanotify: Fix misspelling of "writable"
fsnotify: Fix misspelling of "writable"
inotify: Fix misspelling of "writable"
fsnotify: Add fsnotify_sb_has_watchers() helper
fsnotify: optimize the case of no parent watcher
|
|
Pull xfs updates from Chandan Babu:
- Online repair updates:
- More ondisk structures being repaired:
- Inode's mode field by trying to obtain file type value from
the a directory entry
- Quota counters
- Link counts of inodes
- FS summary counters
- Support for in-memory btrees has been added to support repair
of rmap btrees
- Misc changes:
- Report corruption of metadata to the health tracking subsystem
- Enable indirect health reporting when resources are scarce
- Reduce memory usage while repairing refcount btree
- Extend "Bmap update" intent item to support atomic extent
swapping on the realtime device
- Extend "Bmap update" intent item to support extended attribute
fork and unwritten extents
- Code cleanups:
- Bmap log intent
- Btree block pointer checking
- Btree readahead
- Buffer target
- Symbolic link code
- Remove mrlock wrapper around the rwsem
- Convert all the GFP_NOFS flag usages to use the scoped
memalloc_nofs_save() API instead of direct calls with the GFP_NOFS
- Refactor and simplify xfile abstraction. Lower level APIs in shmem.c
are required to be exported in order to achieve this
- Skip checking alignment constraints for inode chunk allocations when
block size is larger than inode chunk size
- Do not submit delwri buffers collected during log recovery when an
error has been encountered
- Fix SEEK_HOLE/DATA for file regions which have active COW extents
- Fix lock order inversion when executing error handling path during
shrinking a filesystem
- Remove duplicate ifdefs
* tag 'xfs-6.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (183 commits)
xfs: shrink failure needs to hold AGI buffer
mm/shmem.c: Use new form of *@param in kernel-doc
kernel-doc: Add unary operator * to $type_param_ref
xfs: use kvfree() in xlog_cil_free_logvec()
xfs: xfs_btree_bload_prep_block() should use __GFP_NOFAIL
xfs: fix scrub stats file permissions
xfs: fix log recovery erroring out on refcount recovery failure
xfs: move symlink target write function to libxfs
xfs: move remote symlink target read function to libxfs
xfs: move xfs_symlink_remote.c declarations to xfs_symlink_remote.h
xfs: xfs_bmap_finish_one should map unwritten extents properly
xfs: support deferred bmap updates on the attr fork
xfs: support recovering bmap intent items targetting realtime extents
xfs: add a realtime flag to the bmap update log redo items
xfs: add a xattr_entry helper
xfs: fix xfs_bunmapi to allow unmapping of partial rt extents
xfs: move xfs_bmap_defer_add to xfs_bmap_item.c
xfs: reuse xfs_bmap_update_cancel_item
xfs: add a bi_entry helper
xfs: remove xfs_trans_set_bmap_flags
...
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- fix for folios/netfs data corruption in cifs_extend_writeback
- additional tracepoint added
- updates for special files and symlinks: improvements to allow
selecting use of either WSL or NFS reparse point format on creating
special files
- allocation size improvement for cached files
- minor cleanup patches
- fix to allow changing the password on remount when password for the
session is expired.
- lease key related fixes: caching hardlinked files, deletes of
deferred close files, and an important fix to better reuse lease keys
for compound operations, which also can avoid lease break timeouts
when low on credits
- fix potential data corruption with write/readdir races
- compression cleanups and a fix for compression headers
* tag '6.9-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
cifs: update internal module version number for cifs.ko
smb: common: simplify compression headers
smb: common: fix fields sizes in compression_pattern_payload_v1
smb: client: negotiate compression algorithms
smb3: add dynamic trace point for ioctls
cifs: Fix writeback data corruption
smb: client: return reparse type in /proc/mounts
smb: client: set correct d_type for reparse DFS/DFSR and mount point
smb: client: parse uid, gid, mode and dev from WSL reparse points
smb: client: introduce SMB2_OP_QUERY_WSL_EA
smb: client: Fix a NULL vs IS_ERR() check in wsl_set_xattrs()
smb: client: add support for WSL reparse points
smb: client: reduce number of parameters in smb2_compound_op()
smb: client: fix potential broken compound request
smb: client: move most of reparse point handling code to common file
smb: client: introduce reparse mount option
smb: client: retry compound request without reusing lease
smb: client: do not defer close open handles to deleted files
smb: client: reuse file lease key in compound operations
smb3: update allocation size more accurately on write completion
...
|
|
As Linus suggested this enables pidfs unconditionally. A key property to
retain is the ability to compare pidfds by inode number (cf. [1]).
That's extremely helpful just as comparing namespace file descriptors by
inode number is. They are used in a variety of scenarios where they need
to be compared, e.g., when receiving a pidfd via SO_PEERPIDFD from a
socket to trivially authenticate a the sender and various other
use-cases.
For 64bit systems this is pretty trivial to do. For 32bit it's slightly
more annoying as we discussed but we simply add a dumb ida based
allocator that gets used on 32bit. This gives the same guarantees about
inode numbers on 64bit without any overflow risk. Practically, we'll
never run into overflow issues because we're constrained by the number
of processes that can exist on 32bit and by the number of open files
that can exist on a 32bit system. On 64bit none of this matters and
things are very simple.
If 32bit also needs the uniqueness guarantee they can simply parse the
contents of /proc/<pid>/fd/<nr>. The uniqueness guarantees have a
variety of use-cases. One of the most obvious ones is that they will
make pidfiles (or "pidfdfiles", I guess) reliable as the unique
identifier can be placed into there that won't be reycled. Also a
frequent request.
Note, I took the chance and simplified path_from_stashed() even further.
Instead of passing the inode number explicitly to path_from_stashed() we
let the filesystem handle that internally. So path_from_stashed() ends
up even simpler than it is now. This is also a good solution allowing
the cleanup code to be clean and consistent between 32bit and 64bit. The
cleanup path in prepare_anon_dentry() is also switched around so we put
the inode before the dentry allocation. This means we only have to call
the cleanup handler for the filesystem's inode data once and can rely
->evict_inode() otherwise.
Aside from having to have a bit of extra code for 32bit it actually ends
up a nice cleanup for path_from_stashed() imho.
Tested on both 32 and 64bit including error injection.
Link: https://github.com/systemd/systemd/pull/31713 [1]
Link: https://lore.kernel.org/r/20240312-dingo-sehnlich-b3ecc35c6de7@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own. But let's just rip off
the band-aid and get it over and done with. I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.
This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.
Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Promote IMA/EVM to a proper LSM
This is the bulk of the diffstat, and the source of all the changes
in the VFS code. Prior to the start of the LSM stacking work it was
important that IMA/EVM were separate from the rest of the LSMs,
complete with their own hooks, infrastructure, etc. as it was the
only way to enable IMA/EVM at the same time as a LSM.
However, now that the bulk of the LSM infrastructure supports
multiple simultaneous LSMs, we can simplify things greatly by
bringing IMA/EVM into the LSM infrastructure as proper LSMs. This is
something I've wanted to see happen for quite some time and Roberto
was kind enough to put in the work to make it happen.
- Use the LSM hook default values to simplify the call_int_hook() macro
Previously the call_int_hook() macro required callers to supply a
default return value, despite a default value being specified when
the LSM hook was defined.
This simplifies the macro by using the defined default return value
which makes life easier for callers and should also reduce the number
of return value bugs in the future (we've had a few pop up recently,
hence this work).
- Use the KMEM_CACHE() macro instead of kmem_cache_create()
The guidance appears to be to use the KMEM_CACHE() macro when
possible and there is no reason why we can't use the macro, so let's
use it.
- Fix a number of comment typos in the LSM hook comment blocks
Not much to say here, we fixed some questionable grammar decisions in
the LSM hook comment blocks.
* tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (28 commits)
cred: Use KMEM_CACHE() instead of kmem_cache_create()
lsm: use default hook return value in call_int_hook()
lsm: fix typos in security/security.c comment headers
integrity: Remove LSM
ima: Make it independent from 'integrity' LSM
evm: Make it independent from 'integrity' LSM
evm: Move to LSM infrastructure
ima: Move IMA-Appraisal to LSM infrastructure
ima: Move to LSM infrastructure
integrity: Move integrity_kernel_module_request() to IMA
security: Introduce key_post_create_or_update hook
security: Introduce inode_post_remove_acl hook
security: Introduce inode_post_set_acl hook
security: Introduce inode_post_create_tmpfile hook
security: Introduce path_post_mknod hook
security: Introduce file_release hook
security: Introduce file_post_open hook
security: Introduce inode_post_removexattr hook
security: Introduce inode_post_setattr hook
security: Align inode_setattr hook definition with EVM
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps
etc) lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core instead
of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and
budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global
config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of
ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for use
on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code:
- Enforce VM_IOREMAP flag and range in ioremap_page_range and
introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft
exit code (pass, fail, skip, xfail, xpass).
Netfilter:
- Allow userspace to define a table that is exclusively owned by a
daemon (via netlink socket aliveness) without auto-removing this
table when the userspace program exits. Such table gets marked as
orphaned and a restarting management daemon can re-attach/regain
ownership.
- Speed up element insertions to nftables' concatenated-ranges set
type. Compact a few related data structures.
BPF:
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between
BPF program and user space where structures inside the arena can
have pointers to other areas of the arena, and pointers work
seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the
verifier and the program. The verifier allows the program to loop
assuming it's behaving well, but reserves the right to terminate
it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seeks to implement a new struct_ops
type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF
firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF
objects.
Wireless:
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API:
- Major overhaul of the Energy Efficient Ethernet internals to
support new link modes (2.5GE, 5GE), share more code between
drivers (especially those using phylib), and encourage more
uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from
drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc:
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and
packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message
encapsulation or "Netlink polymorphism", where interpretation of
nested attributes depends on link type, classifier type or some
other "class type".
Drivers:
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic on CAN
BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power) SP and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO)
support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs
to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtwl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro"
* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
nexthop: Fix out-of-bounds access during attribute validation
nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
nexthop: Only parse NHA_OP_FLAGS for get messages that require it
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
selftests/bpf: Add kprobe multi triggering benchmarks
ptp: Move from simple ida to xarray
vxlan: Remove generic .ndo_get_stats64
vxlan: Do not alloc tstats manually
devlink: Add comments to use netlink gen tool
nfp: flower: handle acti_netdevs allocation failure
net/packet: Add getsockopt support for PACKET_COPY_THRESH
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
selftests/bpf: Add bpf_arena_htab test.
selftests/bpf: Add bpf_arena_list test.
selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
bpf: Add helper macro bpf_addr_space_cast()
libbpf: Recognize __arena global variables.
bpftool: Recognize arena map type
...
|
|
Pull smack updates from Casey Schaufler:
- Improvements to the initialization of in-memory inodes
- A fix in ramfs to propery ensure the initialization of in-memory
inodes
- Removal of duplicated code in smack_cred_transfer()
* tag 'Smack-for-6.9' of https://github.com/cschaufler/smack-next:
Smack: use init_task_smack() in smack_cred_transfer()
ramfs: Initialize security of in-memory inodes
smack: Initialize the in-memory inode in smack_inode_init_security()
smack: Always determine inode labels in smack_inode_init_security()
smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As is pretty normal for this tree, there are changes all over the
place, especially for small fixes, selftest improvements, and improved
macro usability.
Some header changes ended up landing via this tree as they depended on
the string header cleanups. Also, a notable set of changes is the work
for the reintroduction of the UBSAN signed integer overflow sanitizer
so that we can continue to make improvements on the compiler side to
make this sanitizer a more viable future security hardening option.
Summary:
- string.h and related header cleanups (Tanzir Hasan, Andy
Shevchenko)
- VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev,
Harshit Mogalapalli)
- selftests/powerpc: Fix load_unaligned_zeropad build failure
(Michael Ellerman)
- hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)
- Handle tail call optimization better in LKDTM (Douglas Anderson)
- Use long form types in overflow.h (Andy Shevchenko)
- Add flags param to string_get_size() (Andy Shevchenko)
- Add Coccinelle script for potential struct_size() use (Jacob
Keller)
- Fix objtool corner case under KCFI (Josh Poimboeuf)
- Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)
- Add str_plural() helper (Michal Wajdeczko, Kees Cook)
- Ignore relocations in .notes section
- Add comments to explain how __is_constexpr() works
- Fix m68k stack alignment expectations in stackinit Kunit test
- Convert string selftests to KUnit
- Add KUnit tests for fortified string functions
- Improve reporting during fortified string warnings
- Allow non-type arg to type_max() and type_min()
- Allow strscpy() to be called with only 2 arguments
- Add binary mode to leaking_addresses scanner
- Various small cleanups to leaking_addresses scanner
- Adding wrapping_*() arithmetic helper
- Annotate initial signed integer wrap-around in refcount_t
- Add explicit UBSAN section to MAINTAINERS
- Fix UBSAN self-test warnings
- Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL
- Reintroduce UBSAN's signed overflow sanitizer"
* tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits)
selftests/powerpc: Fix load_unaligned_zeropad build failure
string: Convert helpers selftest to KUnit
string: Convert selftest to KUnit
sh: Fix build with CONFIG_UBSAN=y
compiler.h: Explain how __is_constexpr() works
overflow: Allow non-type arg to type_max() and type_min()
VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
lib/string_helpers: Add flags param to string_get_size()
x86, relocs: Ignore relocations in .notes section
objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks
overflow: Use POD in check_shl_overflow()
lib: stackinit: Adjust target string to 8 bytes for m68k
sparc: vdso: Disable UBSAN instrumentation
kernel.h: Move lib/cmdline.c prototypes to string.h
leaking_addresses: Provide mechanism to scan binary files
leaking_addresses: Ignore input device status lines
leaking_addresses: Use File::Temp for /tmp files
MAINTAINERS: Update LEAKING_ADDRESSES details
fortify: Improve buffer overflow reporting
fortify: Add KUnit tests for runtime overflows
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- Drop needless error path code in remove_arg_zero() (Li kunyu, Kees
Cook)
- binfmt_elf_efpic: Don't use missing interpreter's properties (Max
Filippov)
- Use /bin/bash for execveat selftests
* tag 'execve-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
exec: Simplify remove_arg_zero() error path
selftests/exec: Perform script checks with /bin/bash
exec: Delete unnecessary statements in remove_arg_zero()
fs: binfmt_elf_efpic: don't use missing interpreter's properties
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
- Make PSTORE_RAM available by default on arm64 (Nícolas F R A Prado)
- Allow for dynamic initialization in modular build (Guilherme G
Piccoli)
- Add missing allocation failure check (Kunwu Chan)
- Avoid duplicate memory zeroing (Christophe JAILLET)
- Avoid potential double-free during pstorefs umount
* tag 'pstore-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/zone: Don't clear memory twice
pstore/zone: Add a null pointer check to the psz_kmsg_read
efi: pstore: Allow dynamic initialization based on module parameter
arm64: defconfig: Enable PSTORE_RAM
pstore/ram: Register to module device table
pstore: inode: Only d_invalidate() is needed
|
|
Pull nfsd updates from Chuck Lever:
"The bulk of the patches for this release are optimizations, code
clean-ups, and minor bug fixes.
One new feature to mention is that NFSD administrators now have the
ability to revoke NFSv4 open and lock state. NFSD's NFSv3 support has
had this capability for some time.
As always I am grateful to NFSD contributors, reviewers, and testers"
* tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (75 commits)
NFSD: Clean up nfsd4_encode_replay()
NFSD: send OP_CB_RECALL_ANY to clients when number of delegations reaches its limit
NFSD: Document nfsd_setattr() fill-attributes behavior
nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()
nfsd: Fix a regression in nfsd_setattr()
NFSD: OP_CB_RECALL_ANY should recall both read and write delegations
NFSD: handle GETATTR conflict with write delegation
NFSD: add support for CB_GETATTR callback
NFSD: Document the phases of CREATE_SESSION
NFSD: Fix the NFSv4.1 CREATE_SESSION operation
nfsd: clean up comments over nfs4_client definition
svcrdma: Add Write chunk WRs to the RPC's Send WR chain
svcrdma: Post WRs for Write chunks in svc_rdma_sendto()
svcrdma: Post the Reply chunk and Send WR together
svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt
svcrdma: Post Send WR chain
svcrdma: Fix retry loop in svc_rdma_send()
svcrdma: Prevent a UAF in svc_rdma_send()
svcrdma: Fix SQ wake-ups
svcrdma: Increase the per-transport rw_ctx count
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, we introduce compressed inode support over fscache
since a lot of native EROFS images are explicitly compressed so that
EROFS over fscache can be more widely used even without Dragonfly
Nydus [1].
Apart from that, there are some folio conversions for compressed
inodes available as well as a lockdep false positive fix.
Summary:
- Some folio conversions for compressed inodes;
- Add compressed inode support over fscache;
- Fix lockdep false positives of erofs_pseudo_mnt"
Link: https://nydus.dev [1]
* tag 'erofs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: support compressed inodes over fscache
erofs: make iov_iter describe target buffers over fscache
erofs: fix lockdep false positives on initializing erofs_pseudo_mnt
erofs: refine managed cache operations to folios
erofs: convert z_erofs_submissionqueue_endio() to folios
erofs: convert z_erofs_fill_bio_vec() to folios
erofs: get rid of `justfound` debugging tag
erofs: convert z_erofs_do_read_page() to folios
erofs: convert z_erofs_onlinepage_.* to folios
|
|
Pull fsverity update from Eric Biggers:
"Slightly improve data verification performance by eliminating an
unnecessary lock"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: remove hash page spin lock
|
|
Pull fscrypt updates from Eric Biggers:
"Fix flakiness in a test by releasing the quota synchronously when a
key is removed, and other minor cleanups"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: shrink the size of struct fscrypt_inode_info slightly
fscrypt: write CBC-CTS instead of CTS-CBC
fscrypt: clear keyring before calling key_put()
fscrypt: explicitly require that inode->i_blkbits be set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull affs update from David Sterba:
"One change to AFFS that removes use of SLAB_MEM_SPREAD, which is going
to be removed from MM code"
* tag 'affs-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: remove SLAB_MEM_SPREAD flag usage
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Mostly stabilization, refactoring and cleanup changes. There rest are
minor performance optimizations due to caching or lock contention
reduction and a few notable fixes.
Performance improvements:
- minor speedup in logging when repeatedly allocated structure is
preallocated only once, improves latency and decreases lock
contention
- minor throughput increase (+6%), reduced lock contention after
clearing delayed allocation bits, applies to several common
workload types
- skip full quota rescan if a new relation is added in the same
transaction
Fixes:
- zstd fix for inline compressed file in subpage mode, updated
version from the 6.8 time
- proper qgroup inheritance ioctl parameter validation
- more fiemap followup fixes after reduced locking done in 6.8:
- fix race when detecting delalloc ranges
Core changes:
- more debugging code:
- added assertions for a very rare crash in raid56 calculation
- tree-checker dumps page state to give more insights into
possible reference counting issues
- add checksum calculation offloading sysfs knob, for now enabled
under DEBUG only to determine a good heuristic for deciding the
offload or synchronous, depends on various factors (block group
profile, device speed) and is not as clear as initially thought
(checksum type)
- error handling improvements, added assertions
- more page to folio conversion (defrag, truncate), cached size and
shift
- preparation for more fine grained locking of sectors in subpage
mode
- cleanups and refactoring:
- include cleanups, forward declarations
- pointer-to-structure helpers
- redundant argument removals
- removed unused code
- slab cache updates, last use of SLAB_MEM_SPREAD removed"
* tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits)
btrfs: reuse cloned extent buffer during fiemap to avoid re-allocations
btrfs: fix race when detecting delalloc ranges during fiemap
btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
btrfs: qgroup: allow quick inherit if snapshot is created and added to the same parent
btrfs: qgroup: validate btrfs_qgroup_inherit parameter
btrfs: include device major and minor numbers in the device scan notice
btrfs: mark btrfs_put_caching_control() static
btrfs: remove SLAB_MEM_SPREAD flag use
btrfs: qgroup: always free reserved space for extent records
btrfs: tree-checker: dump the page status if hit something wrong
btrfs: compression: remove dead comments in btrfs_compress_heuristic()
btrfs: subpage: make writer lock utilize bitmap
btrfs: subpage: make reader lock utilize bitmap
btrfs: unexport btrfs_subpage_start_writer() and btrfs_subpage_end_and_test_writer()
btrfs: pass a valid extent map cache pointer to __get_extent_map()
btrfs: merge btrfs_del_delalloc_inode() helpers
btrfs: pass btrfs_device to btrfs_scratch_superblocks()
btrfs: handle transaction commit errors in flush_reservations()
btrfs: use KMEM_CACHE() to create btrfs_free_space cache
btrfs: use KMEM_CACHE() to create delayed ref caches
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs update from Damien Le Moal:
- A single change for this cycle to convert zonefs to use the new
mount API
* tag 'zonefs-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: convert zonefs to use the new mount api
|
|
Pull block updates from Jens Axboe:
- MD pull requests via Song:
- Cleanup redundant checks (Yu Kuai)
- Remove deprecated headers (Marc Zyngier, Song Liu)
- Concurrency fixes (Li Lingfeng)
- Memory leak fix (Li Nan)
- Refactor raid1 read_balance (Yu Kuai, Paul Luse)
- Clean up and fix for md_ioctl (Li Nan)
- Other small fixes (Gui-Dong Han, Heming Zhao)
- MD atomic limits (Christoph)
- NVMe pull request via Keith:
- RDMA target enhancements (Max)
- Fabrics fixes (Max, Guixin, Hannes)
- Atomic queue_limits usage (Christoph)
- Const use for class_register (Ricardo)
- Identification error handling fixes (Shin'ichiro, Keith)
- Improvement and cleanup for cached request handling (Christoph)
- Moving towards atomic queue limits. Core changes and driver bits so
far (Christoph)
- Fix UAF issues in aoeblk (Chun-Yi)
- Zoned fix and cleanups (Damien)
- s390 dasd cleanups and fixes (Jan, Miroslav)
- Block issue timestamp caching (me)
- noio scope guarding for zoned IO (Johannes)
- block/nvme PI improvements (Kanchan)
- Ability to terminate long running discard loop (Keith)
- bdev revalidation fix (Li)
- Get rid of old nr_queues hack for kdump kernels (Ming)
- Support for async deletion of ublk (Ming)
- Improve IRQ bio recycling (Pavel)
- Factor in CPU capacity for remote vs local completion (Qais)
- Add shared_tags configfs entry for null_blk (Shin'ichiro
- Fix for a regression in page refcounts introduced by the folio
unification (Tony)
- Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid,
Ricardo, Roman, Tang, Uwe)
* tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits)
block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC
block/swim: Convert to platform remove callback returning void
cdrom: gdrom: Convert to platform remove callback returning void
block: remove disk_stack_limits
md: remove mddev->queue
md: don't initialize queue limits
md/raid10: use the atomic queue limit update APIs
md/raid5: use the atomic queue limit update APIs
md/raid1: use the atomic queue limit update APIs
md/raid0: use the atomic queue limit update APIs
md: add queue limit helpers
md: add a mddev_is_dm helper
md: add a mddev_add_trace_msg helper
md: add a mddev_trace_remap helper
bcache: move calculation of stripe_size and io_opt into bcache_device_init
virtio_blk: Do not use disk_set_max_open/active_zones()
aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts
block: move capacity validation to blkpg_do_ioctl()
block: prevent division by zero in blk_rq_stat_sum()
drbd: atomically update queue limits in drbd_reconsider_queue_parameters
...
|
|
Pull io_uring updates from Jens Axboe:
- Make running of task_work internal loops more fair, and unify how the
different methods deal with them (me)
- Support for per-ring NAPI. The two minor networking patches are in a
shared branch with netdev (Stefan)
- Add support for truncate (Tony)
- Export SQPOLL utilization stats (Xiaobing)
- Multishot fixes (Pavel)
- Fix for a race in manipulating the request flags via poll (Pavel)
- Cleanup the multishot checking by making it generic, moving it out of
opcode handlers (Pavel)
- Various tweaks and cleanups (me, Kunwu, Alexander)
* tag 'for-6.9/io_uring-20240310' of git://git.kernel.dk/linux: (53 commits)
io_uring: Fix sqpoll utilization check racing with dying sqpoll
io_uring/net: dedup io_recv_finish req completion
io_uring: refactor DEFER_TASKRUN multishot checks
io_uring: fix mshot io-wq checks
io_uring/net: add io_req_msg_cleanup() helper
io_uring/net: simplify msghd->msg_inq checking
io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE
io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io
io_uring/net: correctly handle multishot recvmsg retry setup
io_uring/net: clear REQ_F_BL_EMPTY in the multishot retry handler
io_uring: fix io_queue_proc modifying req->flags
io_uring: fix mshot read defer taskrun cqe posting
io_uring/net: fix overflow check in io_recvmsg_mshot_prep()
io_uring/net: correct the type of variable
io_uring/sqpoll: statistics of the true utilization of sq threads
io_uring/net: move recv/recvmsg flags out of retry loop
io_uring/kbuf: flag request if buffer pool is empty after buffer pick
io_uring/net: improve the usercopy for sendmsg/recvmsg
io_uring/net: move receive multishot out of the generic msghdr path
io_uring/net: unify how recvmsg and sendmsg copy in the msghdr
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs uuid updates from Christian Brauner:
"This adds two new ioctl()s for getting the filesystem uuid and
retrieving the sysfs path based on the path of a mounted filesystem.
Getting the filesystem uuid has been implemented in filesystem
specific code for a while it's now lifted as a generic ioctl"
* tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
xfs: add support for FS_IOC_GETFSSYSFSPATH
fs: add FS_IOC_GETFSSYSFSPATH
fat: Hook up sb->s_uuid
fs: FS_IOC_GETUUID
ovl: convert to super_set_uuid()
fs: super_set_uuid()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull block handle updates from Christian Brauner:
"Last cycle we changed opening of block devices, and opening a block
device would return a bdev_handle. This allowed us to implement
support for restricting and forbidding writes to mounted block
devices. It was accompanied by converting and adding helpers to
operate on bdev_handles instead of plain block devices.
That was already a good step forward but ultimately it isn't necessary
to have special purpose helpers for opening block devices internally
that return a bdev_handle.
Fundamentally, opening a block device internally should just be
equivalent to opening files. So now all internal opens of block
devices return files just as a userspace open would. Instead of
introducing a separate indirection into bdev_open_by_*() via struct
bdev_handle bdev_file_open_by_*() is made to just return a struct
file. Opening and closing a block device just becomes equivalent to
opening and closing a file.
This all works well because internally we already have a pseudo fs for
block devices and so opening block devices is simple. There's a few
places where we needed to be careful such as during boot when the
kernel is supposed to mount the rootfs directly without init doing it.
Here we need to take care to ensure that we flush out any asynchronous
file close. That's what we already do for opening, unpacking, and
closing the initramfs. So nothing new here.
The equivalence of opening and closing block devices to regular files
is a win in and of itself. But it also has various other advantages.
We can remove struct bdev_handle completely. Various low-level helpers
are now private to the block layer. Other helpers were simply
removable completely.
A follow-up series that is already reviewed build on this and makes it
possible to remove bdev->bd_inode and allows various clean ups of the
buffer head code as well. All places where we stashed a bdev_handle
now just stash a file and use simple accessors to get to the actual
block device which was already the case for bdev_handle"
* tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
block: remove bdev_handle completely
block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access
bdev: remove bdev pointer from struct bdev_handle
bdev: make struct bdev_handle private to the block layer
bdev: make bdev_{release, open_by_dev}() private to block layer
bdev: remove bdev_open_by_path()
reiserfs: port block device access to file
ocfs2: port block device access to file
nfs: port block device access to files
jfs: port block device access to file
f2fs: port block device access to files
ext4: port block device access to file
erofs: port device access to file
btrfs: port device access to file
bcachefs: port block device access to file
target: port block device access to file
s390: port block device access to file
nvme: port block device access to file
block2mtd: port device access to files
bcache: port block device access to files
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull file locking updates from Christian Brauner:
"A few years ago struct file_lock_context was added to allow for
separate lists to track different types of file locks instead of using
a singly-linked list for all of them.
Now leases no longer need to be tracked using struct file_lock.
However, a lot of the infrastructure is identical for leases and locks
so separating them isn't trivial.
This splits a group of fields used by both file locks and leases into
a new struct file_lock_core. The new core struct is embedded in struct
file_lock. Coccinelle was used to convert a lot of the callers to deal
with the move, with the remaining 25% or so converted by hand.
Afterwards several internal functions in fs/locks.c are made to work
with struct file_lock_core. Ultimately this allows to split struct
file_lock into struct file_lock and struct file_lease. The file lease
APIs are then converted to take struct file_lease"
* tag 'vfs-6.9.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (51 commits)
filelock: fix deadlock detection in POSIX locking
filelock: always define for_each_file_lock()
smb: remove redundant check
filelock: don't do security checks on nfsd setlease calls
filelock: split leases out of struct file_lock
filelock: remove temporary compatibility macros
smb/server: adapt to breakup of struct file_lock
smb/client: adapt to breakup of struct file_lock
ocfs2: adapt to breakup of struct file_lock
nfsd: adapt to breakup of struct file_lock
nfs: adapt to breakup of struct file_lock
lockd: adapt to breakup of struct file_lock
fuse: adapt to breakup of struct file_lock
gfs2: adapt to breakup of struct file_lock
dlm: adapt to breakup of struct file_lock
ceph: adapt to breakup of struct file_lock
afs: adapt to breakup of struct file_lock
9p: adapt to breakup of struct file_lock
filelock: convert seqfile handling to use file_lock_core
filelock: convert locks_translate_pid to take file_lock_core
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pdfd updates from Christian Brauner:
- Until now pidfds could only be created for thread-group leaders but
not for threads. There was no technical reason for this. We simply
had no users that needed support for this. Now we do have users that
need support for this.
This introduces a new PIDFD_THREAD flag for pidfd_open(). If that
flag is set pidfd_open() creates a pidfd that refers to a specific
thread.
In addition, we now allow clone() and clone3() to be called with
CLONE_PIDFD | CLONE_THREAD which wasn't possible before.
A pidfd that refers to an individual thread differs from a pidfd that
refers to a thread-group leader:
(1) Pidfds are pollable. A task may poll a pidfd and get notified
when the task has exited.
For thread-group leader pidfds the polling task is woken if the
thread-group is empty. In other words, if the thread-group
leader task exits when there are still threads alive in its
thread-group the polling task will not be woken when the
thread-group leader exits but rather when the last thread in the
thread-group exits.
For thread-specific pidfds the polling task is woken if the
thread exits.
(2) Passing a thread-group leader pidfd to pidfd_send_signal() will
generate thread-group directed signals like kill(2) does.
Passing a thread-specific pidfd to pidfd_send_signal() will
generate thread-specific signals like tgkill(2) does.
The default scope of the signal is thus determined by the type
of the pidfd.
Since use-cases exist where the default scope of the provided
pidfd needs to be overriden the following flags are added to
pidfd_send_signal():
- PIDFD_SIGNAL_THREAD
Send a thread-specific signal.
- PIDFD_SIGNAL_THREAD_GROUP
Send a thread-group directed signal.
- PIDFD_SIGNAL_PROCESS_GROUP
Send a process-group directed signal.
The scope change will only work if the struct pid is actually
used for this scope.
For example, in order to send a thread-group directed signal the
provided pidfd must be used as a thread-group leader and
similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be
used as a process group leader.
- Move pidfds from the anonymous inode infrastructure to a tiny pseudo
filesystem. This will unblock further work that we weren't able to do
simply because of the very justified limitations of anonymous inodes.
Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds
to become useful for the first time. They can now be compared by
inode number which are unique for the system lifetime.
Instead of stashing struct pid in file->private_data we can now stash
it in inode->i_private. This makes it possible to introduce concepts
that operate on a process once all file descriptors have been closed.
A concrete example is kill-on-last-close. Another side-effect is that
file->private_data is now freed up for per-file options for pidfds.
Now, each struct pid will refer to a different inode but the same
struct pid will refer to the same inode if it's opened multiple
times. In contrast to now where each struct pid refers to the same
inode.
The tiny pseudo filesystem is not visible anywhere in userspace
exactly like e.g., pipefs and sockfs. There's no lookup, there's no
complex inode operations, nothing. Dentries and inodes are always
deleted when the last pidfd is closed.
We allocate a new inode and dentry for each struct pid and we reuse
that inode and dentry for all pidfds that refer to the same struct
pid. The code is entirely optional and fairly small. If it's not
selected we fallback to anonymous inodes. Heavily inspired by nsfs.
The dentry and inode allocation mechanism is moved into generic
infrastructure that is now shared between nsfs and pidfs. The
path_from_stashed() helper must be provided with a stashing location,
an inode number, a mount, and the private data that is supposed to be
used and it will provide a path that can be passed to dentry_open().
The helper will try retrieve an existing dentry from the provided
stashing location. If a valid dentry is found it is reused. If not a
new one is allocated and we try to stash it in the provided location.
If this fails we retry until we either find an existing dentry or the
newly allocated dentry could be stashed. Subsequent openers of the
same namespace or task are then able to reuse it.
- Currently it is only possible to get notified when a task has exited,
i.e., become a zombie and userspace gets notified with EPOLLIN. We
now also support waiting until the task has been reaped, notifying
userspace with EPOLLHUP.
- Ensure that ESRCH is reported for getfd if a task is exiting instead
of the confusing EBADF.
- Various smaller cleanups to pidfd functions.
* tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
libfs: improve path_from_stashed()
libfs: add stashed_dentry_prune()
libfs: improve path_from_stashed() helper
pidfs: convert to path_from_stashed() helper
nsfs: convert to path_from_stashed() helper
libfs: add path_from_stashed()
pidfd: add pidfs
pidfd: move struct pidfd_fops
pidfd: allow to override signal scope in pidfd_send_signal()
pidfd: change pidfd_send_signal() to respect PIDFD_THREAD
signal: fill in si_code in prepare_kill_siginfo()
selftests: add ESRCH tests for pidfd_getfd()
pidfd: getfd should always report ESRCH if a task is exiting
pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together
pidfd: exit: kill the no longer used thread_group_exited()
pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN))
pid: kill the obsolete PIDTYPE_PID code in transfer_pid()
pidfd: kill the no longer needed do_notify_pidfd() in de_thread()
pidfd_poll: report POLLHUP when pid_task() == NULL
pidfd: implement PIDFD_THREAD flag for pidfd_open()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:
- Restore read-write hints in struct bio through the bi_write_hint
member for the sake of UFS devices in mobile applications. This can
result in up to 40% lower write amplification in UFS devices. The
patch series that builds on this will be coming in via the SCSI
maintainers (Bart)
- Overhaul the iomap writeback code. Afterwards ->map_blocks() is able
to map multiple blocks at once as long as they're in the same folio.
This reduces CPU usage for buffered write workloads on e.g., xfs on
systems with lots of cores (Christoph)
- Record processed bytes in iomap_iter() trace event (Kassey)
- Extend iomap_writepage_map() trace event after Christoph's
->map_block() changes to map mutliple blocks at once (Zhang)
* tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
iomap: Add processed for iomap_iter
iomap: add pos and dirty_len into trace_iomap_writepage_map
block, fs: Restore the per-bio/request data lifetime fields
fs: Propagate write hints to the struct block_device inode
fs: Move enum rw_hint into a new header file
fs: Split fcntl_rw_hint()
fs: Verify write lifetime constants at compile time
fs: Fix rw_hint validation
iomap: pass the length of the dirty region to ->map_blocks
iomap: map multiple blocks at a time
iomap: submit ioends immediately
iomap: factor out a iomap_writepage_map_block helper
iomap: only call mapping_set_error once for each failed bio
iomap: don't chain bios
iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
iomap: clean up the iomap_alloc_ioend calling convention
iomap: move all remaining per-folio logic into iomap_writepage_map
iomap: factor out a iomap_writepage_handle_eof helper
iomap: move the PF_MEMALLOC check to iomap_writepages
iomap: move the io_folios field out of struct iomap_ioend
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull ntfs update from Christian Brauner:
"This removes the old ntfs driver. The new ntfs3 driver is a full
replacement that was merged over two years ago. We've went through
various userspace and either they use ntfs3 or they use the fuse
version of ntfs and thus build neither ntfs nor ntfs3. I think that's
a clear sign that we should risk removing the legacy ntfs driver.
Quoting from Arch Linux and Debian:
- Debian does neither build the legacy ntfs nor the new ntfs3:
"Not currently built with Debian's kernel packages, 'ntfs' has been
symlinked to 'ntfs-3g' as it relates to fstab and mount commands.
Debian kernels are built without support of the ntfs3 driver
developed by Paragon Software." (cf. [2])
- Archlinux provides ntfs3 as their default since 5.15:
"All officially supported kernels with versions 5.15 or newer are
built with CONFIG_NTFS3_FS=m and thus support it. Before 5.15,
NTFS read and write support is provided by the NTFS-3G FUSE file
system." (cf. [1]).
It's unmaintained apart from various odd fixes as well. Worst case we
have to reintroduce it if someone really has a valid dependency on it.
But it's worth trying to see whether we can remove it"
Link: https://wiki.archlinux.org/title/NTFS [1]
Link: https://wiki.debian.org/NTFS [2]
* tag 'vfs-6.9.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: remove NTFS classic from docum. index
fs: Remove NTFS classic
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Misc features, cleanups, and fixes for vfs and individual filesystems.
Features:
- Support idmapped mounts for hugetlbfs.
- Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug
where the passed offset is ignored if the file is O_APPEND. The new
flag allows a caller to enforce that the offset is honored to
conform to posix even if the file was opened in append mode.
- Move i_mmap_rwsem in struct address_space to avoid false sharing
between i_mmap and i_mmap_rwsem.
- Convert efs, qnx4, and coda to use the new mount api.
- Add a generic is_dot_dotdot() helper that's used by various
filesystems and the VFS code instead of open-coding it multiple
times.
- Recently we've added stable offsets which allows stable ordering
when iterating directories exported through NFS on e.g., tmpfs
filesystems. Originally an xarray was used for the offset map but
that caused slab fragmentation issues over time. This switches the
offset map to the maple tree which has a dense mode that handles
this scenario a lot better. Includes tests.
- Finally merge the case-insensitive improvement series Gabriel has
been working on for a long time. This cleanly propagates case
insensitive operations through ->s_d_op which in turn allows us to
remove the quite ugly generic_set_encrypted_ci_d_ops() operations.
It also improves performance by trying a case-sensitive comparison
first and then fallback to case-insensitive lookup if that fails.
This also fixes a bug where overlayfs would be able to be mounted
over a case insensitive directory which would lead to all sort of
odd behaviors.
Cleanups:
- Make file_dentry() a simple accessor now that ->d_real() is
simplified because of the backing file work we did the last two
cycles.
- Use the dedicated file_mnt_idmap helper in ntfs3.
- Use smp_load_acquire/store_release() in the i_size_read/write
helpers and thus remove the hack to handle i_size reads in the
filemap code.
- The SLAB_MEM_SPREAD is a nop now. Remove it from various places in
fs/
- It's no longer necessary to perform a second built-in initramfs
unpack call because we retain the contents of the previous
extraction. Remove it.
- Now that we have removed various allocators kfree_rcu() always
works with kmem caches and kmalloc(). So simplify various places
that only use an rcu callback in order to handle the kmem cache
case.
- Convert the pipe code to use a lockdep comparison function instead
of open-coding the nesting making lockdep validation easier.
- Move code into fs-writeback.c that was located in a header but can
be made static as it's only used in that one file.
- Rewrite the alignment checking iterators for iovec and bvec to be
easier to read, and also significantly more compact in terms of
generated code. This saves 270 bytes of text on x86-64 (with
clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also
saves a bit of time for the same workload.
- Switch various places to use KMEM_CACHE instead of
kmem_cache_create().
- Use inode_set_ctime_to_ts() in inode_set_ctime_current()
- Use kzalloc() in name_to_handle_at() to avoid kernel infoleak.
- Various smaller cleanups for eventfds.
Fixes:
- Fix various comments and typos, and unneeded initializations.
- Fix stack allocation hack for clang in the select code.
- Improve dump_mapping() debug code on a best-effort basis.
- Fix build errors in various selftests.
- Avoid wrap-around instrumentation in various places.
- Don't allow user namespaces without an idmapping to be used for
idmapped mounts.
- Fix sysv sb_read() call.
- Fix fallback implementation of the get_name() export operation"
* tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits)
hugetlbfs: support idmapped mounts
qnx4: convert qnx4 to use the new mount api
fs: use inode_set_ctime_to_ts to set inode ctime to current time
libfs: Drop generic_set_encrypted_ci_d_ops
ubifs: Configure dentry operations at dentry-creation time
f2fs: Configure dentry operations at dentry-creation time
ext4: Configure dentry operations at dentry-creation time
libfs: Add helper to choose dentry operations at mount-time
libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
fscrypt: Drop d_revalidate once the key is added
fscrypt: Drop d_revalidate for valid dentries during lookup
fscrypt: Factor out a helper to configure the lookup dentry
ovl: Always reject mounting over case-insensitive directories
libfs: Attempt exact-match comparison first during casefolded lookup
efs: remove SLAB_MEM_SPREAD flag usage
jfs: remove SLAB_MEM_SPREAD flag usage
minix: remove SLAB_MEM_SPREAD flag usage
openpromfs: remove SLAB_MEM_SPREAD flag usage
proc: remove SLAB_MEM_SPREAD flag usage
qnx6: remove SLAB_MEM_SPREAD flag usage
...
|
|
From 2.47 to 2.48
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Unify compression headers (chained and unchained) into a single struct
so we can use it for the initial compression transform header
interchangeably.
Also make the OriginalPayloadSize field to be always visible in the
compression payload header, and have callers subtract its size when not
needed.
Rename the related structs to match the naming convetion used in the
other SMB2 structs.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
See protocol documentation in MS-SMB2 section 2.2.42.2.2
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Change "compress=" mount option to a boolean flag, that, if set,
will enable negotiating compression algorithms with the server.
Do not de/compress anything for now.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
It can be helpful in debugging to know which ioctls are called to better
correlate them with smb3 fsctls (and opens). Add a dynamic trace point
to trace ioctls into cifs.ko
Here is sample output:
TASK-PID CPU# ||||| TIMESTAMP FUNCTION
| | | ||||| | |
new-inotify-ioc-90418 [001] ..... 142157.397024: smb3_ioctl: xid=18 fid=0x0 ioctl cmd=0xc009cf0b
new-inotify-ioc-90457 [007] ..... 142217.943569: smb3_ioctl: xid=22 fid=0x389bf5b6 ioctl cmd=0xc009cf0b
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
cifs writeback doesn't correctly handle the case where
cifs_extend_writeback() hits a point where it is considering an additional
folio, but this would overrun the wsize - at which point it drops out of
the xarray scanning loop and calls xas_pause(). The problem is that
xas_pause() advances the loop counter - thereby skipping that page.
What needs to happen is for xas_reset() to be called any time we decide we
don't want to process the page we're looking at, but rather send the
request we are building and start a new one.
Fix this by copying and adapting the netfslib writepages code as a
temporary measure, with cifs writeback intending to be offloaded to
netfslib in the near future.
This also fixes the issue with the use of filemap_get_folios_tag() causing
retry of a bunch of pages which the extender already dealt with.
This can be tested by creating, say, a 64K file somewhere not on cifs
(otherwise copy-offload may get underfoot), mounting a cifs share with a
wsize of 64000, copying the file to it and then comparing the original file
and the copy:
dd if=/dev/urandom of=/tmp/64K bs=64k count=1
mount //192.168.6.1/test /mnt -o user=...,pass=...,wsize=64000
cp /tmp/64K /mnt/64K
cmp /tmp/64K /mnt/64K
Without the fix, the cmp fails at position 64000 (or shortly thereafter).
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: samba-technical@lists.samba.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add support for returning reparse mount option in /proc/mounts.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402262152.YZOwDlCM-lkp@intel.com/
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Set correct dirent->d_type for IO_REPARSE_TAG_DFS{,R} and
IO_REPARSE_TAG_MOUNT_POINT reparse points.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Parse the extended attributes from WSL reparse points to correctly
report uid, gid mode and dev from ther instantiated inodes.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add a new command to smb2_compound_op() for querying WSL extended
attributes from reparse points.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This was intended to be an IS_ERR() check. The ea_create_context()
function doesn't return NULL.
Fixes: 1eab17fe485c ("smb: client: add support for WSL reparse points")
Reviewed-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add support for creating special files via WSL reparse points when
using 'reparse=wsl' mount option. They're faster than NFS reparse
points because they don't require extra roundtrips to figure out what
->d_type a specific dirent is as such information is already stored in
query dir responses and then making getdents() calls faster.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Replace @desired_access, @create_disposition, @create_options and
@mode parameters with a single @oparms.
No functional changes.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Now that smb2_compound_op() can accept up to 5 commands in a single
compound request, set the appropriate NextCommand and related flags to
all subsequent commands as well as handling the case where a valid
@cfile is passed and therefore skipping create and close requests in
the compound chain.
This fix a potential broken compound request that could be sent from
smb2_get_reparse_inode() if the client found a valid open
file (@cfile) prior to calling smb2_compound_op().
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In preparation to add support for creating special files also via WSL
reparse points in next commits.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Allow the user to create special files and symlinks by choosing
between WSL and NFS reparse points via 'reparse={nfs,wsl}' mount
options. If unset or 'reparse=default', the client will default to
creating them via NFS reparse points.
Creating WSL reparse points isn't supported yet, so simply return
error when attempting to mount with 'reparse=wsl' for now.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There is a shortcoming in the current implementation of the file
lease mechanism exposed when the lease keys were attempted to be
reused for unlink, rename and set_path_size operations for a client. As
per MS-SMB2, lease keys are associated with the file name. Linux smb
client maintains lease keys with the inode. If the file has any hardlinks,
it is possible that the lease for a file be wrongly reused for an
operation on the hardlink or vice versa. In these cases, the mentioned
compound operations fail with STATUS_INVALID_PARAMETER.
This patch adds a fallback to the old mechanism of not sending any
lease with these compound operations if the request with lease key fails
with STATUS_INVALID_PARAMETER.
Resending the same request without lease key should not hurt any
functionality, but might impact performance especially in cases where
the error is not because of the usage of wrong lease key and we might
end up doing an extra roundtrip.
Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When a file/dentry has been deleted before closing all its open
handles, currently, closing them can add them to the deferred
close list. This can lead to problems in creating file with the
same name when the file is re-created before the deferred close
completes. This issue was seen while reusing a client's already
existing lease on a file for compound operations and xfstest 591
failed because of the deferred close handle that remained valid
even after the file was deleted and was being reused to create a
file with the same name. The server in this case returns an error
on open with STATUS_DELETE_PENDING. Recreating the file would
fail till the deferred handles are closed (duration specified in
closetimeo).
This patch fixes the issue by flagging all open handles for the
deleted file (file path to be precise) by setting
status_file_deleted to true in the cifsFileInfo structure. As per
the information classes specified in MS-FSCC, SMB2 query info
response from the server has a DeletePending field, set to true
to indicate that deletion has been requested on that file. If
this is the case, flag the open handles for this file too.
When doing close in cifs_close for each of these handles, check the
value of this boolean field and do not defer close these handles
if the corresponding filepath has been deleted.
Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Currently, when a rename, unlink or set path size compound operation
is requested on a file that has a lot of dirty pages to be written
to the server, we do not send the lease key for these requests. As a
result, the server can assume that this request is from a new client, and
send a lease break notification to the same client, on the same
connection. As a response to the lease break, the client can consume
several credits to write the dirty pages to the server. Depending on the
server's credit grant implementation, the server can stop granting more
credits to this connection, and this can cause a deadlock (which can only
be resolved when the lease timer on the server expires).
One of the problems here is that the client is sending no lease key,
even if it has a lease for the file. This patch fixes the problem by
reusing the existing lease key on the file for rename, unlink and set path
size compound operations so that the client does not break its own lease.
A very trivial example could be a set of commands by a client that
maintains open handle (for write) to a file and then tries to copy the
contents of that file to another one, eg.,
tail -f /dev/null > myfile &
mv myfile myfile2
Presently, the network capture on the client shows that the move (or
rename) would trigger a lease break on the same client, for the same file.
With the lease key reused, the lease break request-response overhead is
eliminated, thereby reducing the roundtrips performed for this set of
operations.
The patch fixes the bug described above and also provides perf benefit.
Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Changes to allocation size are approximated for extending writes of cached
files until the server returns the actual value (on SMB3 close or query info
for example), but it was setting the estimated value for number of blocks
to larger than the file size even if the file is likely sparse which
breaks various xfstests (e.g. generic/129, 130, 221, 228).
When i_size and i_blocks are updated in write completion do not increase
allocation size more than what was written (rounded up to 512 bytes).
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The SLAB_MEM_SPREAD flag is already a no-op as of 6.8-rc1, remove
its usage so we can delete it from slab. No functional change.
Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-0-02f1753e8303@suse.cz/
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There are cases where a session is disconnected and password has changed
on the server (or expired) for this user and this currently can not
be fixed without unmount and mounting again. This patch allows
remount to change the password (for the non Kerberos case, Kerberos
ticket refresh is handled differently) when the session is disconnected
and the user can not reconnect due to still using old password.
Future patches should also allow us to setup the keyring (cifscreds)
to have an "alternate password" so we would be able to change
the password before the session drops (without the risk of races
between when the password changes and the disconnect occurs -
ie cases where the old password is still needed because the new
password has not fully rolled out to all servers yet).
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|