Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of driver core changes for 5.19-rc1.
Lots of tiny driver core changes and cleanups happened this cycle, but
the two major things are:
- firmware_loader reorganization and additions including the ability
to have XZ compressed firmware images and the ability for userspace
to initiate the firmware load when it needs to, instead of being
always initiated by the kernel. FPGA devices specifically want this
ability to have their firmware changed over the lifetime of the
system boot, and this allows them to work without having to come up
with yet-another-custom-uapi interface for loading firmware for
them.
- physical location support added to sysfs so that devices that know
this information, can tell userspace where they are located in a
common way. Some ACPI devices already support this today, and more
bus types should support this in the future.
Smaller changes include:
- driver_override api cleanups and fixes
- error path cleanups and fixes
- get_abi script fixes
- deferred probe timeout changes.
It's that last change that I'm the most worried about. It has been
reported to cause boot problems for a number of systems, and I have a
tested patch series that resolves this issue. But I didn't get it
merged into my tree before 5.18-final came out, so it has not gotten
any linux-next testing.
I'll send the fixup patches (there are 2) as a follow-on series to this
pull request.
All have been tested in linux-next for weeks, with no reported issues
other than the above-mentioned boot time-outs"
* tag 'driver-core-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
driver core: fix deadlock in __device_attach
kernfs: Separate kernfs_pr_cont_buf and rename_lock.
topology: Remove unused cpu_cluster_mask()
driver core: Extend deferred probe timeout on driver registration
MAINTAINERS: add Russ Weight as a firmware loader maintainer
driver: base: fix UAF when driver_attach failed
test_firmware: fix end of loop test in upload_read_show()
driver core: location: Add "back" as a possible output for panel
driver core: location: Free struct acpi_pld_info *pld
driver core: Add "*" wildcard support to driver_async_probe cmdline param
driver core: location: Check for allocations failure
arch_topology: Trace the update thermal pressure
kernfs: Rename kernfs_put_open_node to kernfs_unlink_open_file.
export: fix string handling of namespace in EXPORT_SYMBOL_NS
rpmsg: use local 'dev' variable
rpmsg: Fix calling device_lock() on non-initialized device
firmware_loader: describe 'module' parameter of firmware_upload_register()
firmware_loader: Move definitions from sysfs_upload.h to sysfs.h
firmware_loader: Fix configs for sysfs split
selftests: firmware: Add firmware upload selftests
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've refactored the existing atomic write support
implemented by in-memory operations to have storing data in disk
temporarily, which can give us a benefit to accept more atomic writes.
At the same time, we removed the existing volatile write support.
We've also revisited the file pinning and GC flows and found some
corner cases which contributeed abnormal system behaviours.
As usual, there're several minor code refactoring for readability,
sanity check, and clean ups.
Enhancements:
- allow compression for mmap files in compress_mode=user
- kill volatile write support
- change the current atomic write way
- give priority to select unpinned section for foreground GC
- introduce data read/write showing path info
- remove unnecessary f2fs_lock_op in f2fs_new_inode
Bug fixes:
- fix the file pinning flow during checkpoint=disable and GCs
- fix foreground and background GCs to select the right victims and
get free sections on time
- fix GC flags on defragmenting pages
- avoid an infinite loop to flush node pages
- fix fallocate to use file_modified to update permissions
consistently"
* tag 'f2fs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
f2fs: fix to tag gcing flag on page during file defragment
f2fs: replace F2FS_I(inode) and sbi by the local variable
f2fs: add f2fs_init_write_merge_io function
f2fs: avoid unneeded error handling for revoke_entry_slab allocation
f2fs: allow compression for mmap files in compress_mode=user
f2fs: fix typo in comment
f2fs: make f2fs_read_inline_data() more readable
f2fs: fix to do sanity check for inline inode
f2fs: fix fallocate to use file_modified to update permissions consistently
f2fs: don't use casefolded comparison for "." and ".."
f2fs: do not stop GC when requiring a free section
f2fs: keep wait_ms if EAGAIN happens
f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters
f2fs: reject test_dummy_encryption when !CONFIG_FS_ENCRYPTION
f2fs: kill volatile write support
f2fs: change the current atomic write way
f2fs: don't need inode lock for system hidden quota
f2fs: stop allocating pinned sections if EAGAIN happens
f2fs: skip GC if possible when checkpoint disabling
f2fs: give priority to select unpinned section for foreground GC
...
|
|
Pull nfsd updates from Chuck Lever:
"We introduce 'courteous server' in this release. Previously NFSD would
purge open and lock state for an unresponsive client after one lease
period (typically 90 seconds). Now, after one lease period, another
client can open and lock those files and the unresponsive client's
lease is purged; otherwise if the unresponsive client's open and lock
state is uncontended, the server retains that open and lock state for
up to 24 hours, allowing the client's workload to resume after a
lengthy network partition.
A longstanding issue with NFSv4 file creation is also addressed.
Previously a file creation can fail internally, returning an error to
the client, but leave the newly created file in place as an artifact.
The file creation code path has been reorganized so that internal
failures and race conditions are less likely to result in an unwanted
file creation.
A fault injector has been added to help exercise paths that are run
during kernel metadata cache invalidation. These caches contain
information maintained by user space about exported filesystems. Many
of our test workloads do not trigger cache invalidation.
There is one patch that is needed to support PREEMPT_RT and a fix for
an ancient 'sleep while spin-locked' splat that seems to have become
easier to hit since v5.18-rc3"
* tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits)
NFSD: nfsd_file_put() can sleep
NFSD: Add documenting comment for nfsd4_release_lockowner()
NFSD: Modernize nfsd4_release_lockowner()
NFSD: Fix possible sleep during nfsd4_release_lockowner()
nfsd: destroy percpu stats counters after reply cache shutdown
nfsd: Fix null-ptr-deref in nfsd_fill_super()
nfsd: Unregister the cld notifier when laundry_wq create failed
SUNRPC: Use RMW bitops in single-threaded hot paths
NFSD: Clean up the show_nf_flags() macro
NFSD: Trace filecache opens
NFSD: Move documenting comment for nfsd4_process_open2()
NFSD: Fix whitespace
NFSD: Remove dprintk call sites from tail of nfsd4_open()
NFSD: Instantiate a struct file when creating a regular NFSv4 file
NFSD: Clean up nfsd_open_verified()
NFSD: Remove do_nfsd_create()
NFSD: Refactor NFSv4 OPEN(CREATE)
NFSD: Refactor NFSv3 CREATE
NFSD: Refactor nfsd_create_setattr()
NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Almost all of MM here. A few things are still getting finished off,
reviewed, etc.
- Yang Shi has improved the behaviour of khugepaged collapsing of
readonly file-backed transparent hugepages.
- Johannes Weiner has arranged for zswap memory use to be tracked and
managed on a per-cgroup basis.
- Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
runtime enablement of the recent huge page vmemmap optimization
feature.
- Baolin Wang contributes a series to fix some issues around hugetlb
pagetable invalidation.
- Zhenwei Pi has fixed some interactions between hwpoisoned pages and
virtualization.
- Tong Tiangen has enabled the use of the presently x86-only
page_table_check debugging feature on arm64 and riscv.
- David Vernet has done some fixup work on the memcg selftests.
- Peter Xu has taught userfaultfd to handle write protection faults
against shmem- and hugetlbfs-backed files.
- More DAMON development from SeongJae Park - adding online tuning of
the feature and support for monitoring of fixed virtual address
ranges. Also easier discovery of which monitoring operations are
available.
- Nadav Amit has done some optimization of TLB flushing during
mprotect().
- Neil Brown continues to labor away at improving our swap-over-NFS
support.
- David Hildenbrand has some fixes to anon page COWing versus
get_user_pages().
- Peng Liu fixed some errors in the core hugetlb code.
- Joao Martins has reduced the amount of memory consumed by
device-dax's compound devmaps.
- Some cleanups of the arch-specific pagemap code from Anshuman
Khandual.
- Muchun Song has found and fixed some errors in the TLB flushing of
transparent hugepages.
- Roman Gushchin has done more work on the memcg selftests.
... and, of course, many smaller fixes and cleanups. Notably, the
customary million cleanup serieses from Miaohe Lin"
* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
mm: kfence: use PAGE_ALIGNED helper
selftests: vm: add the "settings" file with timeout variable
selftests: vm: add "test_hmm.sh" to TEST_FILES
selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
selftests: vm: add migration to the .gitignore
selftests/vm/pkeys: fix typo in comment
ksm: fix typo in comment
selftests: vm: add process_mrelease tests
Revert "mm/vmscan: never demote for memcg reclaim"
mm/kfence: print disabling or re-enabling message
include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
mm: fix a potential infinite loop in start_isolate_page_range()
MAINTAINERS: add Muchun as co-maintainer for HugeTLB
zram: fix Kconfig dependency warning
mm/shmem: fix shmem folio swapoff hang
cgroup: fix an error handling path in alloc_pagecache_max_30M()
mm: damon: use HPAGE_PMD_SIZE
tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
nodemask.h: fix compilation error with GCC12
...
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- don't over-decrypt memory (Robin Murphy)
- takes min align mask into account for the swiotlb max mapping size
(Tianyu Lan)
- use GFP_ATOMIC in dma-debug (Mikulas Patocka)
- fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)
- don't fail on highmem CMA pages in dma_direct_alloc_pages (me)
- cleanup swiotlb initialization and share more code with swiotlb-xen
(me, Stefano Stabellini)
* tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits)
dma-direct: don't over-decrypt memory
swiotlb: max mapping size takes min align mask into account
swiotlb: use the right nslabs-derived sizes in swiotlb_init_late
swiotlb: use the right nslabs value in swiotlb_init_remap
swiotlb: don't panic when the swiotlb buffer can't be allocated
dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC
dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages
swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm
x86: remove cruft from <asm/dma-mapping.h>
swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl
swiotlb: merge swiotlb-xen initialization into swiotlb
swiotlb: provide swiotlb_init variants that remap the buffer
swiotlb: pass a gfp_mask argument to swiotlb_init_late
swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction
swiotlb: make the swiotlb_init interface more useful
x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled
x86: remove the IOMMU table infrastructure
MIPS/octeon: use swiotlb_init instead of open coding it
arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region
swiotlb: rename swiotlb_late_init_with_default_size
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core
----
- Support TCPv6 segmentation offload with super-segments larger than
64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
- Generalize skb freeing deferral to per-cpu lists, instead of
per-socket lists.
- Add a netdev statistic for packets dropped due to L2 address
mismatch (rx_otherhost_dropped).
- Continue work annotating skb drop reasons.
- Accept alternative netdev names (ALT_IFNAME) in more netlink
requests.
- Add VLAN support for AF_PACKET SOCK_RAW GSO.
- Allow receiving skb mark from the socket as a cmsg.
- Enable memcg accounting for veth queues, sysctl tables and IPv6.
BPF
---
- Add libbpf support for User Statically-Defined Tracing (USDTs).
- Speed up symbol resolution for kprobes multi-link attachments.
- Support storing typed pointers to referenced and unreferenced
objects in BPF maps.
- Add support for BPF link iterator.
- Introduce access to remote CPU map elements in BPF per-cpu map.
- Allow middle-of-the-road settings for the
kernel.unprivileged_bpf_disabled sysctl.
- Implement basic types of dynamic pointers e.g. to allow for
dynamically sized ringbuf reservations without extra memory copies.
Protocols
---------
- Retire port only listening_hash table, add a second bind table
hashed by port and address. Avoid linear list walk when binding to
very popular ports (e.g. 443).
- Add bridge FDB bulk flush filtering support allowing user space to
remove all FDB entries matching a condition.
- Introduce accept_unsolicited_na sysctl for IPv6 to implement
router-side changes for RFC9131.
- Support for MPTCP path manager in user space.
- Add MPTCP support for fallback to regular TCP for connections that
have never connected additional subflows or transmitted
out-of-sequence data (partial support for RFC8684 fallback).
- Avoid races in MPTCP-level window tracking, stabilize and improve
throughput.
- Support lockless operation of GRE tunnels with seq numbers enabled.
- WiFi support for host based BSS color collision detection.
- Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
- Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
- Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
- Allow matching on the number of VLAN tags via tc-flower.
- Add tracepoint for tcp_set_ca_state().
Driver API
----------
- Improve error reporting from classifier and action offload.
- Add support for listing line cards in switches (devlink).
- Add helpers for reporting page pool statistics with ethtool -S.
- Add support for reading clock cycles when using PTP virtual clocks,
instead of having the driver convert to time before reporting. This
makes it possible to report time from different vclocks.
- Support configuring low-latency Tx descriptor push via ethtool.
- Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
New hardware / drivers
----------------------
- Ethernet:
- Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
- Sunplus SP7021 SoC (sp7021_emac)
- Add support for Renesas RZ/V2M (in ravb)
- Add support for MediaTek mt7986 switches (in mtk_eth_soc)
- Ethernet PHYs:
- ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
- TI DP83TD510 PHY
- Microchip LAN8742/LAN88xx PHYs
- WiFi:
- Driver for pureLiFi X, XL, XC devices (plfxlc)
- Driver for Silicon Labs devices (wfx)
- Support for WCN6750 (in ath11k)
- Support Realtek 8852ce devices (in rtw89)
- Mobile:
- MediaTek T700 modems (Intel 5G 5000 M.2 cards)
- CAN:
- ctucanfd: add support for CTU CAN FD open-source IP core from
Czech Technical University in Prague
Drivers
-------
- Delete a number of old drivers still using virt_to_bus().
- Ethernet NICs:
- intel: support TSO on tunnels MPLS
- broadcom: support multi-buffer XDP
- nfp: support VF rate limiting
- sfc: use hardware tx timestamps for more than PTP
- mlx5: multi-port eswitch support
- hyper-v: add support for XDP_REDIRECT
- atlantic: XDP support (including multi-buffer)
- macb: improve real-time perf by deferring Tx processing to NAPI
- High-speed Ethernet switches:
- mlxsw: implement basic line card information querying
- prestera: add support for traffic policing on ingress and egress
- Embedded Ethernet switches:
- lan966x: add support for packet DMA (FDMA)
- lan966x: add support for PTP programmable pins
- ti: cpsw_new: enable bc/mc storm prevention
- Qualcomm 802.11ax WiFi (ath11k):
- Wake-on-WLAN support for QCA6390 and WCN6855
- device recovery (firmware restart) support
- support setting Specific Absorption Rate (SAR) for WCN6855
- read country code from SMBIOS for WCN6855/QCA6390
- enable keep-alive during WoWLAN suspend
- implement remain-on-channel support
- MediaTek WiFi (mt76):
- support Wireless Ethernet Dispatch offloading packet movement
between the Ethernet switch and WiFi interfaces
- non-standard VHT MCS10-11 support
- mt7921 AP mode support
- mt7921 IPv6 NS offload support
- Ethernet PHYs:
- micrel: ksz9031/ksz9131: cabletest support
- lan87xx: SQI support for T1 PHYs
- lan937x: add interrupt support for link detection"
* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
ptp: ocp: Add firmware header checks
ptp: ocp: fix PPS source selector debugfs reporting
ptp: ocp: add .init function for sma_op vector
ptp: ocp: vectorize the sma accessor functions
ptp: ocp: constify selectors
ptp: ocp: parameterize input/output sma selectors
ptp: ocp: revise firmware display
ptp: ocp: add Celestica timecard PCI ids
ptp: ocp: Remove #ifdefs around PCI IDs
ptp: ocp: 32-bit fixups for pci start address
Revert "net/smc: fix listen processing for SMC-Rv2"
ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
selftests/bpf: Dynptr tests
bpf: Add dynptr data slices
bpf: Add bpf_dynptr_read and bpf_dynptr_write
bpf: Dynptr support for ring buffers
bpf: Add bpf_dynptr_from_mem for local dynptrs
bpf: Add verifier support for dynptrs
bpf: Suppress 'passing zero to PTR_ERR' warning
bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
...
|
|
percpu_alloc_percpu event trace"
Fix sparse warning about incorrect gfp_t cast.
Link: https://lkml.kernel.org/r/001979f3-e978-0998-cbed-61a4a2ac87b8@openvz.org
Fixes: f67bed134a05 ("percpu: improve percpu_alloc_percpu event trace")
Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
conversion"
Redefines __def_gfpflag_names array according to akpm@, willy@ and Joe
Perches recommendations.
Link: https://lkml.kernel.org/r/6f811e19-41c6-f3e8-fca6-23a19a62e313@openvz.org
Fixes: fe573327ffb1 ("tracing: incorrect gfp_t conversion")
Signed-off-by: Vasily Averin <vvs@openvz.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Joe Perches <joe@perches.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pull page cache updates from Matthew Wilcox:
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first
argument like ->read_folio
* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
nilfs2: Fix some kernel-doc comments
Appoint myself page cache maintainer
fs: Remove aops->freepage
secretmem: Convert to free_folio
nfs: Convert to free_folio
orangefs: Convert to free_folio
fs: Add free_folio address space operation
fs: Convert drop_buffers() to use a folio
fs: Change try_to_free_buffers() to take a folio
jbd2: Convert release_buffer_page() to use a folio
jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
reiserfs: Convert release_buffer_page() to use a folio
fs: Remove last vestiges of releasepage
ubifs: Convert to release_folio
reiserfs: Convert to release_folio
orangefs: Convert to release_folio
ocfs2: Convert to release_folio
nilfs2: Remove comment about releasepage
nfs: Convert to release_folio
jfs: Convert to release_folio
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Features:
- subpage:
- support for PAGE_SIZE > 4K (previously only 64K)
- make it work with raid56
- repair super block num_devices automatically if it does not match
the number of device items
- defrag can convert inline extents to regular extents, up to now
inline files were skipped but the setting of mount option
max_inline could affect the decision logic
- zoned:
- minimal accepted zone size is explicitly set to 4MiB
- make zone reclaim less aggressive and don't reclaim if there are
enough free zones
- add per-profile sysfs tunable of the reclaim threshold
- allow automatic block group reclaim for non-zoned filesystems, with
sysfs tunables
- tree-checker: new check, compare extent buffer owner against owner
rootid
Performance:
- avoid blocking on space reservation when doing nowait direct io
writes (+7% throughput for reads and writes)
- NOCOW write throughput improvement due to refined locking (+3%)
- send: reduce pressure to page cache by dropping extent pages right
after they're processed
Core:
- convert all radix trees to xarray
- add iterators for b-tree node items
- support printk message index
- user bulk page allocation for extent buffers
- switch to bio_alloc API, use on-stack bios where convenient, other
bio cleanups
- use rw lock for block groups to favor concurrent reads
- simplify workques, don't allocate high priority threads for all
normal queues as we need only one
- refactor scrub, process chunks based on their constraints and
similarity
- allocate direct io structures on stack and pass around only
pointers, avoids allocation and reduces potential error handling
Fixes:
- fix count of reserved transaction items for various inode
operations
- fix deadlock between concurrent dio writes when low on free data
space
- fix a few cases when zones need to be finished
VFS, iomap:
- add helper to check if sb write has started (usable for assertions)
- new helper iomap_dio_alloc_bio, export iomap_dio_bio_end_io"
* tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (173 commits)
btrfs: zoned: introduce a minimal zone size 4M and reject mount
btrfs: allow defrag to convert inline extents to regular extents
btrfs: add "0x" prefix for unsupported optional features
btrfs: do not account twice for inode ref when reserving metadata units
btrfs: zoned: fix comparison of alloc_offset vs meta_write_pointer
btrfs: send: avoid trashing the page cache
btrfs: send: keep the current inode open while processing it
btrfs: allocate the btrfs_dio_private as part of the iomap dio bio
btrfs: move struct btrfs_dio_private to inode.c
btrfs: remove the disk_bytenr in struct btrfs_dio_private
btrfs: allocate dio_data on stack
iomap: add per-iomap_iter private data
iomap: allow the file system to provide a bio_set for direct I/O
btrfs: add a btrfs_dio_rw wrapper
btrfs: zoned: zone finish unused block group
btrfs: zoned: properly finish block group on metadata write
btrfs: zoned: finish block group when there are no more allocatable bytes left
btrfs: zoned: consolidate zone finish functions
btrfs: zoned: introduce btrfs_zoned_bg_is_full
btrfs: improve error reporting in lookup_inline_extent_backref
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs (and fscache) updates from Gao Xiang:
"After working on it on the mailing list for more than half a year, we
finally form 'erofs over fscache' feature into shape. Hopefully it
could bring more possibility to the communities.
The story mainly started from a new project what we called "RAFS v6" [1]
for Nydus image service almost a year ago, which enhances EROFS to be
a new form of one bootstrap (which includes metadata representing the
whole fs tree) + several data-deduplicated content addressable blobs
(actually treated as multiple devices). Each blob can represent one
container image layer but not quite exactly since all new data can be
fully existed in the previous blobs so no need to introduce another
new blob.
It is actually not a new idea (at least on my side it's much like a
simpilied casync [2] for now) and has many benefits over per-file
blobs or some other exist ways since typically each RAFS v6 image only
has dozens of device blobs instead of thousands of per-file blobs.
It's easy to be signed with user keys as a golden image, transfered
untouchedly with minimal overhead over the network, kept in some type
of storage conveniently, and run with (optional) runtime verification
but without involving too many irrelevant features crossing the system
beyond EROFS itself. At least it's our final goal and we're keeping
working on it. There was also a good summary of this approach from the
casync author [3].
Regardless further optimizations, this work is almost done in the
previous Linux release cycles. In this round, we'd like to introduce
on-demand load for EROFS with the fscache/cachefiles infrastructure,
considering the following advantages:
- Introduce new file-based backend to EROFS. Although each image only
contains dozens of blobs but in densely-deployed runC host for
example, there could still be massive blobs on a machine, which is
messy if each blob is treated as a device. In contrast, fscache and
cachefiles are really great interfaces for us to make them work.
- Introduce on-demand load to fscache and EROFS. Previously, fscache
is mainly used to caching network-likewise filesystems, now it can
support on-demand downloading for local fses too with the exact
localfs on-disk format. It has many advantages which we're been
described in the latest patchset cover letter [4]. In addition to
that, most importantly, the cached data is still stored in the
original local fs on-disk format so that it's still the one signed
with private keys but only could be partially available. Users can
fully trust it during running. Later, users can also back up
cachefiles easily to another machine.
- More reliable on-demand approach in principle. After data is all
available locally, user daemon can be no longer online in some use
cases, which helps daemon crash recovery (filesystems can still in
service) and hot-upgrade (user daemon can be upgraded more
frequently due to new features or protocols introduced.)
- Other format can also be converted to EROFS filesystem format over
the internet on the fly with the new on-demand load feature and
mounted. That is entirely possible with on-demand load feature as
long as such archive format metadata can be fetched in advance like
stargz.
In addition, although currently our target user is Nydus image service [5],
but laterly, it can be used for other use cases like on-demand system
booting, etc. As for the fscache on-demand load feature itself,
strictly it can be used for other local fses too. Laterly we could
promote most code to the iomap infrastructure and also enhance it in
the read-write way if other local fses are interested.
Thanks David Howells for taking so much time and patience on this
these months, many thanks with great respect here again! Thanks Jeffle
for working on this feature and Xin Yin from Bytedance for
asynchronous I/O implementation as well as Zichen Tian, Jia Zhu, and
Yan Song for testing, much appeciated. We're also exploring more
possibly over fscache cache management over FSDAX for secure
containers and working on more improvements and useful features for
fscache, cachefiles, and on-demand load.
In addition to "erofs over fscache", NFS export and idmapped mount are
also completed in this cycle for container use cases as well.
Summary:
- Add erofs on-demand load support over fscache
- Support NFS export for erofs
- Support idmapped mounts for erofs
- Don't prompt for risk any more when using big pcluster
- Fix buffer copy overflow of ztailpacking feature
- Several minor cleanups"
[1] https://lore.kernel.org/r/20210730194625.93856-1-hsiangkao@linux.alibaba.com
[2] https://github.com/systemd/casync
[3] http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
[4] https://lore.kernel.org/r/20220509074028.74954-1-jefflexu@linux.alibaba.com
[5] https://github.com/dragonflyoss/image-service
* tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (29 commits)
erofs: scan devices from device table
erofs: change to use asynchronous io for fscache readpage/readahead
erofs: add 'fsid' mount option
erofs: implement fscache-based data readahead
erofs: implement fscache-based data read for inline layout
erofs: implement fscache-based data read for non-inline layout
erofs: implement fscache-based metadata read
erofs: register fscache context for extra data blobs
erofs: register fscache context for primary data blob
erofs: add erofs_fscache_read_folios() helper
erofs: add anonymous inode caching metadata for data blobs
erofs: add fscache context helper functions
erofs: register fscache volume
erofs: add fscache mode check helper
erofs: make erofs_map_blocks() generally available
cachefiles: document on-demand read mode
cachefiles: add tracepoints for on-demand read mode
cachefiles: enable on-demand read mode
cachefiles: implement on-demand read
cachefiles: notify the user daemon when withdrawing cookie
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- rwsem cleanups & optimizations/fixes:
- Conditionally wake waiters in reader/writer slowpaths
- Always try to wake waiters in out_nolock path
- Add try_cmpxchg64() implementation, with arch optimizations - and use
it to micro-optimize sched_clock_{local,remote}()
- Various force-inlining fixes to address objdump instrumentation-check
warnings
- Add lock contention tracepoints:
lock:contention_begin
lock:contention_end
- Misc smaller fixes & cleanups
* tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}
locking/atomic/x86: Introduce arch_try_cmpxchg64
locking/atomic: Add generic try_cmpxchg64 support
futex: Remove a PREEMPT_RT_FULL reference.
locking/qrwlock: Change "queue rwlock" to "queued rwlock"
lockdep: Delete local_irq_enable_in_hardirq()
locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning
locking: Apply contention tracepoints in the slow path
locking: Add lock contention tracepoints
locking/rwsem: Always try to wake waiters in out_nolock path
locking/rwsem: Conditionally wake waiters in reader/writer slowpaths
locking/rwsem: No need to check for handoff bit if wait queue empty
lockdep: Fix -Wunused-parameter for _THIS_IP_
x86/mm: Force-inline __phys_addr_nodebug()
x86/kvm/svm: Force-inline GHCB accessors
task_stack, x86/cea: Force-inline stack helpers
|
|
drivers/net/ethernet/cadence/macb_main.c
5cebb40bc955 ("net: macb: Fix PTP one step sync support")
138badbc21a0 ("net: macb: use NAPI for TX completion path")
https://lore.kernel.org/all/20220523111021.31489367@canb.auug.org.au/
net/smc/af_smc.c
75c1edf23b95 ("net/smc: postpone sk_refcnt increment in connect()")
3aba103006bc ("net/smc: align the connect behaviour with TCP")
https://lore.kernel.org/all/20220524114408.4bf1af38@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"This includes some small changes to kernel/stop_machine.c and arch/x86
which are deps of the new Intel IFS support.
Highlights:
- New drivers:
- Intel "In Field Scan" (IFS) support
- Winmate FM07/FM07P buttons
- Mellanox SN2201 support
- AMD PMC driver enhancements
- Lots of various other small fixes and hardware-id additions"
* tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits)
platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency
platform/x86: intel_cht_int33fe: Set driver data
platform/x86: intel-hid: fix _DSM function index handling
platform/x86: toshiba_acpi: use kobj_to_dev()
platform/x86: samsung-laptop: use kobj_to_dev()
platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu
tools/power/x86/intel-speed-select: Display error on turbo mode disabled
Documentation: In-Field Scan
platform/x86/intel/ifs: add ABI documentation for IFS
trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
platform/x86/intel/ifs: Add IFS sysfs interface
platform/x86/intel/ifs: Add scan test support
platform/x86/intel/ifs: Authenticate and copy to secured memory
platform/x86/intel/ifs: Check IFS Image sanity
platform/x86/intel/ifs: Read IFS firmware image
platform/x86/intel/ifs: Add stub driver for In-Field Scan
stop_machine: Add stop_core_cpuslocked() for per-core operations
x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
x86/microcode/intel: Expose collect_cpu_info_early() for IFS
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and timekeeping updates from Thomas Gleixner:
- Expose CLOCK_TAI to instrumentation to aid with TSN debugging.
- Ensure that the clockevent is stopped when there is no timer armed to
avoid pointless wakeups.
- Make the sched clock frequency handling and rounding consistent.
- Provide a better debugobject hint for delayed works. The timer
callback is always the same, which makes it difficult to identify the
underlying work. Use the work function as a hint instead.
- Move the timer specific sysctl code into the timer subsystem.
- The usual set of improvements and cleanups
* tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Provide a better debugobjects hint for delayed works
time/sched_clock: Fix formatting of frequency reporting code
time/sched_clock: Use Hz as the unit for clock rate reporting below 4kHz
time/sched_clock: Round the frequency reported to nearest rather than down
timekeeping: Consolidate fast timekeeper
timekeeping: Annotate ktime_get_boot_fast_ns() with data_race()
timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
timekeeping: Introduce fast accessor to clock tai
tracing/timer: Add missing argument documentation of trace points
clocksource: Replace cpumask_weight() with cpumask_empty()
timers: Move timer sysctl into the timer code
clockevents: Use dedicated list iterator variable
timers: Simplify calc_index()
timers: Initialize base::next_expiry_recalc in timers_prepare_cpu()
|
|
git://git.kernel.dk/linux-block
Pull io_uring NVMe command passthrough from Jens Axboe:
"On top of everything else, this adds support for passthrough for
io_uring.
The initial feature for this is NVMe passthrough support, which allows
non-filesystem based IO commands and admin commands.
To support this, io_uring grows support for SQE and CQE members that
are twice as big, allowing to pass in a full NVMe command without
having to copy data around. And to complete with more than just a
single 32-bit value as the output"
* tag 'for-5.19/io_uring-passthrough-2022-05-22' of git://git.kernel.dk/linux-block: (22 commits)
io_uring: cleanup handling of the two task_work lists
nvme: enable uring-passthrough for admin commands
nvme: helper for uring-passthrough checks
blk-mq: fix passthrough plugging
nvme: add vectored-io support for uring-cmd
nvme: wire-up uring-cmd support for io-passthru on char-device.
nvme: refactor nvme_submit_user_cmd()
block: wire-up support for passthrough plugging
fs,io_uring: add infrastructure for uring-cmd
io_uring: support CQE32 for nop operation
io_uring: enable CQE32
io_uring: support CQE32 in /proc info
io_uring: add tracing for additional CQE32 fields
io_uring: overflow processing for CQE32
io_uring: flush completions for CQE32
io_uring: modify io_get_cqe for CQE32
io_uring: add CQE32 completion processing
io_uring: add CQE32 setup processing
io_uring: change ring size calculation for CQE32
io_uring: store add. return values for CQE32
...
|
|
git://git.kernel.dk/linux-block
Pull io_uring socket() support from Jens Axboe:
"This adds support for socket(2) for io_uring. This is handy when using
direct / registered file descriptors with io_uring.
Outside of those two patches, a small series from Dylan on top that
improves the tracing by providing a text representation of the opcode
rather than needing to decode this by reading the header file every
time.
That sits in this branch as it was the last opcode added (until it
wasn't...)"
* tag 'for-5.19/io_uring-socket-2022-05-22' of git://git.kernel.dk/linux-block:
io_uring: use the text representation of ops in trace
io_uring: rename op -> opcode
io_uring: add io_uring_get_opcode
io_uring: add type to op enum
io_uring: add socket(2) support
net: add __sys_socket_file()
|
|
git://git.kernel.dk/linux-block
Pull io_uring xattr support from Jens Axboe:
"Support for the xattr variants"
* tag 'for-5.19/io_uring-xattr-2022-05-22' of git://git.kernel.dk/linux-block:
io_uring: cleanup error-handling around io_req_complete
io_uring: fix trace for reduced sqe padding
io_uring: add fgetxattr and getxattr support
io_uring: add fsetxattr and setxattr support
fs: split off do_getxattr from getxattr
fs: split off setxattr_copy and do_setxattr function from setxattr
|
|
Pull io_uring updates from Jens Axboe:
"Here are the main io_uring changes for 5.19. This contains:
- Fixes for sparse type warnings (Christoph, Vasily)
- Support for multi-shot accept (Hao)
- Support for io_uring managed fixed files, rather than always
needing the applicationt o manage the indices (me)
- Fix for a spurious poll wakeup (Dylan)
- CQE overflow fixes (Dylan)
- Support more types of cancelations (me)
- Support for co-operative task_work signaling, rather than always
forcing an IPI (me)
- Support for doing poll first when appropriate, rather than always
attempting a transfer first (me)
- Provided buffer cleanups and support for mapped buffers (me)
- Improve how io_uring handles inflight SCM files (Pavel)
- Speedups for registered files (Pavel, me)
- Organize the completion data in a struct in io_kiocb rather than
keep it in separate spots (Pavel)
- task_work improvements (Pavel)
- Cleanup and optimize the submission path, in general and for
handling links (Pavel)
- Speedups for registered resource handling (Pavel)
- Support sparse buffers and file maps (Pavel, me)
- Various fixes and cleanups (Almog, Pavel, me)"
* tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits)
io_uring: fix incorrect __kernel_rwf_t cast
io_uring: disallow mixed provided buffer group registrations
io_uring: initialize io_buffer_list head when shared ring is unregistered
io_uring: add fully sparse buffer registration
io_uring: use rcu_dereference in io_close
io_uring: consistently use the EPOLL* defines
io_uring: make apoll_events a __poll_t
io_uring: drop a spurious inline on a forward declaration
io_uring: don't use ERR_PTR for user pointers
io_uring: use a rwf_t for io_rw.flags
io_uring: add support for ring mapped supplied buffers
io_uring: add io_pin_pages() helper
io_uring: add buffer selection support to IORING_OP_NOP
io_uring: fix locking state for empty buffer group
io_uring: implement multishot mode for accept
io_uring: let fast poll support multishot
io_uring: add REQ_F_APOLL_MULTISHOT for requests
io_uring: add IORING_ACCEPT_MULTISHOT for accept
io_uring: only wake when the correct events are set
io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
...
|
|
Fix the decision on when to generate an IDLE ACK by keeping a count of the
number of packets we've received, but not yet soft-ACK'd, and the number of
packets we've processed, but not yet hard-ACK'd, rather than trying to keep
track of which DATA sequence numbers correspond to those points.
We then generate an ACK when either counter exceeds 2. The counters are
both cleared when we transcribe the information into any sort of ACK packet
for transmission. IDLE and DELAY ACKs are skipped if both counters are 0
(ie. no change).
Fixes: 805b21b929e2 ("rxrpc: Send an ACK after every few DATA packets we receive")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Automatically generate trace tag enums from the symbol -> string mapping
tables rather than having the enums as well, thereby reducing duplicated
data.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move to using refcount_t rather than atomic_t for refcounts in rxrpc.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes following sparse warnings:
CHECK mm/vmscan.c
mm/vmscan.c: note: in included file (through
include/trace/trace_events.h, include/trace/define_trace.h,
include/trace/events/vmscan.h):
./include/trace/events/vmscan.h:281:1: sparse: warning:
cast to restricted isolate_mode_t
./include/trace/events/vmscan.h:281:1: sparse: warning:
restricted isolate_mode_t degrades to integer
Link: https://lkml.kernel.org/r/e85d7ff2-fd10-53f8-c24e-ba0458439c1b@openvz.org
Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, trace point mm_page_alloc_zone_locked() doesn't show correct
information.
First, when alloc_flag has ALLOC_HARDER/ALLOC_CMA, page can be allocated
from MIGRATE_HIGHATOMIC/MIGRATE_CMA. Nevertheless, tracepoint use
requested migration type not MIGRATE_HIGHATOMIC and MIGRATE_CMA.
Second, after commit 44042b4498728 ("mm/page_alloc: allow high-order pages
to be stored on the per-cpu lists") percpu-list can store high order
pages. But trace point determine whether it is a refiil of percpu-list by
comparing requested order and 0.
To handle these problems, make mm_page_alloc_zone_locked() only be called
by __rmqueue_smallest with correct migration type. With a new argument
called percpu_refill, it can show roughly whether it is a refill of
percpu-list.
Link: https://lkml.kernel.org/r/20220512025307.57924-1-vvghjk1234@gmail.com
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Baik Song An <bsahn@etri.re.kr>
Cc: Hong Yeon Kim <kimhy@etri.re.kr>
Cc: Taeung Song <taeung@reallinux.co.kr>
Cc: <linuxgeek@linuxgeek.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently 'make C=1 fs/io_uring.o' generates sparse warning:
CHECK fs/io_uring.c
fs/io_uring.c: note: in included file (through
include/trace/trace_events.h, include/trace/define_trace.h, i
nclude/trace/events/io_uring.h):
./include/trace/events/io_uring.h:488:1:
warning: incorrect type in assignment (different base types)
expected unsigned int [usertype] op_flags
got restricted __kernel_rwf_t const [usertype] rw_flags
This happen on cast of sqe->rw_flags which is defined as __kernel_rwf_t,
this type is bitwise and requires __force attribute for any casts.
However rw_flags is a member of the union, and its access can be safely
replaced by using of its neighbours, so let's use poll32_events to fix
the sparse warning.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Link: https://lore.kernel.org/r/6f009241-a63f-ae43-a04b-62841aaef293@openvz.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
drivers/net/ethernet/mellanox/mlx5/core/main.c
b33886971dbc ("net/mlx5: Initialize flow steering during driver probe")
40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support")
f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table")
https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/
16d42d313350 ("net/mlx5: Drain fw_reset when removing device")
8324a02c342a ("net/mlx5: Add exit route when waiting for FW")
https://lore.kernel.org/all/20220519114119.060ce014@canb.auug.org.au/
tools/testing/selftests/net/mptcp/mptcp_join.sh
e274f7154008 ("selftests: mptcp: add subflow limits test-cases")
b6e074e171bc ("selftests: mptcp: add infinite map testcase")
5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type")
https://lore.kernel.org/all/20220516111918.366d747f@canb.auug.org.au/
net/mptcp/options.c
ba2c89e0ea74 ("mptcp: fix checksum byte order")
1e39e5a32ad7 ("mptcp: infinite mapping sending")
ea66758c1795 ("tcp: allow MPTCP to update the announced window")
https://lore.kernel.org/all/20220519115146.751c3a37@canb.auug.org.au/
net/mptcp/pm.c
95d686517884 ("mptcp: fix subflow accounting on close")
4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs")
https://lore.kernel.org/all/20220516111435.72f35dca@canb.auug.org.au/
net/mptcp/subflow.c
ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure")
0348c690ed37 ("mptcp: add the fallback check")
f8d4bcacff3b ("mptcp: infinite mapping receiving")
https://lore.kernel.org/all/20220519115837.380bb8d4@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Clean up: This field is now always set to zero.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Replace the temporary fix from commit 4d5004451ab2 ("SUNRPC: Fix the
svc_deferred_event trace class") with the use of __sockaddr and
friends, which is the preferred solution (but only available in 5.18
and newer).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull io_uring fixes from Jens Axboe:
"Two small changes fixing issues from the 5.18 merge window:
- Fix wrong ordering of a tracepoint (Dylan)
- Fix MSG_RING on IOPOLL rings (me)"
* tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block:
io_uring: don't attempt to IOPOLL for MSG_RING requests
io_uring: fix ordering of args in io_uring_queue_async_work
|
|
The f2fs_gc uses a bitmap to indicate pinned sections, but when disabling
chckpoint, we call f2fs_gc() with NULL_SEGNO which selects the same dirty
segment as a victim all the time, resulting in checkpoint=disable failure,
for example. Let's pick another one, if we fail to collect it.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Add tracepoints for on-demand read mode. Currently following tracepoints
are added:
OPEN request / COPEN reply
CLOSE request
READ request / CREAD reply
write through anonymous fd
release of anonymous fd
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220425122143.56815-8-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Fscache/CacheFiles used to serve as a local cache for a remote
networking fs. A new on-demand read mode will be introduced for
CacheFiles, which can boost the scenario where on-demand read semantics
are needed, e.g. container image distribution.
The essential difference between these two modes is seen when a cache
miss occurs: In the original mode, the netfs will fetch the data from
the remote server and then write it to the cache file; in on-demand
read mode, fetching the data and writing it into the cache is delegated
to a user daemon.
As the first step, notify the user daemon when looking up cookie. In
this case, an anonymous fd is sent to the user daemon, through which the
user daemon can write the fetched data to the cache file. Since the user
daemon may move the anonymous fd around, e.g. through dup(), an object
ID uniquely identifying the cache file is also attached.
Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that
the cache file size shall be retrieved at runtime. This helps the
scenario where one cache file contains multiple netfs files, e.g. for
the purpose of deduplication. In this case, netfs itself has no idea the
size of the cache file, whilst the user daemon should give the hint on
it.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220509074028.74954-3-jefflexu@linux.alibaba.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Just let the one caller that wants optional WQ_HIGHPRI handling allocate
a separate btrfs_workqueue for that. This allows to rename struct
__btrfs_workqueue to btrfs_workqueue, remove a pointer indirection and
separate allocation for all btrfs_workqueue users and generally simplify
the code.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Fixes the following sparse warnings:
include/trace/events/*: sparse: cast to restricted gfp_t
include/trace/events/*: sparse: restricted gfp_t degrades to integer
gfp_t type is bitwise and requires __force attributes for any casts.
Link: https://lkml.kernel.org/r/331d88fe-f4f7-657c-02a2-d977f15fbff6@openvz.org
Signed-off-by: Vasily Averin <vvs@openvz.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add call_site, bytes_alloc and gfp_flags fields to the output of the
percpu_alloc_percpu ftrace event:
mkdir-4393 [001] 169.334788: percpu_alloc_percpu:
call_site=mem_cgroup_css_alloc+0xa6 reserved=0 is_atomic=0 size=2408 align=8
base_addr=0xffffc7117fc00000 off=402176 ptr=0x3dc867a62300 bytes_alloc=14448
gfp_flags=GFP_KERNEL_ACCOUNT
This is required to track memcg-accounted percpu allocations.
Link: https://lkml.kernel.org/r/a07be858-c8a3-7851-9086-e3262cbcf707@openvz.org
Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
No functional change.
- remove checkpoint=disable check for f2fs_write_checkpoint
- get sec_freed all the time
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Current atomic write has three major issues like below.
- keeps the updates in non-reclaimable memory space and they are even
hard to be migrated, which is not good for contiguous memory
allocation.
- disk spaces used for atomic files cannot be garbage collected, so
this makes it difficult for the filesystem to be defragmented.
- If atomic write operations hit the threshold of either memory usage
or garbage collection failure count, All the atomic write operations
will fail immediately.
To resolve the issues, I will keep a COW inode internally for all the
updates to be flushed from memory, when we need to flush them out in a
situation like high memory pressure. These COW inodes will be tagged
as orphan inodes to be reclaimed in case of sudden power-cut or system
failure during atomic writes.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Add tracing support which may be useful for debugging systems that fail to complete
In Field Scan tests.
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220506225410.1652287-11-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Fix arg ordering in TP_ARGS macro, which fixes the output.
Fixes: 502c87d65564c ("io-uring: Make tracepoints consistent.")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220512091834.728610-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.
This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.
If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.
Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
|
|
This adds tracing for the extra1 and extra2 fields.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-10-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are no more aop flags left, so remove the parameter.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Add trace event to capture the moment of the call for updating the thermal
pressure value. It's helpful to investigate how often those events occur
in a system dealing with throttling. This trace event is needed since the
old 'cdev_update' might not be used by some drivers.
The old 'cdev_update' trace event only provides a cooling state
value: [0, n]. That state value then needs additional tools to translate
it: state -> freq -> capacity -> thermal pressure. This new trace event
just stores proper thermal pressure value in the trace buffer, no need
for additional logic. This is helpful for cooperation when someone can
simply sends to the list the trace buffer output from the platform (no
need from additional information from other subsystems).
There are also platforms which due to some design reasons don't use
cooling devices and thus don't trigger old 'cdev_update' trace event.
They are also important and measuring latency for the thermal signal
raising/decaying characteristics is in scope. This new trace event
would cover them as well.
We already have a trace point 'pelt_thermal_tp' which after a change to
trace event can be paired with this new 'thermal_pressure_update' and
derive more insight what is going on in the system under thermal pressure
(and why).
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20220427080806.1906-1-lukasz.luba@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
It is annoying to translate opcodes to textwhen tracing io_uring. Use the
io_uring_get_opcode function instead to use the text representation.
A downside here might have been that if the opcode is invalid it will not
be obvious, however the opcode is already overridden in these cases to
0 (NOP) in io_init_req(). Therefore this is a non issue.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220426082907.3600028-5-dylany@fb.com
[axboe: don't include register, those are not req opcodes]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
do this for consistency with the other trace messages
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220426082907.3600028-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This was used in Android for a long time. Let's upstream it.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
__pad2 is only 1 u64 now, the other one is addr3. Adjust the trace so
that it matches up.
Fixes: a56834e0fafe ("io_uring: add fgetxattr and getxattr support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add trace function for overflowing CQ ring.
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In trace event class mptcp_dump_mpext, dump the newly added infinite_map
field of struct mptcp_dump_mpext too.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass a boolean flag to indicate if swiotlb needs to be enabled based on
the addressing needs, and replace the verbose argument with a set of
flags, including one to force enable bounce buffering.
Note that this patch removes the possibility to force xen-swiotlb use
with the swiotlb=force parameter on the command line on x86 (arm and
arm64 never supported that), but this interface will be restored shortly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|