summaryrefslogtreecommitdiff
path: root/fs/ceph/addr.c
AgeCommit message (Collapse)AuthorFilesLines
2022-05-10ceph: check folio PG_private bit instead of folio->privateXiubo Li1-4/+7
The pages in the file mapping maybe reclaimed and reused by other subsystems and the page->private maybe used as flags field or something else, if later that pages are used by page caches again the page->private maybe not cleared as expected. Here will check the PG_private bit instead of the folio->private. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/55421 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-10ceph: Convert to release_folioMatthew Wilcox (Oracle)1-12/+12
Use a folio throughout ceph_release_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-05-09mm,fs: Remove aops->readpageMatthew Wilcox (Oracle)1-1/+1
With all implementations of aops->readpage converted to aops->read_folio, we can stop checking whether it's set and remove the member from aops. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09fs: Convert netfs_readpage to netfs_read_folioMatthew Wilcox (Oracle)1-1/+1
This is straightforward because netfs already worked in terms of folios. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09fs: Introduce aops->read_folioMatthew Wilcox (Oracle)1-1/+1
Change all the callers of ->readpage to call ->read_folio in preference, if it exists. This is a transitional duplication, and will be removed by the end of the series. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-08fs: Remove flags parameter from aops->write_beginMatthew Wilcox (Oracle)1-1/+1
There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-05-08fs: Remove aop_flags parameter from netfs_write_begin()Matthew Wilcox (Oracle)1-1/+1
There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-04-01Merge tag 'netfs-prep-20220318' of ↵Linus Torvalds1-63/+53
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull netfs updates from David Howells: "Netfs prep for write helpers. Having had a go at implementing write helpers and content encryption support in netfslib, it seems that the netfs_read_{,sub}request structs and the equivalent write request structs were almost the same and so should be merged, thereby requiring only one set of alloc/get/put functions and a common set of tracepoints. Merging the structs also has the advantage that if a bounce buffer is added to the request struct, a read operation can be performed to fill the bounce buffer, the contents of the buffer can be modified and then a write operation can be performed on it to send the data wherever it needs to go using the same request structure all the way through. The I/O handlers would then transparently perform any required crypto. This should make it easier to perform RMW cycles if needed. The potentially common functions and structs, however, by their names all proclaim themselves to be associated with the read side of things. The bulk of these changes alter this in the following ways: - Rename struct netfs_read_{,sub}request to netfs_io_{,sub}request. - Rename some enums, members and flags to make them more appropriate. - Adjust some comments to match. - Drop "read"/"rreq" from the names of common functions. For instance, netfs_get_read_request() becomes netfs_get_request(). - The ->init_rreq() and ->issue_op() methods become ->init_request() and ->issue_read(). I've kept the latter as a read-specific function and in another branch added an ->issue_write() method. The driver source is then reorganised into a number of files: fs/netfs/buffered_read.c Create read reqs to the pagecache fs/netfs/io.c Dispatchers for read and write reqs fs/netfs/main.c Some general miscellaneous bits fs/netfs/objects.c Alloc, get and put functions fs/netfs/stats.c Optional procfs statistics. and future development can be fitted into this scheme, e.g.: fs/netfs/buffered_write.c Modify the pagecache fs/netfs/buffered_flush.c Writeback from the pagecache fs/netfs/direct_read.c DIO read support fs/netfs/direct_write.c DIO write support fs/netfs/unbuffered_write.c Write modifications directly back Beyond the above changes, there are also some changes that affect how things work: - Make fscache_end_operation() generally available. - In the netfs tracing header, generate enums from the symbol -> string mapping tables rather than manually coding them. - Add a struct for filesystems that uses netfslib to put into their inode wrapper structs to hold extra state that netfslib is interested in, such as the fscache cookie. This allows netfslib functions to be set in filesystem operation tables and jumped to directly without having to have a filesystem wrapper. - Add a member to the struct added above to track the remote inode length as that may differ if local modifications are buffered. We may need to supply an appropriate EOF pointer when storing data (in AFS for example). - Pass extra information to netfs_alloc_request() so that the ->init_request() hook can access it and retain information to indicate the origin of the operation. - Make the ->init_request() hook return an error, thereby allowing a filesystem that isn't allowed to cache an inode (ceph or cifs, for example) to skip readahead. - Switch to using refcount_t for subrequests and add tracepoints to log refcount changes for the request and subrequest structs. - Add a function to consolidate dispatching a read request. Similar code is used in three places and another couple are likely to be added in the future" Link: https://lore.kernel.org/all/2639515.1648483225@warthog.procyon.org.uk/ * tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Maintain netfs_i_context::remote_i_size netfs: Keep track of the actual remote file size netfs: Split some core bits out into their own file netfs: Split fs/netfs/read_helper.c netfs: Rename read_helper.c to io.c netfs: Prepare to split read_helper.c netfs: Add a function to consolidate beginning a read netfs: Add a netfs inode context ceph: Make ceph_init_request() check caps on readahead netfs: Change ->init_request() to return an error code netfs: Refactor arguments for netfs_alloc_read_request netfs: Adjust the netfs_failure tracepoint to indicate non-subreq lines netfs: Trace refcounting on the netfs_io_subrequest struct netfs: Trace refcounting on the netfs_io_request struct netfs: Adjust the netfs_rreq tracepoint slightly netfs: Split netfs_io_* object handling out netfs: Finish off rename of netfs_read_request to netfs_io_request netfs: Rename netfs_read_*request to netfs_io_*request netfs: Generate enums from trace symbol mapping lists fscache: export fscache_end_operation()
2022-03-25Merge tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds1-128/+112
Pull ceph updates from Ilya Dryomov: "The highlights are: - several changes to how snap context and snap realms are tracked (Xiubo Li). In particular, this should resolve a long-standing issue of high kworker CPU usage and various stalls caused by needless iteration over all inodes in the snap realm. - async create fixes to address hangs in some edge cases (Jeff Layton) - support for getvxattr MDS op for querying server-side xattrs, such as file/directory layouts and ephemeral pins (Milind Changire) - average latency is now maintained for all metrics (Venky Shankar) - some tweaks around handling inline data to make it fit better with netfs helper library (David Howells) Also a couple of memory leaks got plugged along with a few assorted fixups. Last but not least, Xiubo has stepped up to serve as a CephFS co-maintainer" * tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client: (27 commits) ceph: fix memory leak in ceph_readdir when note_last_dentry returns error ceph: uninitialized variable in debug output ceph: use tracked average r/w/m latencies to display metrics in debugfs ceph: include average/stdev r/w/m latency in mds metrics ceph: track average r/w/m latency ceph: use ktime_to_timespec64() rather than jiffies_to_timespec64() ceph: assign the ci only when the inode isn't NULL ceph: fix inode reference leakage in ceph_get_snapdir() ceph: misc fix for code style and logs ceph: allocate capsnap memory outside of ceph_queue_cap_snap() ceph: do not release the global snaprealm until unmounting ceph: remove incorrect and unused CEPH_INO_DOTDOT macro MAINTAINERS: add Xiubo Li as cephfs co-maintainer ceph: eliminate the recursion when rebuilding the snap context ceph: do not update snapshot context when there is no new snapshot ceph: zero the dir_entries memory when allocating it ceph: move to a dedicated slabcache for ceph_cap_snap ceph: add getvxattr op libceph: drop else branches in prepare_read_data{,_cont} ceph: fix comments mentioning i_mutex ...
2022-03-23Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds1-40/+43
Pull filesystem folio updates from Matthew Wilcox: "Primarily this series converts some of the address_space operations to take a folio instead of a page. Notably: - a_ops->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. - a_ops->invalidatepage() becomes ->invalidate_folio() and has a similar type change. - a_ops->launder_page() becomes ->launder_folio() - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request" * tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits) fs: Remove aops ->set_page_dirty fb_defio: Use noop_dirty_folio() fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio fs: Convert __set_page_dirty_buffers to block_dirty_folio nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio() mm: Convert swap_set_page_dirty() to swap_dirty_folio() ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio() btrfs: Convert extent_range_redirty_for_io() to use folios fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio btrfs: Convert from set_page_dirty to dirty_folio fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() fs: Add aops->dirty_folio fs: Remove aops->launder_page orangefs: Convert launder_page to launder_folio nfs: Convert from launder_page to launder_folio fuse: Convert from launder_page to launder_folio ...
2022-03-23ceph: remove reliance on bdi congestionNeilBrown1-9/+13
The bdi congestion tracking in not widely used and will be removed. CEPHfs is one of a small number of filesystems that uses it, setting just the async (write) congestion flags at what it determines are appropriate times. The only remaining effect of the async flag is to cause (some) WB_SYNC_NONE writes to be skipped. So instead of setting the flag, set an internal flag and change: - .writepages to do nothing if WB_SYNC_NONE and the flag is set - .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the flag is set. The writepages change causes a behavioural change in that pageout() can now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be called on the page which (I think) wil further delay the next attempt at writeout. This might be a good thing. Link: https://lkml.kernel.org/r/164549983739.9187.14895675781408171186.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-21ceph: uninitialized variable in debug outputDan Carpenter1-1/+2
If read_mapping_folio() fails then "inline_version" is printed without being initialized. [ jlayton: use CEPH_INLINE_NONE instead of "-1" ] Fixes: 083db6fd3e73 ("ceph: uninline the data on a file opened for writing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-18netfs: Add a netfs inode contextDavid Howells1-27/+4
Add a netfs_i_context struct that should be included in the network filesystem's own inode struct wrapper, directly after the VFS's inode struct, e.g.: struct my_inode { struct { /* These must be contiguous */ struct inode vfs_inode; struct netfs_i_context netfs_ctx; }; }; The netfs_i_context struct so far contains a single field for the network filesystem to use - the cache cookie: struct netfs_i_context { ... struct fscache_cookie *cache; }; Three functions are provided to help with this: (1) void netfs_i_context_init(struct inode *inode, const struct netfs_request_ops *ops); Initialise the netfs context and set the operations. (2) struct netfs_i_context *netfs_i_context(struct inode *inode); Find the netfs context from the VFS inode. (3) struct inode *netfs_inode(struct netfs_i_context *ctx); Find the VFS inode from the netfs context. Changes ======= ver #4) - Fix netfs_is_cache_enabled() to check cookie->cache_priv to see if a cache is present[3]. - Fix netfs_skip_folio_read() to zero out all of the page, not just some of it[3]. ver #3) - Split out the bit to move ceph cap-getting on readahead into ceph_init_request()[1]. - Stick in a comment to the netfs inode structs indicating the contiguity requirements[2]. ver #2) - Adjust documentation to match. - Use "#if IS_ENABLED()" in netfs_i_cookie(), not "#ifdef". - Move the cap check from ceph_readahead() to ceph_init_request() to be called from netfslib. - Remove ceph_readahead() and use netfs_readahead() directly instead. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/8af0d47f17d89c06bbf602496dd845f2b0bf25b3.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/beaf4f6a6c2575ed489adb14b257253c868f9a5c.camel@kernel.org/ [2] Link: https://lore.kernel.org/r/3536452.1647421585@warthog.procyon.org.uk/ [3] Link: https://lore.kernel.org/r/164622984545.3564931.15691742939278418580.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678213320.1200972.16807551936267647470.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692909854.2099075.9535537286264248057.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/306388.1647595110@warthog.procyon.org.uk/ # v4
2022-03-18ceph: Make ceph_init_request() check caps on readaheadDavid Howells1-28/+41
Move the caps check from ceph_readahead() to ceph_init_request(), conditional on the origin being NETFS_READAHEAD so that in a future patch, ceph can point its ->readahead() vector directly at netfs_readahead(). Changes ======= ver #4) - Move the check for NETFS_READAHEAD up in ceph_init_request()[2]. ver #3) - Split from the patch to add a netfs inode context[1]. - Need to store the caps got in rreq->netfs_priv for later freeing. Signed-off-by: David Howells <dhowells@redhat.com> cc: ceph-devel@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/8af0d47f17d89c06bbf602496dd845f2b0bf25b3.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/dd054c962818716e718bd9b446ee5322ca097675.camel@redhat.com/ [2] Link: https://lore.kernel.org/r/164692907694.2099075.10081819855690054094.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/2533821.1647006574@warthog.procyon.org.uk/ # v4
2022-03-18netfs: Finish off rename of netfs_read_request to netfs_io_requestDavid Howells1-3/+3
Adjust helper function names and comments after mass rename of struct netfs_read_*request to struct netfs_io_*request. Changes ======= ver #2) - Make the changes in the docs also. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164622992433.3564931.6684311087845150271.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678196111.1200972.5001114956865989528.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692892567.2099075.13895804222087028813.stgit@warthog.procyon.org.uk/ # v3
2022-03-18netfs: Rename netfs_read_*request to netfs_io_*requestDavid Howells1-8/+8
Rename netfs_read_*request to netfs_io_*request so that the same structures can be used for the write helpers too. perl -p -i -e 's/netfs_read_(request|subrequest)/netfs_io_$1/g' \ `git grep -l 'netfs_read_\(sub\|\)request'` perl -p -i -e 's/nr_rd_ops/nr_outstanding/g' \ `git grep -l nr_rd_ops` perl -p -i -e 's/nr_wr_ops/nr_copy_ops/g' \ `git grep -l nr_wr_ops` perl -p -i -e 's/netfs_read_source/netfs_io_source/g' \ `git grep -l 'netfs_read_source'` perl -p -i -e 's/netfs_io_request_ops/netfs_request_ops/g' \ `git grep -l 'netfs_io_request_ops'` perl -p -i -e 's/init_rreq/init_request/g' \ `git grep -l 'init_rreq'` Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164622988070.3564931.7089670190434315183.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678195157.1200972.366609966927368090.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692891535.2099075.18435198075367420588.stgit@warthog.procyon.org.uk/ # v3
2022-03-15fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio()Matthew Wilcox (Oracle)1-14/+13
Convert all users of fscache_set_page_dirty to use fscache_dirty_folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15ceph: Convert from invalidatepage to invalidate_folioMatthew Wilcox (Oracle)1-17/+17
Mostly a straightforward conversion. Delete the pointer from the debugging output as this has no value. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15ceph: Use folio_invalidate()Matthew Wilcox (Oracle)1-9/+13
Instead of calling ->invalidatepage directly, use folio_invalidate(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-01ceph: uninline the data on a file opened for writingDavid Howells1-107/+47
If a ceph file is made up of inline data, uninline that in the ceph_open() rather than in ceph_page_mkwrite(), ceph_write_iter(), ceph_fallocate() or ceph_write_begin(). This makes it easier to convert to using the netfs library for VM write hooks. Should this also take the inode lock for the duration on uninlining to prevent a race with truncation? [ jlayton: fix up folio locking, update i_inline_version after write ] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-01ceph: make ceph_netfs_issue_op() handle inlined dataDavid Howells1-17/+60
Make ceph_netfs_issue_op() handle inlined data on page 0. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-01ceph: switch netfs read ops to use rreq->inode instead of rreq->mapping->hostJeff Layton1-3/+3
One fewer pointer dereference, and in the future we may not be able to count on the mapping pointer being populated (e.g. in the DIO case). Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-22netfs: Make ops->init_rreq() optionalJeffle Xu1-5/+0
Make the ops->init_rreq() callback optional. This isn't required for the erofs changes I'm implementing to do on-demand read through fscache[1]. Further, ceph has an empty init_rreq method that can then be removed and it's marked optional in the documentation. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/20211227125444.21187-1-jefflexu@linux.alibaba.com/ [1] Link: https://lore.kernel.org/r/20211228124419.103020-1-jefflexu@linux.alibaba.com Link: https://lore.kernel.org/r/164251410387.3435901.2504600788262093313.stgit@warthog.procyon.org.uk/ # v1
2022-01-129p, afs, ceph, nfs: Use current_is_kswapd() rather than ↵David Howells1-1/+2
gfpflags_allow_blocking() In 9p, afs ceph, and nfs, gfpflags_allow_blocking() (which wraps a test for __GFP_DIRECT_RECLAIM being set) is used to determine if ->releasepage() should wait for the completion of a DIO write to fscache with something like: if (folio_test_fscache(folio)) { if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS)) return false; folio_wait_fscache(folio); } Instead, current_is_kswapd() should be used instead. Note that this is based on a patch originally by Zhaoyang Huang[1]. In addition to extending it to the other network filesystems and putting it on top of my fscache rewrite, it also needs to include linux/swap.h in a bunch of places. Can current_is_kswapd() be moved to linux/mm.h? Changes ======= ver #5: - Dropping the changes for cifs. Originally-signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: Steve French <smfrench@gmail.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: linux-cachefs@redhat.com cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/1638952658-20285-1-git-send-email-huangzhaoyang@gmail.com/ [1] Link: https://lore.kernel.org/r/164021590773.640689.16777975200823659231.stgit@warthog.procyon.org.uk/ # v4
2022-01-12ceph: add fscache writeback supportJeff Layton1-8/+59
When updating the backing store from the pagecache (a'la writepage or writepages), write to the cache first. This allows us to keep caching files even when they are being written, as long as we have appropriate caps. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20211129162907.149445-3-jlayton@kernel.org/ # v1 Link: https://lore.kernel.org/r/20211207134451.66296-3-jlayton@kernel.org/ # v2 Link: https://lore.kernel.org/r/163906985808.143852.1383891557313186623.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967190257.1823006.16713609520911954804.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021585020.640689.6765214932458435472.stgit@warthog.procyon.org.uk/ # v4
2022-01-12ceph: conversion to new fscache APIJeff Layton1-14/+20
Now that the fscache API has been reworked and simplified, change ceph over to use it. With the old API, we would only instantiate a cookie when the file was open for reads. Change it to instantiate the cookie when the inode is instantiated and call use/unuse when the file is opened/closed. Also, ensure we resize the cached data on truncates, and invalidate the cache in response to the appropriate events. This will allow us to plumb in write support later. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20211129162907.149445-2-jlayton@kernel.org/ # v1 Link: https://lore.kernel.org/r/20211207134451.66296-2-jlayton@kernel.org/ # v2 Link: https://lore.kernel.org/r/163906984277.143852.14697110691303589000.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967188351.1823006.5065634844099079351.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021581427.640689.14128682147127509264.stgit@warthog.procyon.org.uk/ # v4
2021-11-13Merge tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-17/+12
Pull ceph updates from Ilya Dryomov: "One notable change here is that async creates and unlinks introduced in 5.7 are now enabled by default. This should greatly speed up things like rm, tar and rsync. To opt out, wsync mount option can be used. Other than that we have a pile of bug fixes all across the filesystem from Jeff, Xiubo and Kotresh and a metrics infrastructure rework from Luis" * tag 'ceph-for-5.16-rc1' of git://github.com/ceph/ceph-client: ceph: add a new metric to keep track of remote object copies libceph, ceph: move ceph_osdc_copy_from() into cephfs code ceph: clean-up metrics data structures to reduce code duplication ceph: split 'metric' debugfs file into several files ceph: return the real size read when it hits EOF ceph: properly handle statfs on multifs setups ceph: shut down mount on bad mdsmap or fsmap decode ceph: fix mdsmap decode when there are MDS's beyond max_mds ceph: ignore the truncate when size won't change with Fx caps issued ceph: don't rely on error_string to validate blocklisted session. ceph: just use ci->i_version for fscache aux info ceph: shut down access to inode when async create fails ceph: refactor remove_session_caps_cb ceph: fix auth cap handling logic in remove_session_caps_cb ceph: drop private list from remove_session_caps_cb ceph: don't use -ESTALE as special return code in try_get_cap_refs ceph: print inode numbers instead of pointer values ceph: enable async dirops by default libceph: drop ->monmap and err initialization ceph: convert to noop_direct_IO
2021-11-11netfs, 9p, afs, ceph: Use foliosDavid Howells1-37/+43
Convert the netfs helper library to use folios throughout, convert the 9p and afs filesystems to use folios in their file I/O paths and convert the ceph filesystem to use just enough folios to compile. With these changes, afs passes -g quick xfstests. Changes ======= ver #5: - Got rid of folio_end{io,_read,_write}() and inlined the stuff it does instead (Willy decided he didn't want this after all). ver #4: - Fixed a bug in afs_redirty_page() whereby it didn't set the next page index in the loop and returned too early. - Simplified a check in v9fs_vfs_write_folio_locked()[1]. - Undid a change to afs_symlink_readpage()[1]. - Used offset_in_folio() in afs_write_end()[1]. - Changed from using page_endio() to folio_end{io,_read,_write}()[1]. ver #2: - Add 9p foliation. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dominique Martinet <asmadeus@codewreck.org> Tested-by: kafs-testing@auristor.com cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Dominique Martinet <asmadeus@codewreck.org> cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/YYKa3bfQZxK5/wDN@casper.infradead.org/ [1] Link: https://lore.kernel.org/r/2408234.1628687271@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/162877311459.3085614.10601478228012245108.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162981153551.1901565.3124454657133703341.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/163005745264.2472992.9852048135392188995.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163584187452.4023316.500389675405550116.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/163649328026.309189.1124218109373941936.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/163657852454.834781.9265101983152100556.stgit@warthog.procyon.org.uk/ # v5
2021-11-08ceph: shut down access to inode when async create failsJeff Layton1-5/+11
Add proper error handling for when an async create fails. The inode never existed, so any dirty caps or data are now toast. We already d_drop the dentry in that case, but the now-stale inode may still be around. We want to shut down access to these inodes, and ensure that they can't harbor any more dirty data, which can cause problems at umount time. When this occurs, flag such inodes as being SHUTDOWN, and trash any caps and cap flushes that may be in flight for them, and invalidate the pagecache for the inode. Add a new helper that can check whether an inode or an entire mount is now shut down, and call it instead of accessing the mount_state directly in places where we test that now. URL: https://tracker.ceph.com/issues/51279 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-11-08ceph: convert to noop_direct_IOJeff Layton1-12/+1
We have our own op, but the WARN_ON is not terribly helpful, and it's otherwise identical to the noop one. Just use that. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-09Merge tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-1/+1
Pull ceph updates from Ilya Dryomov: - a set of patches to address fsync stalls caused by depending on periodic rather than triggered MDS journal flushes in some cases (Xiubo Li) - a fix for mtime effectively not getting updated in case of competing writers (Jeff Layton) - a couple of fixes for inode reference leaks and various WARNs after "umount -f" (Xiubo Li) - a new ceph.auth_mds extended attribute (Jeff Layton) - a smattering of fixups and cleanups from Jeff, Xiubo and Colin. * tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client: ceph: fix dereference of null pointer cf ceph: drop the mdsc_get_session/put_session dout messages ceph: lockdep annotations for try_nonblocking_invalidate ceph: don't WARN if we're forcibly removing the session caps ceph: don't WARN if we're force umounting ceph: remove the capsnaps when removing caps ceph: request Fw caps before updating the mtime in ceph_write_iter ceph: reconnect to the export targets on new mdsmaps ceph: print more information when we can't find snaprealm ceph: add ceph_change_snap_realm() helper ceph: remove redundant initializations from mdsc and session ceph: cancel delayed work instead of flushing on mdsc teardown ceph: add a new vxattr to return auth mds for an inode ceph: remove some defunct forward declarations ceph: flush the mdlog before waiting on unsafe reqs ceph: flush mdlog before umounting ceph: make iterate_sessions a global symbol ceph: make ceph_create_session_msg a global symbol ceph: fix comment about short copies in ceph_write_end ceph: fix memory leak on decode error in ceph_handle_caps
2021-09-02ceph: fix comment about short copies in ceph_write_endJeff Layton1-1/+1
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-07-13ceph: Fix race between hole punch and page faultJan Kara1-3/+6
Ceph has a following race between hole punching and page fault: CPU1 CPU2 ceph_fallocate() ... ceph_zero_pagecache_range() ceph_filemap_fault() faults in page in the range being punched ceph_zero_objects() And now we have a page in punched range with invalid data. Fix the problem by using mapping->invalidate_lock similarly to other filesystems. Note that using invalidate_lock also fixes a similar race wrt ->readpage(). CC: Jeff Layton <jlayton@kernel.org> CC: ceph-devel@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>
2021-06-29ceph: add IO size metrics supportXiubo Li1-6/+8
This will collect IO's total size and then calculate the average size, and also will collect the min/max IO sizes. The debugfs will show the size metrics in bytes and will let the userspace applications to switch to what they need. URL: https://tracker.ceph.com/issues/49913 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29ceph: make ceph_netfs_read_ops staticWei Yongjun1-1/+1
The sparse tool complains as follows: fs/ceph/addr.c:316:37: warning: symbol 'ceph_netfs_read_ops' was not declared. Should it be static? This symbol is not used outside of addr.c, so mark it static. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-29ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirtyJeff Layton1-9/+1
The checks for page->mapping are odd, as set_page_dirty is an address_space operation, and I don't see where it would be called on a non-pagecache page. The warning about the page lock also seems bogus. The comment over set_page_dirty() says that it can be called without the page lock in some rare cases. I don't think we want to warn if that's the case. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: convert some PAGE_SIZE invocations to thp_size()Jeff Layton1-26/+25
Start preparing to allow the use of THPs in the pagecache with ceph by making it use thp_size() in lieu of PAGE_SIZE in the appropriate places. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: drop pinned_page parameter from ceph_get_capsJeff Layton1-7/+2
All of the existing callers that don't set this to NULL just drop the page reference at some arbitrary point later in processing. There's no point in keeping a page reference that we don't use, so just drop the reference immediately after checking the Uptodate flag. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: only check pool permissions for regular filesJeff Layton1-0/+4
There is no need to do a ceph_pool_perm_check() on anything that isn't a regular file, as the MDS is what handles talking to the OSD in those cases. Just return 0 if it's not a regular file. Reported-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: rename the metric helpersXiubo Li1-4/+4
Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: use attach/detach_page_private for tracking snap contextJeff Layton1-11/+7
There is some ambiguity around the use of PagePrivate. It's generally expected in core code that if PagePrivate is set then you have a reference to it. It's not clear that ceph always does (and I believe it may not). Change ceph to use attach/detach_page_private so that we keep a reference to the page until the snap context is detached. Link: https://lore.kernel.org/ceph-devel/2503810.1616508988@warthog.procyon.org.uk/T/#mf29e5abbb0ec8035cde0de30778690de7d956f84 Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: convert ceph_readpages to ceph_readaheadJeff Layton1-199/+31
Convert ceph_readpages to ceph_readahead and make it use netfs_readahead. With this we can rip out a lot of the old readpage/readpages infrastructure. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: convert ceph_write_begin to netfs_write_beginJeff Layton1-129/+57
Convert ceph_write_begin to use the netfs_write_begin helper. Most of the ops we need for it are already in place from the readpage conversion but we do add a new check_write_begin op since ceph needs to be able to vet whether there is an incompatible writeback already in flight before reading in the page. With this, we can also remove the old ceph_do_readpage helper. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: convert ceph_readpage to netfs_readpageJeff Layton1-10/+158
Have the ceph KConfig select NETFS_SUPPORT. Add a new netfs ops structure and the operations for it. Convert ceph_readpage to use the new netfs_readpage helper. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: rework PageFsCache handlingJeff Layton1-1/+8
With the new fscache API, the PageFsCache bit now indicates that the page is being written to the cache and shouldn't be modified or released until it's finished. Change releasepage and invalidatepage to wait on that bit before returning. Also define FSCACHE_USE_NEW_IO_API so that we opt into the new fscache API. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-04-28ceph: rip out old fscache readpage handlingJeff Layton1-29/+2
With the new netfs read helper functions, we won't need a lot of this infrastructure as it handles the pagecache pages itself. Rip out the read handling for now, and much of the old infrastructure that deals in individual pages. The cookie handling is mostly unchanged, however. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-02-16ceph: allow queueing cap/snap handling after putting cap referencesJeff Layton1-1/+1
Testing with the fscache overhaul has triggered some lockdep warnings about circular lock dependencies involving page_mkwrite and the mmap_lock. It'd be better to do the "real work" without the mmap lock being held. Change the skip_checking_caps parameter in __ceph_put_cap_refs to an enum, and use that to determine whether to queue check_caps, do it synchronously or not at all. Change ceph_page_mkwrite to do a ceph_put_cap_refs_async(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15ceph: pass down the flags to grab_cache_page_write_beginJeff Layton1-1/+1
write_begin operations are passed a flags parameter that we need to mirror here, so that we don't (e.g.) recurse back into filesystem code inappropriately. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-15ceph: add new RECOVER mount_state when recovering sessionJeff Layton1-2/+2
When recovering a session (a'la recover_session=clean), we want to do all of the operations that we do on a forced umount, but changing the mount state to SHUTDOWN is can cause queued MDS requests to fail when the session comes back. Most of those can idle until the session is recovered in this situation. Reserve SHUTDOWN state for forced umount, and make a new RECOVER state for the forced reconnect situation. Change several tests for equality with SHUTDOWN to test for that or RECOVER. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12ceph: promote to unsigned long long before shiftingMatthew Wilcox (Oracle)1-1/+1
On 32-bit systems, this shift will overflow for files larger than 4GB. Cc: stable@vger.kernel.org Fixes: 61f68816211e ("ceph: check caps in filemap_fault and page_mkwrite") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>