1 files changed, 346 insertions, 790 deletions
diff --git a/Documentation/filesystems/caching/netfs-api.rst b/Documentation/filesystems/caching/netfs-api.rst
index d9f14b8610ba..f84e9ffdf0b4 100644
--- a/Documentation/filesystems/caching/netfs-api.rst
+++ b/Documentation/filesystems/caching/netfs-api.rst
@@ -1,896 +1,452 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-===============================
-FS-Cache Network Filesystem API
-===============================
+==============================
+Network Filesystem Caching API
+==============================
 
-There's an API by which a network filesystem can make use of the FS-Cache
-facilities.  This is based around a number of principles:
+Fscache provides an API by which a network filesystem can make use of local
+caching facilities.  The API is arranged around a number of principles:
 
- (1) Caches can store a number of different object types.  There are two main
-     object types: indices and files.  The first is a special type used by
-     FS-Cache to make finding objects faster and to make retiring of groups of
-     objects easier.
+ (1) A cache is logically organised into volumes and data storage objects
+     within those volumes.
 
- (2) Every index, file or other object is represented by a cookie.  This cookie
-     may or may not have anything associated with it, but the netfs doesn't
-     need to care.
+ (2) Volumes and data storage objects are represented by various types of
+     cookie.
 
- (3) Barring the top-level index (one entry per cached netfs), the index
-     hierarchy for each netfs is structured according the whim of the netfs.
+ (3) Cookies have keys that distinguish them from their peers.
 
-This API is declared in <linux/fscache.h>.
+ (4) Cookies have coherency data that allows a cache to determine if the
+     cached data is still valid.
 
-.. This document contains the following sections:
-
-	 (1) Network filesystem definition
-	 (2) Index definition
-	 (3) Object definition
-	 (4) Network filesystem (un)registration
-	 (5) Cache tag lookup
-	 (6) Index registration
-	 (7) Data file registration
-	 (8) Miscellaneous object registration
- 	 (9) Setting the data file size
-	(10) Page alloc/read/write
-	(11) Page uncaching
-	(12) Index and data file consistency
-	(13) Cookie enablement
-	(14) Miscellaneous cookie operations
-	(15) Cookie unregistration
-	(16) Index invalidation
-	(17) Data file invalidation
-	(18) FS-Cache specific page flags.
-
-
-Network Filesystem Definition
-=============================
-
-FS-Cache needs a description of the network filesystem.  This is specified
-using a record of the following structure::
-
-	struct fscache_netfs {
-		uint32_t			version;
-		const char			*name;
-		struct fscache_cookie		*primary_index;
-		...
-	};
-
-This first two fields should be filled in before registration, and the third
-will be filled in by the registration function; any other fields should just be
-ignored and are for internal use only.
-
-The fields are:
-
- (1) The name of the netfs (used as the key in the toplevel index).
-
- (2) The version of the netfs (if the name matches but the version doesn't, the
-     entire in-cache hierarchy for this netfs will be scrapped and begun
-     afresh).
-
- (3) The cookie representing the primary index will be allocated according to
-     another parameter passed into the registration function.
-
-For example, kAFS (linux/fs/afs/) uses the following definitions to describe
-itself::
-
-	struct fscache_netfs afs_cache_netfs = {
-		.version	= 0,
-		.name		= "afs",
-	};
-
-
-Index Definition
-================
-
-Indices are used for two purposes:
-
- (1) To aid the finding of a file based on a series of keys (such as AFS's
-     "cell", "volume ID", "vnode ID").
-
- (2) To make it easier to discard a subset of all the files cached based around
-     a particular key - for instance to mirror the removal of an AFS volume.
-
-However, since it's unlikely that any two netfs's are going to want to define
-their index hierarchies in quite the same way, FS-Cache tries to impose as few
-restraints as possible on how an index is structured and where it is placed in
-the tree.  The netfs can even mix indices and data files at the same level, but
-it's not recommended.
-
-Each index entry consists of a key of indeterminate length plus some auxiliary
-data, also of indeterminate length.
-
-There are some limits on indices:
-
- (1) Any index containing non-index objects should be restricted to a single
-     cache.  Any such objects created within an index will be created in the
-     first cache only.  The cache in which an index is created can be
-     controlled by cache tags (see below).
-
- (2) The entry data must be atomically journallable, so it is limited to about
-     400 bytes at present.  At least 400 bytes will be available.
-
- (3) The depth of the index tree should be judged with care as the search
-     function is recursive.  Too many layers will run the kernel out of stack.
-
-
-Object Definition
-=================
-
-To define an object, a structure of the following type should be filled out::
-
-	struct fscache_cookie_def
-	{
-		uint8_t name[16];
-		uint8_t type;
-
-		struct fscache_cache_tag *(*select_cache)(
-			const void *parent_netfs_data,
-			const void *cookie_netfs_data);
-
-		enum fscache_checkaux (*check_aux)(void *cookie_netfs_data,
-						   const void *data,
-						   uint16_t datalen,
-						   loff_t object_size);
-
-		void (*get_context)(void *cookie_netfs_data, void *context);
-
-		void (*put_context)(void *cookie_netfs_data, void *context);
-
-		void (*mark_pages_cached)(void *cookie_netfs_data,
-					  struct address_space *mapping,
-					  struct pagevec *cached_pvec);
-	};
-
-This has the following fields:
-
- (1) The type of the object [mandatory].
-
-     This is one of the following values:
-
-	FSCACHE_COOKIE_TYPE_INDEX
-	    This defines an index, which is a special FS-Cache type.
-
-	FSCACHE_COOKIE_TYPE_DATAFILE
-	    This defines an ordinary data file.
-
-	Any other value between 2 and 255
-	    This defines an extraordinary object such as an XATTR.
-
- (2) The name of the object type (NUL terminated unless all 16 chars are used)
-     [optional].
-
- (3) A function to select the cache in which to store an index [optional].
-
-     This function is invoked when an index needs to be instantiated in a cache
-     during the instantiation of a non-index object.  Only the immediate index
-     parent for the non-index object will be queried.  Any indices above that
-     in the hierarchy may be stored in multiple caches.  This function does not
-     need to be supplied for any non-index object or any index that will only
-     have index children.
-
-     If this function is not supplied or if it returns NULL then the first
-     cache in the parent's list will be chosen, or failing that, the first
-     cache in the master list.
-
- (4) A function to check the auxiliary data [optional].
-
-     This function will be called to check that a match found in the cache for
-     this object is valid.  For instance with AFS it could check the auxiliary
-     data against the data version number returned by the server to determine
-     whether the index entry in a cache is still valid.
-
-     If this function is absent, it will be assumed that matching objects in a
-     cache are always valid.
-
-     The function is also passed the cache's idea of the object size and may
-     use this to manage coherency also.
-
-     If present, the function should return one of the following values:
-
-	FSCACHE_CHECKAUX_OKAY
-	    - the entry is okay as is
-
-	FSCACHE_CHECKAUX_NEEDS_UPDATE
-	    - the entry requires update
-
-	FSCACHE_CHECKAUX_OBSOLETE
-	    - the entry should be deleted
+ (5) I/O is done asynchronously where possible.
 
-     This function can also be used to extract data from the auxiliary data in
-     the cache and copy it into the netfs's structures.
+This API is used by::
 
- (5) A pair of functions to manage contexts for the completion callback
-     [optional].
+	#include <linux/fscache.h>.
 
-     The cache read/write functions are passed a context which is then passed
-     to the I/O completion callback function.  To ensure this context remains
-     valid until after the I/O completion is called, two functions may be
-     provided: one to get an extra reference on the context, and one to drop a
-     reference to it.
-
-     If the context is not used or is a type of object that won't go out of
-     scope, then these functions are not required.  These functions are not
-     required for indices as indices may not contain data.  These functions may
-     be called in interrupt context and so may not sleep.
-
- (6) A function to mark a page as retaining cache metadata [optional].
-
-     This is called by the cache to indicate that it is retaining in-memory
-     information for this page and that the netfs should uncache the page when
-     it has finished.  This does not indicate whether there's data on the disk
-     or not.  Note that several pages at once may be presented for marking.
-
-     The PG_fscache bit is set on the pages before this function would be
-     called, so the function need not be provided if this is sufficient.
-
-     This function is not required for indices as they're not permitted data.
-
- (7) A function to unmark all the pages retaining cache metadata [mandatory].
-
-     This is called by FS-Cache to indicate that a backing store is being
-     unbound from a cookie and that all the marks on the pages should be
-     cleared to prevent confusion.  Note that the cache will have torn down all
-     its tracking information so that the pages don't need to be explicitly
-     uncached.
-
-     This function is not required for indices as they're not permitted data.
-
-
-Network Filesystem (Un)registration
-===================================
-
-The first step is to declare the network filesystem to the cache.  This also
-involves specifying the layout of the primary index (for AFS, this would be the
-"cell" level).
-
-The registration function is::
-
-	int fscache_register_netfs(struct fscache_netfs *netfs);
-
-It just takes a pointer to the netfs definition.  It returns 0 or an error as
-appropriate.
-
-For kAFS, registration is done as follows::
-
-	ret = fscache_register_netfs(&afs_cache_netfs);
-
-The last step is, of course, unregistration::
-
-	void fscache_unregister_netfs(struct fscache_netfs *netfs);
-
-
-Cache Tag Lookup
-================
-
-FS-Cache permits the use of more than one cache.  To permit particular index
-subtrees to be bound to particular caches, the second step is to look up cache
-representation tags.  This step is optional; it can be left entirely up to
-FS-Cache as to which cache should be used.  The problem with doing that is that
-FS-Cache will always pick the first cache that was registered.
-
-To get the representation for a named tag::
-
-	struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
-
-This takes a text string as the name and returns a representation of a tag.  It
-will never return an error.  It may return a dummy tag, however, if it runs out
-of memory; this will inhibit caching with this tag.
-
-Any representation so obtained must be released by passing it to this function::
-
-	void fscache_release_cache_tag(struct fscache_cache_tag *tag);
+.. This document contains the following sections:
 
-The tag will be retrieved by FS-Cache when it calls the object definition
-operation select_cache().
+	 (1) Overview
+	 (2) Volume registration
+	 (3) Data file registration
+	 (4) Declaring a cookie to be in use
+	 (5) Resizing a data file (truncation)
+	 (6) Data I/O API
+	 (7) Data file coherency
+	 (8) Data file invalidation
+	 (9) Write back resource management
+	(10) Caching of local modifications
+	(11) Page release and invalidation
+
+
+Overview
+========
+
+The fscache hierarchy is organised on two levels from a network filesystem's
+point of view.  The upper level represents "volumes" and the lower level
+represents "data storage objects".  These are represented by two types of
+cookie, hereafter referred to as "volume cookies" and "cookies".
+
+A network filesystem acquires a volume cookie for a volume using a volume key,
+which represents all the information that defines that volume (e.g. cell name
+or server address, volume ID or share name).  This must be rendered as a
+printable string that can be used as a directory name (ie. no '/' characters
+and shouldn't begin with a '.').  The maximum name length is one less than the
+maximum size of a filename component (allowing the cache backend one char for
+its own purposes).
+
+A filesystem would typically have a volume cookie for each superblock.
+
+The filesystem then acquires a cookie for each file within that volume using an
+object key.  Object keys are binary blobs and only need to be unique within
+their parent volume.  The cache backend is reponsible for rendering the binary
+blob into something it can use and may employ hash tables, trees or whatever to
+improve its ability to find an object.  This is transparent to the network
+filesystem.
+
+A filesystem would typically have a cookie for each inode, and would acquire it
+in iget and relinquish it when evicting the cookie.
+
+Once it has a cookie, the filesystem needs to mark the cookie as being in use.
+This causes fscache to send the cache backend off to look up/create resources
+for the cookie in the background, to check its coherency and, if necessary, to
+mark the object as being under modification.
+
+A filesystem would typically "use" the cookie in its file open routine and
+unuse it in file release and it needs to use the cookie around calls to
+truncate the cookie locally.  It *also* needs to use the cookie when the
+pagecache becomes dirty and unuse it when writeback is complete.  This is
+slightly tricky, and provision is made for it.
+
+When performing a read, write or resize on a cookie, the filesystem must first
+begin an operation.  This copies the resources into a holding struct and puts
+extra pins into the cache to stop cache withdrawal from tearing down the
+structures being used.  The actual operation can then be issued and conflicting
+invalidations can be detected upon completion.
+
+The filesystem is expected to use netfslib to access the cache, but that's not
+actually required and it can use the fscache I/O API directly.
+
+
+Volume Registration
+===================
+
+The first step for a network filsystem is to acquire a volume cookie for the
+volume it wants to access::
+
+	struct fscache_volume *
+	fscache_acquire_volume(const char *volume_key,
+			       const char *cache_name,
+			       const void *coherency_data,
+			       size_t coherency_len);
+
+This function creates a volume cookie with the specified volume key as its name
+and notes the coherency data.
+
+The volume key must be a printable string with no '/' characters in it.  It
+should begin with the name of the filesystem and should be no longer than 254
+characters.  It should uniquely represent the volume and will be matched with
+what's stored in the cache.
+
+The caller may also specify the name of the cache to use.  If specified,
+fscache will look up or create a cache cookie of that name and will use a cache
+of that name if it is online or comes online.  If no cache name is specified,
+it will use the first cache that comes to hand and set the name to that.
+
+The specified coherency data is stored in the cookie and will be matched
+against coherency data stored on disk.  The data pointer may be NULL if no data
+is provided.  If the coherency data doesn't match, the entire cache volume will
+be invalidated.
+
+This function can return errors such as EBUSY if the volume key is already in
+use by an acquired volume or ENOMEM if an allocation failure occured.  It may
+also return a NULL volume cookie if fscache is not enabled.  It is safe to
+pass a NULL cookie to any function that takes a volume cookie.  This will
+cause that function to do nothing.
+
+
+When the network filesystem has finished with a volume, it should relinquish it
+by calling::
+
+	void fscache_relinquish_volume(struct fscache_volume *volume,
+				       const void *coherency_data,
+				       bool invalidate);
+
+This will cause the volume to be committed or removed, and if sealed the
+coherency data will be set to the value supplied.  The amount of coherency data
+must match the length specified when the volume was acquired.  Note that all
+data cookies obtained in this volume must be relinquished before the volume is
+relinquished.
 
 
-Index Registration
-==================
+Data File Registration
+======================
 
-The third step is to inform FS-Cache about part of an index hierarchy that can
-be used to locate files.  This is done by requesting a cookie for each index in
-the path to the file::
+Once it has a volume cookie, a network filesystem can use it to acquire a
+cookie for data storage::
 
 	struct fscache_cookie *
-	fscache_acquire_cookie(struct fscache_cookie *parent,
-			       const struct fscache_object_def *def,
+	fscache_acquire_cookie(struct fscache_volume *volume,
+			       u8 advice,
 			       const void *index_key,
 			       size_t index_key_len,
 			       const void *aux_data,
 			       size_t aux_data_len,
-			       void *netfs_data,
-			       loff_t object_size,
-			       bool enable);
+			       loff_t object_size)
 
-This function creates an index entry in the index represented by parent,
-filling in the index entry by calling the operations pointed to by def.
+This creates the cookie in the volume using the specified index key.  The index
+key is a binary blob of the given length and must be unique for the volume.
+This is saved into the cookie.  There are no restrictions on the content, but
+its length shouldn't exceed about three quarters of the maximum filename length
+to allow for encoding.
 
-A unique key that represents the object within the parent must be pointed to by
-index_key and is of length index_key_len.
+The caller should also pass in a piece of coherency data in aux_data.  A buffer
+of size aux_data_len will be allocated and the coherency data copied in.  It is
+assumed that the size is invariant over time.  The coherency data is used to
+check the validity of data in the cache.  Functions are provided by which the
+coherency data can be updated.
 
-An optional blob of auxiliary data that is to be stored within the cache can be
-pointed to with aux_data and should be of length aux_data_len.  This would
-typically be used for storing coherency data.
+The file size of the object being cached should also be provided.  This may be
+used to trim the data and will be stored with the coherency data.
 
-The netfs may pass an arbitrary value in netfs_data and this will be presented
-to it in the event of any calling back.  This may also be used in tracing or
-logging of messages.
+This function never returns an error, though it may return a NULL cookie on
+allocation failure or if fscache is not enabled.  It is safe to pass in a NULL
+volume cookie and pass the NULL cookie returned to any function that takes it.
+This will cause that function to do nothing.
 
-The cache tracks the size of the data attached to an object and this set to be
-object_size.  For indices, this should be 0.  This value will be passed to the
-->check_aux() callback.
 
-Note that this function never returns an error - all errors are handled
-internally.  It may, however, return NULL to indicate no cookie.  It is quite
-acceptable to pass this token back to this function as the parent to another
-acquisition (or even to the relinquish cookie, read page and write page
-functions - see below).
+When the network filesystem has finished with a cookie, it should relinquish it
+by calling::
 
-Note also that no indices are actually created in a cache until a non-index
-object needs to be created somewhere down the hierarchy.  Furthermore, an index
-may be created in several different caches independently at different times.
-This is all handled transparently, and the netfs doesn't see any of it.
+	void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+				       bool retire);
 
-A cookie will be created in the disabled state if enabled is false.  A cookie
-must be enabled to do anything with it.  A disabled cookie can be enabled by
-calling fscache_enable_cookie() (see below).
+This will cause fscache to either commit the storage backing the cookie or
+delete it.
 
-For example, with AFS, a cell would be added to the primary index.  This index
-entry would have a dependent inode containing volume mappings within this cell::
 
-	cell->cache =
-		fscache_acquire_cookie(afs_cache_netfs.primary_index,
-				       &afs_cell_cache_index_def,
-				       cell->name, strlen(cell->name),
-				       NULL, 0,
-				       cell, 0, true);
+Marking A Cookie In-Use
+=======================
 
-And then a particular volume could be added to that index by ID, creating
-another index for vnodes (AFS inode equivalents)::
+Once a cookie has been acquired by a network filesystem, the filesystem should
+tell fscache when it intends to use the cookie (typically done on file open)
+and should say when it has finished with it (typically on file close)::
 
-	volume->cache =
-		fscache_acquire_cookie(volume->cell->cache,
-				       &afs_volume_cache_index_def,
-				       &volume->vid, sizeof(volume->vid),
-				       NULL, 0,
-				       volume, 0, true);
+	void fscache_use_cookie(struct fscache_cookie *cookie,
+				bool will_modify);
+	void fscache_unuse_cookie(struct fscache_cookie *cookie,
+				  const void *aux_data,
+				  const loff_t *object_size);
 
+The *use* function tells fscache that it will use the cookie and, additionally,
+indicate if the user is intending to modify the contents locally.  If not yet
+done, this will trigger the cache backend to go and gather the resources it
+needs to access/store data in the cache.  This is done in the background, and
+so may not be complete by the time the function returns.
 
-Data File Registration
-======================
+The *unuse* function indicates that a filesystem has finished using a cookie.
+It optionally updates the stored coherency data and object size and then
+decreases the in-use counter.  When the last user unuses the cookie, it is
+scheduled for garbage collection.  If not reused within a short time, the
+resources will be released to reduce system resource consumption.
 
-The fourth step is to request a data file be created in the cache.  This is
-identical to index cookie acquisition.  The only difference is that the type in
-the object definition should be something other than index type::
+A cookie must be marked in-use before it can be accessed for read, write or
+resize - and an in-use mark must be kept whilst there is dirty data in the
+pagecache in order to avoid an oops due to trying to open a file during process
+exit.
 
-	vnode->cache =
-		fscache_acquire_cookie(volume->cache,
-				       &afs_vnode_cache_object_def,
-				       &key, sizeof(key),
-				       &aux, sizeof(aux),
-				       vnode, vnode->status.size, true);
+Note that in-use marks are cumulative.  For each time a cookie is marked
+in-use, it must be unused.
 
 
-Miscellaneous Object Registration
+Resizing A Data File (Truncation)
 =================================
 
-An optional step is to request an object of miscellaneous type be created in
-the cache.  This is almost identical to index cookie acquisition.  The only
-difference is that the type in the object definition should be something other
-than index type.  While the parent object could be an index, it's more likely
-it would be some other type of object such as a data file::
-
-	xattr->cache =
-		fscache_acquire_cookie(vnode->cache,
-				       &afs_xattr_cache_object_def,
-				       &xattr->name, strlen(xattr->name),
-				       NULL, 0,
-				       xattr, strlen(xattr->val), true);
-
-Miscellaneous objects might be used to store extended attributes or directory
-entries for example.
-
-
-Setting the Data File Size
-==========================
+If a network filesystem file is resized locally by truncation, the following
+should be called to notify the cache::
 
-The fifth step is to set the physical attributes of the file, such as its size.
-This doesn't automatically reserve any space in the cache, but permits the
-cache to adjust its metadata for data tracking appropriately::
+	void fscache_resize_cookie(struct fscache_cookie *cookie,
+				   loff_t new_size);
 
-	int fscache_attr_changed(struct fscache_cookie *cookie);
+The caller must have first marked the cookie in-use.  The cookie and the new
+size are passed in and the cache is synchronously resized.  This is expected to
+be called from ``->setattr()`` inode operation under the inode lock.
 
-The cache will return -ENOBUFS if there is no backing cache or if there is no
-space to allocate any extra metadata required in the cache.
 
-Note that attempts to read or write data pages in the cache over this size may
-be rebuffed with -ENOBUFS.
+Data I/O API
+============
 
-This operation schedules an attribute adjustment to happen asynchronously at
-some point in the future, and as such, it may happen after the function returns
-to the caller.  The attribute adjustment excludes read and write operations.
+To do data I/O operations directly through a cookie, the following functions
+are available::
 
+	int fscache_begin_read_operation(struct netfs_cache_resources *cres,
+					 struct fscache_cookie *cookie);
+	int fscache_read(struct netfs_cache_resources *cres,
+			 loff_t start_pos,
+			 struct iov_iter *iter,
+			 enum netfs_read_from_hole read_hole,
+			 netfs_io_terminated_t term_func,
+			 void *term_func_priv);
+	int fscache_write(struct netfs_cache_resources *cres,
+			  loff_t start_pos,
+			  struct iov_iter *iter,
+			  netfs_io_terminated_t term_func,
+			  void *term_func_priv);
 
-Page alloc/read/write
-=====================
+The *begin* function sets up an operation, attaching the resources required to
+the cache resources block from the cookie.  Assuming it doesn't return an error
+(for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do
+nothing), then one of the other two functions can be issued.
 
-And the sixth step is to store and retrieve pages in the cache.  There are
-three functions that are used to do this.
+The *read* and *write* functions initiate a direct-IO operation.  Both take the
+previously set up cache resources block, an indication of the start file
+position, and an I/O iterator that describes buffer and indicates the amount of
+data.
 
-Note:
+The read function also takes a parameter to indicate how it should handle a
+partially populated region (a hole) in the disk content.  This may be to ignore
+it, skip over an initial hole and place zeros in the buffer or give an error.
 
- (1) A page should not be re-read or re-allocated without uncaching it first.
-
- (2) A read or allocated page must be uncached when the netfs page is released
-     from the pagecache.
-
- (3) A page should only be written to the cache if previous read or allocated.
-
-This permits the cache to maintain its page tracking in proper order.
-
-
-PAGE READ
----------
-
-Firstly, the netfs should ask FS-Cache to examine the caches and read the
-contents cached for a particular page of a particular file if present, or else
-allocate space to store the contents if not::
+The read and write functions can be given an optional termination function that
+will be run on completion::
 
 	typedef
-	void (*fscache_rw_complete_t)(struct page *page,
-				      void *context,
-				      int error);
-
-	int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
-				       struct page *page,
-				       fscache_rw_complete_t end_io_func,
-				       void *context,
-				       gfp_t gfp);
-
-The cookie argument must specify a cookie for an object that isn't an index,
-the page specified will have the data loaded into it (and is also used to
-specify the page number), and the gfp argument is used to control how any
-memory allocations made are satisfied.
-
-If the cookie indicates the inode is not cached:
-
- (1) The function will return -ENOBUFS.
-
-Else if there's a copy of the page resident in the cache:
-
- (1) The mark_pages_cached() cookie operation will be called on that page.
+	void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
+				      bool was_async);
 
- (2) The function will submit a request to read the data from the cache's
-     backing device directly into the page specified.
+If a termination function is given, the operation will be run asynchronously
+and the termination function will be called upon completion.  If not given, the
+operation will be run synchronously.  Note that in the asynchronous case, it is
+possible for the operation to complete before the function returns.
 
- (3) The function will return 0.
+Both the read and write functions end the operation when they complete,
+detaching any pinned resources.
 
- (4) When the read is complete, end_io_func() will be invoked with:
+The read operation will fail with ESTALE if invalidation occurred whilst the
+operation was ongoing.
 
-       * The netfs data supplied when the cookie was created.
 
-       * The page descriptor.
+Data File Coherency
+===================
 
-       * The context argument passed to the above function.  This will be
-         maintained with the get_context/put_context functions mentioned above.
-
-       * An argument that's 0 on success or negative for an error code.
-
-     If an error occurs, it should be assumed that the page contains no usable
-     data.  fscache_readpages_cancel() may need to be called.
-
-     end_io_func() will be called in process context if the read is results in
-     an error, but it might be called in interrupt context if the read is
-     successful.
-
-Otherwise, if there's not a copy available in cache, but the cache may be able
-to store the page:
-
- (1) The mark_pages_cached() cookie operation will be called on that page.
-
- (2) A block may be reserved in the cache and attached to the object at the
-     appropriate place.
-
- (3) The function will return -ENODATA.
-
-This function may also return -ENOMEM or -EINTR, in which case it won't have
-read any data from the cache.
-
-
-Page Allocate
--------------
-
-Alternatively, if there's not expected to be any data in the cache for a page
-because the file has been extended, a block can simply be allocated instead::
-
-	int fscache_alloc_page(struct fscache_cookie *cookie,
-			       struct page *page,
-			       gfp_t gfp);
-
-This is similar to the fscache_read_or_alloc_page() function, except that it
-never reads from the cache.  It will return 0 if a block has been allocated,
-rather than -ENODATA as the other would.  One or the other must be performed
-before writing to the cache.
-
-The mark_pages_cached() cookie operation will be called on the page if
-successful.
-
-
-Page Write
-----------
-
-Secondly, if the netfs changes the contents of the page (either due to an
-initial download or if a user performs a write), then the page should be
-written back to the cache::
-
-	int fscache_write_page(struct fscache_cookie *cookie,
-			       struct page *page,
-			       loff_t object_size,
-			       gfp_t gfp);
-
-The cookie argument must specify a data file cookie, the page specified should
-contain the data to be written (and is also used to specify the page number),
-object_size is the revised size of the object and the gfp argument is used to
-control how any memory allocations made are satisfied.
-
-The page must have first been read or allocated successfully and must not have
-been uncached before writing is performed.
-
-If the cookie indicates the inode is not cached then:
-
- (1) The function will return -ENOBUFS.
-
-Else if space can be allocated in the cache to hold this page:
-
- (1) PG_fscache_write will be set on the page.
-
- (2) The function will submit a request to write the data to cache's backing
-     device directly from the page specified.
-
- (3) The function will return 0.
-
- (4) When the write is complete PG_fscache_write is cleared on the page and
-     anyone waiting for that bit will be woken up.
-
-Else if there's no space available in the cache, -ENOBUFS will be returned.  It
-is also possible for the PG_fscache_write bit to be cleared when no write took
-place if unforeseen circumstances arose (such as a disk error).
-
-Writing takes place asynchronously.
-
-
-Multiple Page Read
-------------------
-
-A facility is provided to read several pages at once, as requested by the
-readpages() address space operation::
-
-	int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
-					struct address_space *mapping,
-					struct list_head *pages,
-					int *nr_pages,
-					fscache_rw_complete_t end_io_func,
-					void *context,
-					gfp_t gfp);
-
-This works in a similar way to fscache_read_or_alloc_page(), except:
-
- (1) Any page it can retrieve data for is removed from pages and nr_pages and
-     dispatched for reading to the disk.  Reads of adjacent pages on disk may
-     be merged for greater efficiency.
-
- (2) The mark_pages_cached() cookie operation will be called on several pages
-     at once if they're being read or allocated.
-
- (3) If there was an general error, then that error will be returned.
-
-     Else if some pages couldn't be allocated or read, then -ENOBUFS will be
-     returned.
-
-     Else if some pages couldn't be read but were allocated, then -ENODATA will
-     be returned.
-
-     Otherwise, if all pages had reads dispatched, then 0 will be returned, the
-     list will be empty and ``*nr_pages`` will be 0.
-
- (4) end_io_func will be called once for each page being read as the reads
-     complete.  It will be called in process context if error != 0, but it may
-     be called in interrupt context if there is no error.
-
-Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude
-some of the pages being read and some being allocated.  Those pages will have
-been marked appropriately and will need uncaching.
-
-
-Cancellation of Unread Pages
-----------------------------
-
-If one or more pages are passed to fscache_read_or_alloc_pages() but not then
-read from the cache and also not read from the underlying filesystem then
-those pages will need to have any marks and reservations removed.  This can be
-done by calling::
-
-	void fscache_readpages_cancel(struct fscache_cookie *cookie,
-				      struct list_head *pages);
-
-prior to returning to the caller.  The cookie argument should be as passed to
-fscache_read_or_alloc_pages().  Every page in the pages list will be examined
-and any that have PG_fscache set will be uncached.
-
-
-Page Uncaching
-==============
-
-To uncache a page, this function should be called::
-
-	void fscache_uncache_page(struct fscache_cookie *cookie,
-				  struct page *page);
-
-This function permits the cache to release any in-memory representation it
-might be holding for this netfs page.  This function must be called once for
-each page on which the read or write page functions above have been called to
-make sure the cache's in-memory tracking information gets torn down.
-
-Note that pages can't be explicitly deleted from the a data file.  The whole
-data file must be retired (see the relinquish cookie function below).
-
-Furthermore, note that this does not cancel the asynchronous read or write
-operation started by the read/alloc and write functions, so the page
-invalidation functions must use::
-
-	bool fscache_check_page_write(struct fscache_cookie *cookie,
-				      struct page *page);
-
-to see if a page is being written to the cache, and::
-
-	void fscache_wait_on_page_write(struct fscache_cookie *cookie,
-					struct page *page);
-
-to wait for it to finish if it is.
-
-
-When releasepage() is being implemented, a special FS-Cache function exists to
-manage the heuristics of coping with vmscan trying to eject pages, which may
-conflict with the cache trying to write pages to the cache (which may itself
-need to allocate memory)::
-
-	bool fscache_maybe_release_page(struct fscache_cookie *cookie,
-					struct page *page,
-					gfp_t gfp);
-
-This takes the netfs cookie, and the page and gfp arguments as supplied to
-releasepage().  It will return false if the page cannot be released yet for
-some reason and if it returns true, the page has been uncached and can now be
-released.
-
-To make a page available for release, this function may wait for an outstanding
-storage request to complete, or it may attempt to cancel the storage request -
-in which case the page will not be stored in the cache this time.
-
-
-Bulk Image Page Uncache
------------------------
-
-A convenience routine is provided to perform an uncache on all the pages
-attached to an inode.  This assumes that the pages on the inode correspond on a
-1:1 basis with the pages in the cache::
-
-	void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
-					     struct inode *inode);
-
-This takes the netfs cookie that the pages were cached with and the inode that
-the pages are attached to.  This function will wait for pages to finish being
-written to the cache and for the cache to finish with the page generally.  No
-error is returned.
-
-
-Index and Data File consistency
-===============================
-
-To find out whether auxiliary data for an object is up to data within the
-cache, the following function can be called::
-
-	int fscache_check_consistency(struct fscache_cookie *cookie,
-				      const void *aux_data);
-
-This will call back to the netfs to check whether the auxiliary data associated
-with a cookie is correct; if aux_data is non-NULL, it will update the auxiliary
-data buffer first.  It returns 0 if it is and -ESTALE if it isn't; it may also
-return -ENOMEM and -ERESTARTSYS.
-
-To request an update of the index data for an index or other object, the
-following function should be called::
+To request an update of the coherency data and file size on a cookie, the
+following should be called::
 
 	void fscache_update_cookie(struct fscache_cookie *cookie,
-				   const void *aux_data);
-
-This function will update the cookie's auxiliary data buffer from aux_data if
-that is non-NULL and then schedule this to be stored on disk.  The update
-method in the parent index definition will be called to transfer the data.
-
-Note that partial updates may happen automatically at other times, such as when
-data blocks are added to a data file object.
-
-
-Cookie Enablement
-=================
-
-Cookies exist in one of two states: enabled and disabled.  If a cookie is
-disabled, it ignores all attempts to acquire child cookies; check, update or
-invalidate its state; allocate, read or write backing pages - though it is
-still possible to uncache pages and relinquish the cookie.
-
-The initial enablement state is set by fscache_acquire_cookie(), but the cookie
-can be enabled or disabled later.  To disable a cookie, call::
-
-	void fscache_disable_cookie(struct fscache_cookie *cookie,
-				    const void *aux_data,
-    				    bool invalidate);
-
-If the cookie is not already disabled, this locks the cookie against other
-enable and disable ops, marks the cookie as being disabled, discards or
-invalidates any backing objects and waits for cessation of activity on any
-associated object before unlocking the cookie.
-
-All possible failures are handled internally.  The caller should consider
-calling fscache_uncache_all_inode_pages() afterwards to make sure all page
-markings are cleared up.
-
-Cookies can be enabled or reenabled with::
-
-    	void fscache_enable_cookie(struct fscache_cookie *cookie,
 				   const void *aux_data,
-				   loff_t object_size,
-    				   bool (*can_enable)(void *data),
-    				   void *data)
-
-If the cookie is not already enabled, this locks the cookie against other
-enable and disable ops, invokes can_enable() and, if the cookie is not an index
-cookie, will begin the procedure of acquiring backing objects.
-
-The optional can_enable() function is passed the data argument and returns a
-ruling as to whether or not enablement should actually be permitted to begin.
+				   const loff_t *object_size);
 
-All possible failures are handled internally.  The cookie will only be marked
-as enabled if provisional backing objects are allocated.
+This will update the cookie's coherency data and/or file size.
 
-The object's data size is updated from object_size and is passed to the
-->check_aux() function.
 
-In both cases, the cookie's auxiliary data buffer is updated from aux_data if
-that is non-NULL inside the enablement lock before proceeding.
-
-
-Miscellaneous Cookie operations
-===============================
+Data File Invalidation
+======================
 
-There are a number of operations that can be used to control cookies:
+Sometimes it will be necessary to invalidate an object that contains data.
+Typically this will be necessary when the server informs the network filesystem
+of a remote third-party change - at which point the filesystem has to throw
+away the state and cached data that it had for an file and reload from the
+server.
 
-     * Cookie pinning::
+To indicate that a cache object should be invalidated, the following should be
+called::
 
-	int fscache_pin_cookie(struct fscache_cookie *cookie);
-	void fscache_unpin_cookie(struct fscache_cookie *cookie);
+	void fscache_invalidate(struct fscache_cookie *cookie,
+				const void *aux_data,
+				loff_t size,
+				unsigned int flags);
 
-     These operations permit data cookies to be pinned into the cache and to
-     have the pinning removed.  They are not permitted on index cookies.
+This increases the invalidation counter in the cookie to cause outstanding
+reads to fail with -ESTALE, sets the coherency data and file size from the
+information supplied, blocks new I/O on the cookie and dispatches the cache to
+go and get rid of the old data.
 
-     The pinning function will return 0 if successful, -ENOBUFS in the cookie
-     isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning,
-     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
-     -EIO if there's any other problem.
+Invalidation runs asynchronously in a worker thread so that it doesn't block
+too much.
 
-   * Data space reservation::
 
-	int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
+Write-Back Resource Management
+==============================
 
-     This permits a netfs to request cache space be reserved to store up to the
-     given amount of a file.  It is permitted to ask for more than the current
-     size of the file to allow for future file expansion.
+To write data to the cache from network filesystem writeback, the cache
+resources required need to be pinned at the point the modification is made (for
+instance when the page is marked dirty) as it's not possible to open a file in
+a thread that's exiting.
 
-     If size is given as zero then the reservation will be cancelled.
+The following facilities are provided to manage this:
 
-     The function will return 0 if successful, -ENOBUFS in the cookie isn't
-     backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations,
-     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
-     -EIO if there's any other problem.
+ * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an
+   in-use is held on the cookie for this inode.  It can only be changed if the
+   the inode lock is held.
 
-     Note that this doesn't pin an object in a cache; it can still be culled to
-     make space if it's not in use.
+ * A flag, ``unpinned_fscache_wb`` is placed in the ``writeback_control``
+   struct that gets set if ``__writeback_single_inode()`` clears
+   ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared.
 
+To support this, the following functions are provided::
 
-Cookie Unregistration
-=====================
+	int fscache_set_page_dirty(struct page *page,
+				   struct fscache_cookie *cookie);
+	void fscache_unpin_writeback(struct writeback_control *wbc,
+				     struct fscache_cookie *cookie);
+	void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
+					   struct inode *inode,
+					   const void *aux);
 
-To get rid of a cookie, this function should be called::
+The *set* function is intended to be called from the filesystem's
+``set_page_dirty`` address space operation.  If ``I_PINNING_FSCACHE_WB`` is not
+set, it sets that flag and increments the use count on the cookie (the caller
+must already have called ``fscache_use_cookie()``).
 
-	void fscache_relinquish_cookie(struct fscache_cookie *cookie,
-				       const void *aux_data,
-				       bool retire);
+The *unpin* function is intended to be called from the filesystem's
+``write_inode`` superblock operation.  It cleans up after writing by unusing
+the cookie if unpinned_fscache_wb is set in the writeback_control struct.
 
-If retire is non-zero, then the object will be marked for recycling, and all
-copies of it will be removed from all active caches in which it is present.
-Not only that but all child objects will also be retired.
+The *clear* function is intended to be called from the netfs's ``evict_inode``
+superblock operation.  It must be called *after*
+``truncate_inode_pages_final()``, but *before* ``clear_inode()``.  This cleans
+up any hanging ``I_PINNING_FSCACHE_WB``.  It also allows the coherency data to
+be updated.
 
-If retire is zero, then the object may be available again when next the
-acquisition function is called.  Retirement here will overrule the pinning on a
-cookie.
 
-The cookie's auxiliary data will be updated from aux_data if that is non-NULL
-so that the cache can lazily update it on disk.
+Caching of Local Modifications
+==============================
 
-One very important note - relinquish must NOT be called for a cookie unless all
-the cookies for "child" indices, objects and pages have been relinquished
-first.
+If a network filesystem has locally modified data that it wants to write to the
+cache, it needs to mark the pages to indicate that a write is in progress, and
+if the mark is already present, it needs to wait for it to be removed first
+(presumably due to an already in-progress operation).  This prevents multiple
+competing DIO writes to the same storage in the cache.
 
+Firstly, the netfs should determine if caching is available by doing something
+like::
 
-Index Invalidation
-==================
+	bool caching = fscache_cookie_enabled(cookie);
 
-There is no direct way to invalidate an index subtree.  To do this, the caller
-should relinquish and retire the cookie they have, and then acquire a new one.
+If caching is to be attempted, pages should be waited for and then marked using
+the following functions provided by the netfs helper library::
 
+	void set_page_fscache(struct page *page);
+	void wait_on_page_fscache(struct page *page);
+	int wait_on_page_fscache_killable(struct page *page);
 
-Data File Invalidation
-======================
+Once all the pages in the span are marked, the netfs can ask fscache to
+schedule a write of that region::
 
-Sometimes it will be necessary to invalidate an object that contains data.
-Typically this will be necessary when the server tells the netfs of a foreign
-change - at which point the netfs has to throw away all the state it had for an
-inode and reload from the server.
+	void fscache_write_to_cache(struct fscache_cookie *cookie,
+				    struct address_space *mapping,
+				    loff_t start, size_t len, loff_t i_size,
+				    netfs_io_terminated_t term_func,
+				    void *term_func_priv,
+				    bool caching)
 
-To indicate that a cache object should be invalidated, the following function
-can be called::
+And if an error occurs before that point is reached, the marks can be removed
+by calling::
 
-	void fscache_invalidate(struct fscache_cookie *cookie);
+	void fscache_clear_page_bits(struct fscache_cookie *cookie,
+				     struct address_space *mapping,
+				     loff_t start, size_t len,
+				     bool caching)
 
-This can be called with spinlocks held as it defers the work to a thread pool.
-All extant storage, retrieval and attribute change ops at this point are
-cancelled and discarded.  Some future operations will be rejected until the
-cache has had a chance to insert a barrier in the operations queue.  After
-that, operations will be queued again behind the invalidation operation.
+In both of these functions, the cookie representing the cache object to be
+written to and a pointer to the mapping to which the source pages are attached
+are passed in; start and len indicate the size of the region that's going to be
+written (it doesn't have to align to page boundaries necessarily, but it does
+have to align to DIO boundaries on the backing filesystem).  The caching
+parameter indicates if caching should be skipped, and if false, the functions
+do nothing.
 
-The invalidation operation will perform an attribute change operation and an
-auxiliary data update operation as it is very likely these will have changed.
+The write function takes some additional parameters: i_size indicates the size
+of the netfs file and term_func indicates an optional completion function, to
+which term_func_priv will be passed, along with the error or amount written.
 
-Using the following function, the netfs can wait for the invalidation operation
-to have reached a point at which it can start submitting ordinary operations
-once again::
+Note that the write function will always run asynchronously and will unmark all
+the pages upon completion before calling term_func.
 
-	void fscache_wait_on_invalidate(struct fscache_cookie *cookie);
 
+Page Release and Invalidation
+=============================
 
-FS-cache Specific Page Flag
-===========================
+Fscache keeps track of whether we have any data in the cache yet for a cache
+object we've just created.  It knows it doesn't have to do any reading until it
+has done a write and then the page it wrote from has been released by the VM,
+after which it *has* to look in the cache.
 
-FS-Cache makes use of a page flag, PG_private_2, for its own purpose.  This is
-given the alternative name PG_fscache.
+To inform fscache that a page might now be in the cache, the following function
+should be called from the ``releasepage`` address space op::
 
-PG_fscache is used to indicate that the page is known by the cache, and that
-the cache must be informed if the page is going to go away.  It's an indication
-to the netfs that the cache has an interest in this page, where an interest may
-be a pointer to it, resources allocated or reserved for it, or I/O in progress
-upon it.
+	void fscache_note_page_release(struct fscache_cookie *cookie);
 
-The netfs can use this information in methods such as releasepage() to
-determine whether it needs to uncache a page or update it.
+if the page has been released (ie. releasepage returned true).
 
-Furthermore, if this bit is set, releasepage() and invalidatepage() operations
-will be called on a page to get rid of it, even if PG_private is not set.  This
-allows caching to attempted on a page before read_cache_pages() to be called
-after fscache_read_or_alloc_pages() as the former will try and release pages it
-was given under certain circumstances.
+Page release and page invalidation should also wait for any mark left on the
+page to say that a DIO write is underway from that page::
 
-This bit does not overlap with such as PG_private.  This means that FS-Cache
-can be used with a filesystem that uses the block buffering code.
+	void wait_on_page_fscache(struct page *page);
+	int wait_on_page_fscache_killable(struct page *page);
 
-There are a number of operations defined on this flag::
 
-	int PageFsCache(struct page *page);
-	void SetPageFsCache(struct page *page)
-	void ClearPageFsCache(struct page *page)
-	int TestSetPageFsCache(struct page *page)
-	int TestClearPageFsCache(struct page *page)
+API Function Reference
+======================
 
-These functions are bit test, bit set, bit clear, bit test and set and bit
-test and clear operations on PG_fscache.
+.. kernel-doc:: include/linux/fscache.h