summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_inode.h
diff options
context:
space:
mode:
authorDarrick J. Wong <darrick.wong@oracle.com>2019-04-15 23:13:20 +0300
committerDarrick J. Wong <darrick.wong@oracle.com>2019-04-16 20:01:57 +0300
commitcb357bf3d105f68ff5a5adcf89f1b285da675e2f (patch)
tree58bc5720f5fcfd8a684e707f7dc257b5321496fb /fs/xfs/xfs_inode.h
parent4fb7951fde64985bad80dcd2d721430ba584f125 (diff)
downloadlinux-cb357bf3d105f68ff5a5adcf89f1b285da675e2f.tar.xz
xfs: implement per-inode writeback completion queues
When scheduling writeback of dirty file data in the page cache, XFS uses IO completion workqueue items to ensure that filesystem metadata only updates after the write completes successfully. This is essential for converting unwritten extents to real extents at the right time and performing COW remappings. Unfortunately, XFS queues each IO completion work item to an unbounded workqueue, which means that the kernel can spawn dozens of threads to try to handle the items quickly. These threads need to take the ILOCK to update file metadata, which results in heavy ILOCK contention if a large number of the work items target a single file, which is inefficient. Worse yet, the writeback completion threads get stuck waiting for the ILOCK while holding transaction reservations, which can use up all available log reservation space. When that happens, metadata updates to other parts of the filesystem grind to a halt, even if the filesystem could otherwise have handled it. Even worse, if one of the things grinding to a halt happens to be a thread in the middle of a defer-ops finish holding the same ILOCK and trying to obtain more log reservation having exhausted the permanent reservation, we now have an ABBA deadlock - writeback completion has a transaction reserved and wants the ILOCK, and someone else has the ILOCK and wants a transaction reservation. Therefore, we create a per-inode writeback io completion queue + work item. When writeback finishes, it can add the ioend to the per-inode queue and let the single worker item process that queue. This dramatically cuts down on the number of kworkers and ILOCK contention in the system, and seems to have eliminated an occasional deadlock I was seeing while running generic/476. Testing with a program that simulates a heavy random-write workload to a single file demonstrates that the number of kworkers drops from approximately 120 threads per file to 1, without dramatically changing write bandwidth or pagecache access latency. Note that we leave the xfs-conv workqueue's max_active alone because we still want to be able to run ioend processing for as many inodes as the system can handle. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
Diffstat (limited to 'fs/xfs/xfs_inode.h')
-rw-r--r--fs/xfs/xfs_inode.h7
1 files changed, 7 insertions, 0 deletions
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 7bb1961918de..87e701b638ae 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -65,6 +65,11 @@ typedef struct xfs_inode {
/* VFS inode */
struct inode i_vnode; /* embedded VFS inode */
+
+ /* pending io completions */
+ spinlock_t i_ioend_lock;
+ struct work_struct i_ioend_work;
+ struct list_head i_ioend_list;
} xfs_inode_t;
/* Convert from vfs inode to xfs inode */
@@ -511,4 +516,6 @@ bool xfs_inode_verify_forks(struct xfs_inode *ip);
int xfs_iunlink_init(struct xfs_perag *pag);
void xfs_iunlink_destroy(struct xfs_perag *pag);
+void xfs_end_io(struct work_struct *work);
+
#endif /* __XFS_INODE_H__ */