<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/fs-writeback.c, branch v5.10.257</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.10.257'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-02-06T15:40:14+00:00</updated>
<entry>
<title>writeback: fix 100% CPU usage when dirtytime_expire_interval is 0</title>
<updated>2026-02-06T15:40:14+00:00</updated>
<author>
<name>Laveesh Bansal</name>
<email>laveeshb@laveeshbansal.com</email>
</author>
<published>2026-02-03T20:32:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=68140cf5076bcaee4c40033036b4b31f95dcb063'/>
<id>urn:sha1:68140cf5076bcaee4c40033036b4b31f95dcb063</id>
<content type='text'>
[ Upstream commit 543467d6fe97e27e22a26e367fda972dbefebbff ]

When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback()
schedules delayed work with a delay of 0, causing immediate execution.
The function then reschedules itself with 0 delay again, creating an
infinite busy loop that causes 100% kworker CPU usage.

Fix by:
- Only scheduling delayed work in wakeup_dirtytime_writeback() when
  dirtytime_expire_interval is non-zero
- Cancelling the delayed work in dirtytime_interval_handler() when
  the interval is set to 0
- Adding a guard in start_dirtytime_writeback() for defensive coding

Tested by booting kernel in QEMU with virtme-ng:
- Before fix: kworker CPU spikes to ~73%
- After fix: CPU remains at normal levels
- Setting interval back to non-zero correctly resumes writeback

Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227
Signed-off-by: Laveesh Bansal &lt;laveeshb@laveeshbansal.com&gt;
Link: https://patch.msgid.link/20260106145059.543282-2-laveeshb@laveeshbansal.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
[ adapted system_percpu_wq to system_wq for the workqueue used in dirtytime_interval_handler() ]
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>fs: writeback: fix use-after-free in __mark_inode_dirty()</title>
<updated>2025-12-06T21:08:24+00:00</updated>
<author>
<name>Jiufei Xue</name>
<email>jiufei.xue@samsung.com</email>
</author>
<published>2025-11-28T14:41:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e2a14bbae5d8bacaa301362744a110e2be40a3a3'/>
<id>urn:sha1:e2a14bbae5d8bacaa301362744a110e2be40a3a3</id>
<content type='text'>
[ Upstream commit d02d2c98d25793902f65803ab853b592c7a96b29 ]

An use-after-free issue occurred when __mark_inode_dirty() get the
bdi_writeback that was in the progress of switching.

CPU: 1 PID: 562 Comm: systemd-random- Not tainted 6.6.56-gb4403bd46a8e #1
......
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mark_inode_dirty+0x124/0x418
lr : __mark_inode_dirty+0x118/0x418
sp : ffffffc08c9dbbc0
........
Call trace:
 __mark_inode_dirty+0x124/0x418
 generic_update_time+0x4c/0x60
 file_modified+0xcc/0xd0
 ext4_buffered_write_iter+0x58/0x124
 ext4_file_write_iter+0x54/0x704
 vfs_write+0x1c0/0x308
 ksys_write+0x74/0x10c
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x114
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x40/0xe4
 el0t_64_sync_handler+0x120/0x12c
 el0t_64_sync+0x194/0x198

Root cause is:

systemd-random-seed                         kworker
----------------------------------------------------------------------
___mark_inode_dirty                     inode_switch_wbs_work_fn

  spin_lock(&amp;inode-&gt;i_lock);
  inode_attach_wb
  locked_inode_to_wb_and_lock_list
     get inode-&gt;i_wb
     spin_unlock(&amp;inode-&gt;i_lock);
     spin_lock(&amp;wb-&gt;list_lock)
  spin_lock(&amp;inode-&gt;i_lock)
  inode_io_list_move_locked
  spin_unlock(&amp;wb-&gt;list_lock)
  spin_unlock(&amp;inode-&gt;i_lock)
                                    spin_lock(&amp;old_wb-&gt;list_lock)
                                      inode_do_switch_wbs
                                        spin_lock(&amp;inode-&gt;i_lock)
                                        inode-&gt;i_wb = new_wb
                                        spin_unlock(&amp;inode-&gt;i_lock)
                                    spin_unlock(&amp;old_wb-&gt;list_lock)
                                    wb_put_many(old_wb, nr_switched)
                                      cgwb_release
                                      old wb released
  wb_wakeup_delayed() accesses wb,
  then trigger the use-after-free
  issue

Fix this race condition by holding inode spinlock until
wb_wakeup_delayed() finished.

Signed-off-by: Jiufei Xue &lt;jiufei.xue@samsung.com&gt;
Link: https://lore.kernel.org/20250728100715.3863241-1-jiufei.xue@samsung.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Nazar Kalashnikov &lt;sivartiwe@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: fix call of incorrect macro</title>
<updated>2023-05-17T09:48:10+00:00</updated>
<author>
<name>Maxim Korotkov</name>
<email>korotkov.maxim.s@gmail.com</email>
</author>
<published>2023-01-19T10:44:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3b7798b42e1da6bbea70e6b09c41829865014f25'/>
<id>urn:sha1:3b7798b42e1da6bbea70e6b09c41829865014f25</id>
<content type='text'>
[ Upstream commit 3e46c89c74f2c38e5337d2cf44b0b551adff1cb4 ]

 the variable 'history' is of type u16, it may be an error
 that the hweight32 macro was used for it
 I guess macro hweight16 should be used

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 2a81490811d0 ("writeback: implement foreign cgroup inode detection")
Signed-off-by: Maxim Korotkov &lt;korotkov.maxim.s@gmail.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20230119104443.3002-1-korotkov.maxim.s@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs</title>
<updated>2023-05-17T09:47:34+00:00</updated>
<author>
<name>Baokun Li</name>
<email>libaokun1@huawei.com</email>
</author>
<published>2023-04-10T13:08:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2b00b2a0e6425095602a0ba05e36a5b9afe86817'/>
<id>urn:sha1:2b00b2a0e6425095602a0ba05e36a5b9afe86817</id>
<content type='text'>
[ Upstream commit 1ba1199ec5747f475538c0d25a32804e5ba1dfde ]

KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
 &lt;TASK&gt;
 dump_stack_lvl+0x7f/0xc0
 print_report+0x2ba/0x340
 kasan_report+0xc4/0x120
 kasan_check_range+0x1b7/0x2e0
 __kasan_check_write+0x24/0x40
 bdi_split_work_to_wbs+0x5c5/0x7b0
 sync_inodes_sb+0x195/0x630
 sync_inodes_one_sb+0x3a/0x50
 iterate_supers+0x106/0x1b0
 ksys_sync+0x98/0x160
[...]
==================================================================

The race that causes the above issue is as follows:

           cpu1                     cpu2
-------------------------|-------------------------
inode_switch_wbs
 INIT_WORK(&amp;isw-&gt;work, inode_switch_wbs_work_fn)
 queue_rcu_work(isw_wq, &amp;isw-&gt;work)
 // queue_work async
  inode_switch_wbs_work_fn
   wb_put_many(old_wb, nr_switched)
    percpu_ref_put_many
     ref-&gt;data-&gt;release(ref)
     cgwb_release
      queue_work(cgwb_release_wq, &amp;wb-&gt;release_work)
      // queue_work async
       &amp;wb-&gt;release_work
       cgwb_release_workfn
                            ksys_sync
                             iterate_supers
                              sync_inodes_one_sb
                               sync_inodes_sb
                                bdi_split_work_to_wbs
                                 kmalloc(sizeof(*work), GFP_ATOMIC)
                                 // alloc memory failed
        percpu_ref_exit
         ref-&gt;data = NULL
         kfree(data)
                                 wb_get(wb)
                                  percpu_ref_get(&amp;wb-&gt;refcnt)
                                   percpu_ref_get_many(ref, 1)
                                    atomic_long_add(nr, &amp;ref-&gt;data-&gt;count)
                                     atomic64_add(i, v)
                                     // trigger null-ptr-deref

bdi_split_work_to_wbs() traverses &amp;bdi-&gt;wb_list to split work into all
wbs.  If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.

This issue was introduced in v4.3-rc7 (see fix tag1).  Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue.  For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb-&gt;state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.

To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.

Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li &lt;libaokun1@huawei.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andreas Dilger &lt;adilger.kernel@dilger.ca&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Hou Tao &lt;houtao1@huawei.com&gt;
Cc: yangerkun &lt;yangerkun@huawei.com&gt;
Cc: Zhang Yi &lt;yi.zhang@huawei.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs-writeback: writeback_sb_inodes：Recalculate 'wrote' according skipped pages</title>
<updated>2022-06-09T08:21:22+00:00</updated>
<author>
<name>Zhihao Cheng</name>
<email>chengzhihao1@huawei.com</email>
</author>
<published>2022-05-10T13:38:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9a9dc60da79a280269d3e8794725f049c367d686'/>
<id>urn:sha1:9a9dc60da79a280269d3e8794725f049c367d686</id>
<content type='text'>
commit 68f4c6eba70df70a720188bce95c85570ddfcc87 upstream.

Commit 505a666ee3fc ("writeback: plug writeback in wb_writeback() and
writeback_inodes_wb()") has us holding a plug during wb_writeback, which
may cause a potential ABBA dead lock:

    wb_writeback		fat_file_fsync
blk_start_plug(&amp;plug)
for (;;) {
  iter i-1: some reqs have been added into plug-&gt;mq_list  // LOCK A
  iter i:
    progress = __writeback_inodes_wb(wb, work)
    . writeback_sb_inodes // fat's bdev
    .   __writeback_single_inode
    .   . generic_writepages
    .   .   __block_write_full_page
    .   .   . . 	    __generic_file_fsync
    .   .   . . 	      sync_inode_metadata
    .   .   . . 	        writeback_single_inode
    .   .   . . 		  __writeback_single_inode
    .   .   . . 		    fat_write_inode
    .   .   . . 		      __fat_write_inode
    .   .   . . 		        sync_dirty_buffer	// fat's bdev
    .   .   . . 			  lock_buffer(bh)	// LOCK B
    .   .   . . 			    submit_bh
    .   .   . . 			      blk_mq_get_tag	// LOCK A
    .   .   . trylock_buffer(bh)  // LOCK B
    .   .   .   redirty_page_for_writepage
    .   .   .     wbc-&gt;pages_skipped++
    .   .   --wbc-&gt;nr_to_write
    .   wrote += write_chunk - wbc.nr_to_write  // wrote &gt; 0
    .   requeue_inode
    .     redirty_tail_locked
    if (progress)    // progress &gt; 0
      continue;
  iter i+1:
      queue_io
      // similar process with iter i, infinite for-loop !
}
blk_finish_plug(&amp;plug)   // flush plug won't be called

Above process triggers a hungtask like:
[  399.044861] INFO: task bb:2607 blocked for more than 30 seconds.
[  399.046824]       Not tainted 5.18.0-rc1-00005-gefae4d9eb6a2-dirty
[  399.051539] task:bb              state:D stack:    0 pid: 2607 ppid:
2426 flags:0x00004000
[  399.051556] Call Trace:
[  399.051570]  __schedule+0x480/0x1050
[  399.051592]  schedule+0x92/0x1a0
[  399.051602]  io_schedule+0x22/0x50
[  399.051613]  blk_mq_get_tag+0x1d3/0x3c0
[  399.051640]  __blk_mq_alloc_requests+0x21d/0x3f0
[  399.051657]  blk_mq_submit_bio+0x68d/0xca0
[  399.051674]  __submit_bio+0x1b5/0x2d0
[  399.051708]  submit_bio_noacct+0x34e/0x720
[  399.051718]  submit_bio+0x3b/0x150
[  399.051725]  submit_bh_wbc+0x161/0x230
[  399.051734]  __sync_dirty_buffer+0xd1/0x420
[  399.051744]  sync_dirty_buffer+0x17/0x20
[  399.051750]  __fat_write_inode+0x289/0x310
[  399.051766]  fat_write_inode+0x2a/0xa0
[  399.051783]  __writeback_single_inode+0x53c/0x6f0
[  399.051795]  writeback_single_inode+0x145/0x200
[  399.051803]  sync_inode_metadata+0x45/0x70
[  399.051856]  __generic_file_fsync+0xa3/0x150
[  399.051880]  fat_file_fsync+0x1d/0x80
[  399.051895]  vfs_fsync_range+0x40/0xb0
[  399.051929]  __x64_sys_fsync+0x18/0x30

In my test, 'need_resched()' (which is imported by 590dca3a71 "fs-writeback:
unplug before cond_resched in writeback_sb_inodes") in function
'writeback_sb_inodes()' seldom comes true, unless cond_resched() is deleted
from write_cache_pages().

Fix it by correcting wrote number according number of skipped pages
in writeback_sb_inodes().

Goto Link to find a reproducer.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215837
Cc: stable@vger.kernel.org # v4.3
Signed-off-by: Zhihao Cheng &lt;chengzhihao1@huawei.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220510133805.1988292-1-chengzhihao1@huawei.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: fix obtain a reference to a freeing memcg css</title>
<updated>2021-07-14T14:56:31+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2021-04-02T09:11:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6939c39a4106199877f93c96d931cff45c0d1dad'/>
<id>urn:sha1:6939c39a4106199877f93c96d931cff45c0d1dad</id>
<content type='text'>
[ Upstream commit 8b0ed8443ae6458786580d36b7d5f8125535c5d4 ]

The caller of wb_get_create() should pin the memcg, because
wb_get_create() relies on this guarantee. The rcu read lock
only can guarantee that the memcg css returned by css_from_id()
cannot be released, but the reference of the memcg can be zero.

  rcu_read_lock()
  memcg_css = css_from_id()
  wb_get_create(memcg_css)
      cgwb_create(memcg_css)
          // css_get can change the ref counter from 0 back to 1
          css_get(memcg_css)
  rcu_read_unlock()

Fix it by holding a reference to the css before calling
wb_get_create(). This is not a problem I encountered in the
real world. Just the result of a code review.

Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Link: https://lore.kernel.org/r/20210402091145.80635-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>writeback, cgroup: increment isw_nr_in_flight before grabbing an inode</title>
<updated>2021-07-14T14:56:02+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2021-06-29T02:35:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0c1d1517d61610d343f3bee0602c21a858f67114'/>
<id>urn:sha1:0c1d1517d61610d343f3bee0602c21a858f67114</id>
<content type='text'>
[ Upstream commit 8826ee4fe75051f8cbfa5d4a9aa70565938e724c ]

isw_nr_in_flight is used to determine whether the inode switch queue
should be flushed from the umount path.  Currently it's increased after
grabbing an inode and even scheduling the switch work.  It means the
umount path can walk past cleanup_offline_cgwb() with active inode
references, which can result in a "Busy inodes after unmount." message and
use-after-free issues (with inode-&gt;i_sb which gets freed).

Fix it by incrementing isw_nr_in_flight before doing anything with the
inode and decrementing in the case when switching wasn't scheduled.

The problem hasn't yet been seen in the real life and was discovered by
Jan Kara by looking into the code.

Link: https://lkml.kernel.org/r/20210608230225.2078447-4-guro@fb.com
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Suggested-by: Jan Kara &lt;jack@suse.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block_dump: remove block_dump feature in mark_inode_dirty()</title>
<updated>2021-07-14T14:55:58+00:00</updated>
<author>
<name>zhangyi (F)</name>
<email>yi.zhang@huawei.com</email>
</author>
<published>2021-03-13T03:01:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f58625bf2ca9d8ecdd02c8021d754a7d437c207f'/>
<id>urn:sha1:f58625bf2ca9d8ecdd02c8021d754a7d437c207f</id>
<content type='text'>
[ Upstream commit 12e0613715e1cf305fffafaf0e89d810d9a85cc0 ]

block_dump is an old debugging interface, one of it's functions is used
to print the information about who write which file on disk. If we
enable block_dump through /proc/sys/vm/block_dump and turn on debug log
level, we can gather information about write process name, target file
name and disk from kernel message. This feature is realized in
block_dump___mark_inode_dirty(), it print above information into kernel
message directly when marking inode dirty, so it is noisy and can easily
trigger log storm. At the same time, get the dentry refcount is also not
safe, we found it will lead to deadlock on ext4 file system with
data=journal mode.

After tracepoints has been introduced into the kernel, we got a
tracepoint in __mark_inode_dirty(), which is a better replacement of
block_dump___mark_inode_dirty(). The only downside is that it only trace
the inode number and not a file name, but it probably doesn't matter
because the original printed file name in block_dump is not accurate in
some cases, and we can still find it through the inode number and device
id. So this patch delete the dirting inode part of block_dump feature.

Signed-off-by: zhangyi (F) &lt;yi.zhang@huawei.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20210313030146.2882027-2-yi.zhang@huawei.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: fix lazytime expiration handling in __writeback_single_inode()</title>
<updated>2021-01-27T10:54:53+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2021-01-12T19:02:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=13ef6bccab397c02d5a48d236316fd5f626f8b01'/>
<id>urn:sha1:13ef6bccab397c02d5a48d236316fd5f626f8b01</id>
<content type='text'>
commit 1e249cb5b7fc09ff216aa5a12f6c302e434e88f9 upstream.

When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.

This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls -&gt;dirty_inode(I_DIRTY_SYNC).

However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from -&gt;i_state.  This causes two bugs:

- mark_inode_dirty_sync() redirties the inode, causing it to remain
  dirty.  This wastefully causes the inode to be written twice.  But
  more importantly, it breaks cases where sync_filesystem() is expected
  to clean dirty inodes.  This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
  ioctl (as reported at
  https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
  as possibly filesystem freezing (freeze_super()).

- Since -&gt;i_state doesn't contain I_DIRTY_TIME when -&gt;dirty_inode() is
  called from __writeback_single_inode() for lazytime expiration,
  xfs_fs_dirty_inode() ignores the notification.  (XFS only cares about
  lazytime expirations, and it assumes that i_state will contain
  I_DIRTY_TIME during those.)  Therefore, lazy timestamps aren't
  persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.

Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state.  This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.

This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled.  It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).

Alternatively, we could call -&gt;dirty_inode(I_DIRTY_SYNC) directly.  But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.

Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable@vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block</title>
<updated>2020-10-13T19:12:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-10-13T19:12:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3ad11d7ac8872b1c8da54494721fad8907ee41f7'/>
<id>urn:sha1:3ad11d7ac8872b1c8da54494721fad8907ee41f7</id>
<content type='text'>
Pull block updates from Jens Axboe:

 - Series of merge handling cleanups (Baolin, Christoph)

 - Series of blk-throttle fixes and cleanups (Baolin)

 - Series cleaning up BDI, seperating the block device from the
   backing_dev_info (Christoph)

 - Removal of bdget() as a generic API (Christoph)

 - Removal of blkdev_get() as a generic API (Christoph)

 - Cleanup of is-partition checks (Christoph)

 - Series reworking disk revalidation (Christoph)

 - Series cleaning up bio flags (Christoph)

 - bio crypt fixes (Eric)

 - IO stats inflight tweak (Gabriel)

 - blk-mq tags fixes (Hannes)

 - Buffer invalidation fixes (Jan)

 - Allow soft limits for zone append (Johannes)

 - Shared tag set improvements (John, Kashyap)

 - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

 - DM no-wait support (Mike, Konstantin)

 - Request allocation improvements (Ming)

 - Allow md/dm/bcache to use IO stat helpers (Song)

 - Series improving blk-iocost (Tejun)

 - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
   Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx-&gt;run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref-&gt;data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg-&gt;last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
</content>
</entry>
</feed>
