<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/fs-writeback.c, branch v6.1.168</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-02-06T15:44:25+00:00</updated>
<entry>
<title>writeback: fix 100% CPU usage when dirtytime_expire_interval is 0</title>
<updated>2026-02-06T15:44:25+00:00</updated>
<author>
<name>Laveesh Bansal</name>
<email>laveeshb@laveeshbansal.com</email>
</author>
<published>2026-02-03T20:19:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a11e76dff023a08086d606fe1d8def6b5564b308'/>
<id>urn:sha1:a11e76dff023a08086d606fe1d8def6b5564b308</id>
<content type='text'>
[ Upstream commit 543467d6fe97e27e22a26e367fda972dbefebbff ]

When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback()
schedules delayed work with a delay of 0, causing immediate execution.
The function then reschedules itself with 0 delay again, creating an
infinite busy loop that causes 100% kworker CPU usage.

Fix by:
- Only scheduling delayed work in wakeup_dirtytime_writeback() when
  dirtytime_expire_interval is non-zero
- Cancelling the delayed work in dirtytime_interval_handler() when
  the interval is set to 0
- Adding a guard in start_dirtytime_writeback() for defensive coding

Tested by booting kernel in QEMU with virtme-ng:
- Before fix: kworker CPU spikes to ~73%
- After fix: CPU remains at normal levels
- Setting interval back to non-zero correctly resumes writeback

Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227
Signed-off-by: Laveesh Bansal &lt;laveeshb@laveeshbansal.com&gt;
Link: https://patch.msgid.link/20260106145059.543282-2-laveeshb@laveeshbansal.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
[ adapted system_percpu_wq to system_wq for the workqueue used in dirtytime_interval_handler() ]
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: Avoid excessively long inode switching times</title>
<updated>2025-10-19T14:23:22+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2025-09-12T10:38:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db62b1e7b1d3f8e97b59f9f8e07829356d0ff3e4'/>
<id>urn:sha1:db62b1e7b1d3f8e97b59f9f8e07829356d0ff3e4</id>
<content type='text'>
[ Upstream commit 9a6ebbdbd41235ea3bc0c4f39e2076599b8113cc ]

With lazytime mount option enabled we can be switching many dirty inodes
on cgroup exit to the parent cgroup. The numbers observed in practice
when systemd slice of a large cron job exits can easily reach hundreds
of thousands or millions. The logic in inode_do_switch_wbs() which sorts
the inode into appropriate place in b_dirty list of the target wb
however has linear complexity in the number of dirty inodes thus overall
time complexity of switching all the inodes is quadratic leading to
workers being pegged for hours consuming 100% of the CPU and switching
inodes to the parent wb.

Simple reproducer of the issue:
  FILES=10000
  # Filesystem mounted with lazytime mount option
  MNT=/mnt/
  echo "Creating files and switching timestamps"
  for (( j = 0; j &lt; 50; j ++ )); do
      mkdir $MNT/dir$j
      for (( i = 0; i &lt; $FILES; i++ )); do
          echo "foo" &gt;$MNT/dir$j/file$i
      done
      touch -a -t 202501010000 $MNT/dir$j/file*
  done
  wait
  echo "Syncing and flushing"
  sync
  echo 3 &gt;/proc/sys/vm/drop_caches

  echo "Reading all files from a cgroup"
  mkdir /sys/fs/cgroup/unified/mycg1 || exit
  echo $$ &gt;/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
  for (( j = 0; j &lt; 50; j ++ )); do
      cat /mnt/dir$j/file* &gt;/dev/null &amp;
  done
  wait
  echo "Switching wbs"
  # Now rmdir the cgroup after the script exits

We need to maintain b_dirty list ordering to keep writeback happy so
instead of sorting inode into appropriate place just append it at the
end of the list and clobber dirtied_time_when. This may result in inode
writeback starting later after cgroup switch however cgroup switches are
rare so it shouldn't matter much. Since the cgroup had write access to
the inode, there are no practical concerns of the possible DoS issues.

Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>writeback: Avoid softlockup when switching many inodes</title>
<updated>2025-10-19T14:23:22+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2025-09-12T10:38:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=68be7f61083768e88521492093139277c98ba87f'/>
<id>urn:sha1:68be7f61083768e88521492093139277c98ba87f</id>
<content type='text'>
[ Upstream commit 66c14dccd810d42ec5c73bb8a9177489dfd62278 ]

process_inode_switch_wbs_work() can be switching over 100 inodes to a
different cgroup. Since switching an inode requires counting all dirty &amp;
under-writeback pages in the address space of each inode, this can take
a significant amount of time. Add a possibility to reschedule after
processing each inode to avoid softlockups.

Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: writeback: fix use-after-free in __mark_inode_dirty()</title>
<updated>2025-09-09T16:54:12+00:00</updated>
<author>
<name>Jiufei Xue</name>
<email>jiufei.xue@samsung.com</email>
</author>
<published>2025-07-28T10:07:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1edc2feb9c759a9883dfe81cb5ed231412d8b2e4'/>
<id>urn:sha1:1edc2feb9c759a9883dfe81cb5ed231412d8b2e4</id>
<content type='text'>
[ Upstream commit d02d2c98d25793902f65803ab853b592c7a96b29 ]

An use-after-free issue occurred when __mark_inode_dirty() get the
bdi_writeback that was in the progress of switching.

CPU: 1 PID: 562 Comm: systemd-random- Not tainted 6.6.56-gb4403bd46a8e #1
......
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mark_inode_dirty+0x124/0x418
lr : __mark_inode_dirty+0x118/0x418
sp : ffffffc08c9dbbc0
........
Call trace:
 __mark_inode_dirty+0x124/0x418
 generic_update_time+0x4c/0x60
 file_modified+0xcc/0xd0
 ext4_buffered_write_iter+0x58/0x124
 ext4_file_write_iter+0x54/0x704
 vfs_write+0x1c0/0x308
 ksys_write+0x74/0x10c
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x114
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x40/0xe4
 el0t_64_sync_handler+0x120/0x12c
 el0t_64_sync+0x194/0x198

Root cause is:

systemd-random-seed                         kworker
----------------------------------------------------------------------
___mark_inode_dirty                     inode_switch_wbs_work_fn

  spin_lock(&amp;inode-&gt;i_lock);
  inode_attach_wb
  locked_inode_to_wb_and_lock_list
     get inode-&gt;i_wb
     spin_unlock(&amp;inode-&gt;i_lock);
     spin_lock(&amp;wb-&gt;list_lock)
  spin_lock(&amp;inode-&gt;i_lock)
  inode_io_list_move_locked
  spin_unlock(&amp;wb-&gt;list_lock)
  spin_unlock(&amp;inode-&gt;i_lock)
                                    spin_lock(&amp;old_wb-&gt;list_lock)
                                      inode_do_switch_wbs
                                        spin_lock(&amp;inode-&gt;i_lock)
                                        inode-&gt;i_wb = new_wb
                                        spin_unlock(&amp;inode-&gt;i_lock)
                                    spin_unlock(&amp;old_wb-&gt;list_lock)
                                    wb_put_many(old_wb, nr_switched)
                                      cgwb_release
                                      old wb released
  wb_wakeup_delayed() accesses wb,
  then trigger the use-after-free
  issue

Fix this race condition by holding inode spinlock until
wb_wakeup_delayed() finished.

Signed-off-by: Jiufei Xue &lt;jiufei.xue@samsung.com&gt;
Link: https://lore.kernel.org/20250728100715.3863241-1-jiufei.xue@samsung.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs</title>
<updated>2023-11-20T10:51:50+00:00</updated>
<author>
<name>Jingbo Xu</name>
<email>jefflexu@linux.alibaba.com</email>
</author>
<published>2023-10-14T12:55:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2351c03529b264e72e0d9b0af1a8d2bd1ccaea50'/>
<id>urn:sha1:2351c03529b264e72e0d9b0af1a8d2bd1ccaea50</id>
<content type='text'>
[ Upstream commit 6654408a33e6297d8e1d2773409431d487399b95 ]

The cgwb cleanup routine will try to release the dying cgwb by switching
the attached inodes.  It fetches the attached inodes from wb-&gt;b_attached
list, omitting the fact that inodes only with dirty timestamps reside in
wb-&gt;b_dirty_time list, which is the case when lazytime is enabled.  This
causes enormous zombie memory cgroup when lazytime is enabled, as inodes
with dirty timestamps can not be switched to a live cgwb for a long time.

It is reasonable not to switch cgwb for inodes with dirty data, as
otherwise it may break the bandwidth restrictions.  However since the
writeback of inode metadata is not accounted for, let's also switch
inodes with dirty timestamps to avoid zombie memory and block cgroups
when laztytime is enabled.

Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jingbo Xu &lt;jefflexu@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20231014125511.102978-1-jefflexu@linux.alibaba.com
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs-writeback: do not requeue a clean inode having skipped pages</title>
<updated>2023-10-25T10:03:09+00:00</updated>
<author>
<name>Chunhai Guo</name>
<email>guochunhai@vivo.com</email>
</author>
<published>2023-09-16T04:51:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c194e184a8996026b50298c56ab056152a9bf463'/>
<id>urn:sha1:c194e184a8996026b50298c56ab056152a9bf463</id>
<content type='text'>
[ Upstream commit be049c3a088d512187407b7fd036cecfab46d565 ]

When writing back an inode and performing an fsync on it concurrently, a
deadlock issue may arise as shown below. In each writeback iteration, a
clean inode is requeued to the wb-&gt;b_dirty queue due to non-zero
pages_skipped, without anything actually being written. This causes an
infinite loop and prevents the plug from being flushed, resulting in a
deadlock. We now avoid requeuing the clean inode to prevent this issue.

    wb_writeback        fsync (inode-Y)
blk_start_plug(&amp;plug)
for (;;) {
  iter i-1: some reqs with page-X added into plug-&gt;mq_list // f2fs node page-X with PG_writeback
                        filemap_fdatawrite
                          __filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
                           do_writepages
                            f2fs_write_data_pages
                             __f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
                              f2fs_write_cache_pages
                               f2fs_write_single_data_page
                                f2fs_do_write_data_page
                                 f2fs_outplace_write_data
                                  f2fs_update_data_blkaddr
                                   f2fs_wait_on_page_writeback
                                     wait_on_page_writeback // wait for f2fs node page-X
  iter i:
    progress = __writeback_inodes_wb(wb, work)
    . writeback_sb_inodes
    .   __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
    .   . do_writepages
    .   .   f2fs_write_data_pages
    .   .   .  __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]&gt;0)
    .   .   .   wbc-&gt;pages_skipped += get_dirty_pages(inode) // wbc-&gt;pages_skipped = 1
    .   if (!(inode-&gt;i_state &amp; I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
    .    total_wrote++;  // total_wrote = 1
    .   requeue_inode // requeue inode-Y to wb-&gt;b_dirty queue due to non-zero pages_skipped
    if (progress) // progress = 1
      continue;
  iter i+1:
      queue_io
      // similar process with iter i, infinite for-loop !
}
blk_finish_plug(&amp;plug)   // flush plug won't be called

Signed-off-by: Chunhai Guo &lt;guochunhai@vivo.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Message-Id: &lt;20230916045131.957929-1-guochunhai@vivo.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>writeback: fix call of incorrect macro</title>
<updated>2023-05-17T09:53:33+00:00</updated>
<author>
<name>Maxim Korotkov</name>
<email>korotkov.maxim.s@gmail.com</email>
</author>
<published>2023-01-19T10:44:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7c4c6e2a40757d7697541ebbb1a8f77919abb879'/>
<id>urn:sha1:7c4c6e2a40757d7697541ebbb1a8f77919abb879</id>
<content type='text'>
[ Upstream commit 3e46c89c74f2c38e5337d2cf44b0b551adff1cb4 ]

 the variable 'history' is of type u16, it may be an error
 that the hweight32 macro was used for it
 I guess macro hweight16 should be used

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 2a81490811d0 ("writeback: implement foreign cgroup inode detection")
Signed-off-by: Maxim Korotkov &lt;korotkov.maxim.s@gmail.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20230119104443.3002-1-korotkov.maxim.s@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs</title>
<updated>2023-04-26T12:28:39+00:00</updated>
<author>
<name>Baokun Li</name>
<email>libaokun1@huawei.com</email>
</author>
<published>2023-04-10T13:08:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e6bd2653ff86ee31fdbb821abe05af4d309aedf'/>
<id>urn:sha1:3e6bd2653ff86ee31fdbb821abe05af4d309aedf</id>
<content type='text'>
commit 1ba1199ec5747f475538c0d25a32804e5ba1dfde upstream.

KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
 &lt;TASK&gt;
 dump_stack_lvl+0x7f/0xc0
 print_report+0x2ba/0x340
 kasan_report+0xc4/0x120
 kasan_check_range+0x1b7/0x2e0
 __kasan_check_write+0x24/0x40
 bdi_split_work_to_wbs+0x5c5/0x7b0
 sync_inodes_sb+0x195/0x630
 sync_inodes_one_sb+0x3a/0x50
 iterate_supers+0x106/0x1b0
 ksys_sync+0x98/0x160
[...]
==================================================================

The race that causes the above issue is as follows:

           cpu1                     cpu2
-------------------------|-------------------------
inode_switch_wbs
 INIT_WORK(&amp;isw-&gt;work, inode_switch_wbs_work_fn)
 queue_rcu_work(isw_wq, &amp;isw-&gt;work)
 // queue_work async
  inode_switch_wbs_work_fn
   wb_put_many(old_wb, nr_switched)
    percpu_ref_put_many
     ref-&gt;data-&gt;release(ref)
     cgwb_release
      queue_work(cgwb_release_wq, &amp;wb-&gt;release_work)
      // queue_work async
       &amp;wb-&gt;release_work
       cgwb_release_workfn
                            ksys_sync
                             iterate_supers
                              sync_inodes_one_sb
                               sync_inodes_sb
                                bdi_split_work_to_wbs
                                 kmalloc(sizeof(*work), GFP_ATOMIC)
                                 // alloc memory failed
        percpu_ref_exit
         ref-&gt;data = NULL
         kfree(data)
                                 wb_get(wb)
                                  percpu_ref_get(&amp;wb-&gt;refcnt)
                                   percpu_ref_get_many(ref, 1)
                                    atomic_long_add(nr, &amp;ref-&gt;data-&gt;count)
                                     atomic64_add(i, v)
                                     // trigger null-ptr-deref

bdi_split_work_to_wbs() traverses &amp;bdi-&gt;wb_list to split work into all
wbs.  If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.

This issue was introduced in v4.3-rc7 (see fix tag1).  Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue.  For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb-&gt;state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.

To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.

Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li &lt;libaokun1@huawei.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andreas Dilger &lt;adilger.kernel@dilger.ca&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Hou Tao &lt;houtao1@huawei.com&gt;
Cc: yangerkun &lt;yangerkun@huawei.com&gt;
Cc: Zhang Yi &lt;yi.zhang@huawei.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>fs: do not update freeing inode i_io_list</title>
<updated>2022-11-22T22:00:00+00:00</updated>
<author>
<name>Svyatoslav Feldsherov</name>
<email>feldsherov@google.com</email>
</author>
<published>2022-11-15T20:20:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4e3c51f4e805291b057d12f5dda5aeb50a538dc4'/>
<id>urn:sha1:4e3c51f4e805291b057d12f5dda5aeb50a538dc4</id>
<content type='text'>
After commit cbfecb927f42 ("fs: record I_DIRTY_TIME even if inode
already has I_DIRTY_INODE") writeback_single_inode can push inode with
I_DIRTY_TIME set to b_dirty_time list. In case of freeing inode with
I_DIRTY_TIME set this can happen after deletion of inode from i_io_list
at evict. Stack trace is following.

evict
fat_evict_inode
fat_truncate_blocks
fat_flush_inodes
writeback_inode
sync_inode_metadata(inode, sync=0)
writeback_single_inode(inode, wbc) &lt;- wbc-&gt;sync_mode == WB_SYNC_NONE

This will lead to use after free in flusher thread.

Similar issue can be triggered if writeback_single_inode in the
stack trace update inode-&gt;i_io_list. Add explicit check to avoid it.

Fixes: cbfecb927f42 ("fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE")
Reported-by: syzbot+6ba92bd00d5093f7e371@syzkaller.appspotmail.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Svyatoslav Feldsherov &lt;feldsherov@google.com&gt;
Link: https://lore.kernel.org/r/20221115202001.324188-1-feldsherov@google.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE</title>
<updated>2022-09-30T03:02:00+00:00</updated>
<author>
<name>Lukas Czerner</name>
<email>lczerner@redhat.com</email>
</author>
<published>2022-08-25T10:06:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cbfecb927f429a6fa613d74b998496bd71e4438a'/>
<id>urn:sha1:cbfecb927f429a6fa613d74b998496bd71e4438a</id>
<content type='text'>
Currently the I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME.  That's
true, however ext4 will only update the on-disk inode in
-&gt;dirty_inode(), not on actual writeback. As a result if the inode
already has I_DIRTY_INODE state by the time we get to
__mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
into on-disk inode and will not get updated until the next I_DIRTY_INODE
update, which might never come if we crash or get a power failure.

The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that the case is properly handled in
writeback_single_inode() as well. Additionally changes in
xfs_fs_dirty_inode() was made to accommodate for I_DIRTY_TIME in flag.

Thanks Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: stable@kernel.org
Signed-off-by: Lukas Czerner &lt;lczerner@redhat.com&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20220825100657.44217-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
</entry>
</feed>
