<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/page-writeback.c, branch v5.15.209</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.209</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.15.209'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-06-27T10:05:26+00:00</updated>
<entry>
<title>mm: fix ratelimit_pages update error in dirty_ratio_handler()</title>
<updated>2025-06-27T10:05:26+00:00</updated>
<author>
<name>Jinliang Zheng</name>
<email>alexjlzheng@tencent.com</email>
</author>
<published>2025-04-15T09:02:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6fe586c14d8f44f15f20c0963976c7b26876f6a4'/>
<id>urn:sha1:6fe586c14d8f44f15f20c0963976c7b26876f6a4</id>
<content type='text'>
commit f83f362d40ccceb647f7d80eb92206733d76a36b upstream.

In dirty_ratio_handler(), vm_dirty_bytes must be set to zero before
calling writeback_set_ratelimit(), as global_dirty_limits() always
prioritizes the value of vm_dirty_bytes.

It's domain_dirty_limits() that's relevant here, not node_dirty_ok:

  dirty_ratio_handler
    writeback_set_ratelimit
      global_dirty_limits(&amp;dirty_thresh)           &lt;- ratelimit_pages based on dirty_thresh
        domain_dirty_limits
          if (bytes)                               &lt;- bytes = vm_dirty_bytes &lt;--------+
            thresh = f1(bytes)                     &lt;- prioritizes vm_dirty_bytes      |
          else                                                                        |
            thresh = f2(ratio)                                                        |
      ratelimit_pages = f3(dirty_thresh)                                              |
    vm_dirty_bytes = 0                             &lt;- too late! ----------------------+

This causes ratelimit_pages to still use the value calculated based on
vm_dirty_bytes, which is wrong now.


The impact visible to userspace is difficult to capture directly because
there is no procfs/sysfs interface exported to user space.  However, it
will have a real impact on the balance of dirty pages.

For example:

1. By default, we have vm_dirty_ratio=40, vm_dirty_bytes=0

2. echo 8192 &gt; dirty_bytes, then vm_dirty_bytes=8192,
   vm_dirty_ratio=0, and ratelimit_pages is calculated based on
   vm_dirty_bytes now.

3. echo 20 &gt; dirty_ratio, then since vm_dirty_bytes is not reset to
   zero when writeback_set_ratelimit() -&gt; global_dirty_limits() -&gt;
   domain_dirty_limits() is called, ratelimit_pages is still calculated
   based on vm_dirty_bytes instead of vm_dirty_ratio.  This does not
   conform to the actual intent of the user.
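
The three steps above can be replayed in a small user-space model.  This
is only a sketch: the function and variable names follow this message,
but TOTAL_PAGES and the arithmetic standing in for f1, f2 and f3 are
made-up placeholders, not the kernel's formulas.

```python
# Toy model of the ordering bug.  Names mirror the commit message; the
# formulas standing in for f1/f2/f3 are hypothetical placeholders.
PAGE_SIZE = 4096
TOTAL_PAGES = 1_000_000          # hypothetical amount of dirtyable memory


class Sysctls:
    def __init__(self):
        self.vm_dirty_ratio = 40     # step 1: the defaults from the example
        self.vm_dirty_bytes = 0
        self.ratelimit_pages = 0


def domain_dirty_limits(s):
    # A non-zero vm_dirty_bytes takes priority over vm_dirty_ratio.
    if s.vm_dirty_bytes:
        return max(s.vm_dirty_bytes // PAGE_SIZE, 1)    # stand-in for f1
    return TOTAL_PAGES * s.vm_dirty_ratio // 100         # stand-in for f2


def writeback_set_ratelimit(s):
    dirty_thresh = domain_dirty_limits(s)
    s.ratelimit_pages = max(dirty_thresh // 32, 16)     # stand-in for f3


def dirty_ratio_handler_buggy(s, ratio):
    s.vm_dirty_ratio = ratio
    writeback_set_ratelimit(s)   # still sees the old vm_dirty_bytes
    s.vm_dirty_bytes = 0         # reset happens too late


def dirty_ratio_handler_fixed(s, ratio):
    s.vm_dirty_ratio = ratio
    s.vm_dirty_bytes = 0         # reset first
    writeback_set_ratelimit(s)   # now derived from the ratio


def run(handler):
    s = Sysctls()
    s.vm_dirty_bytes = 8192      # step 2: echo 8192 into dirty_bytes
    s.vm_dirty_ratio = 0
    writeback_set_ratelimit(s)
    handler(s, 20)               # step 3: echo 20 into dirty_ratio
    return s.ratelimit_pages


buggy = run(dirty_ratio_handler_buggy)
fixed = run(dirty_ratio_handler_fixed)
print("buggy:", buggy, "fixed:", fixed)
```

In this model the buggy ordering leaves ratelimit_pages at the value
derived from the 8192-byte limit, while the fixed ordering derives it
from the new ratio.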

Link: https://lkml.kernel.org/r/20250415090232.7544-1-alexjlzheng@tencent.com
Fixes: 9d823e8f6b1b ("writeback: per task dirty rate limit")
Signed-off-by: Jinliang Zheng &lt;alexjlzheng@tencent.com&gt;
Reviewed-by: MengEn Sun &lt;mengensun@tencent.com&gt;
Cc: Andrea Righi &lt;andrea@betterlinux.com&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: Jinliang Zheng &lt;alexjlzheng@tencent.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"</title>
<updated>2024-07-18T11:07:33+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2024-06-21T14:42:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2820005edae13b140f2d54267d1bd6bb23915f59'/>
<id>urn:sha1:2820005edae13b140f2d54267d1bd6bb23915f59</id>
<content type='text'>
commit 30139c702048f1097342a31302cbd3d478f50c63 upstream.

Patch series "mm: Avoid possible overflows in dirty throttling".

Dirty throttling logic assumes dirty limits in page units fit into
32 bits.  This patch series makes sure this is true (see patch 2/2 for
more details).


This patch (of 2):

This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.

The commit is broken in several ways.  Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh &gt;= 1&lt;&lt;32 (which is actually common - the
default settings with 4GB of RAM will trigger this).  Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs.  We have
div64_ul() in case we want to be safe &amp; cheap.  Thirdly, if dirty
thresholds are larger than 1&lt;&lt;32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
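
The first point can be checked with a little arithmetic.  The sketch
below assumes 4 KiB pages, a 20% dirty threshold, a 10% background
threshold, and a single wb that is granted the whole global threshold;
those numbers are illustrative assumptions, and the wrap-around is
modelled with a modulo rather than real 32-bit registers.

```python
# Illustrative arithmetic only: model a 32-bit unsigned multiply by
# reducing the exact product modulo 2**32.
U32_MOD = 2 ** 32                         # range of a 32-bit unsigned long
PAGE_SIZE = 4096                          # assumed page size

ram_pages = 4 * 2 ** 30 // PAGE_SIZE      # 4 GiB of RAM, in pages
wb_thresh = ram_pages * 20 // 100         # assumed 20% dirty threshold
bg_thresh = ram_pages * 10 // 100         # assumed 10% background threshold

exact = wb_thresh * bg_thresh             # result with the (u64) cast
wrapped = exact % U32_MOD                 # result without the cast, 32-bit

print("exact product:  ", exact)
print("wrapped product:", wrapped)
print("overflows 32 bits:", exact // U32_MOD != 0)
```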

Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-By: Zach O'Keefe &lt;zokeefe@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: avoid overflows in dirty throttling logic</title>
<updated>2024-07-18T11:07:32+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2024-06-21T14:42:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a25e8536184516b55ef89ab91dd2eea429de28d2'/>
<id>urn:sha1:a25e8536184516b55ef89ab91dd2eea429de28d2</id>
<content type='text'>
commit 385d838df280eba6c8680f9777bfa0d0bfe7e8b2 upstream.

The dirty throttling logic is interspersed with assumptions that dirty
limits in PAGE_SIZE units fit into 32 bits (so that various multiplications
fit into 64 bits).  If limits end up being larger, we will hit overflows,
possible divisions by 0 etc.  Fix these problems by never allowing such
large dirty limits, as they have dubious practical value anyway.  For the
dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
such large limits.  For dirty_ratio / dirty_background_ratio it isn't so
simple, as the dirty limit is computed from the amount of available memory,
which can change due to memory hotplug etc.  So when converting dirty
limits from ratios to numbers of pages, we just don't allow the result to
exceed UINT_MAX.

This is a root-only triggerable problem which occurs when the operator
sets dirty limits to &gt;16 TB.
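
A rough sketch of the two policies described above, assuming 4 KiB pages.
The helper names clamp_thresh() and dirty_bytes_acceptable() are
hypothetical and exist only for this illustration; the last line shows
where the 16 TB figure comes from.

```python
# Sketch of the two policies; helper names are hypothetical.
UINT_MAX = 2 ** 32 - 1
PAGE_SIZE = 4096                 # assumed 4 KiB pages
TIB = 2 ** 40


def clamp_thresh(pages):
    # Ratio path: never let the computed limit exceed UINT_MAX pages.
    return min(pages, UINT_MAX)


def dirty_bytes_acceptable(nbytes):
    # Bytes path: refuse values whose page count would exceed UINT_MAX.
    pages = -(-nbytes // PAGE_SIZE)          # round up to whole pages
    return pages == min(pages, UINT_MAX)


# Smallest byte limit that no longer fits into 32 bits of pages.
limit_tib = (UINT_MAX + 1) * PAGE_SIZE / TIB

print("limit in TiB:", limit_tib)
print("8 KiB accepted:", dirty_bytes_acceptable(8192))
print("32 TiB accepted:", dirty_bytes_acceptable(32 * TIB))
print("clamped ratio result:", clamp_thresh(2 ** 40))
```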

Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reported-by: Zach O'Keefe &lt;zokeefe@google.com&gt;
Reviewed-By: Zach O'Keefe &lt;zokeefe@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again</title>
<updated>2024-02-23T07:55:03+00:00</updated>
<author>
<name>Zach O'Keefe</name>
<email>zokeefe@google.com</email>
</author>
<published>2024-01-18T18:19:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5099871b370335809c0fd1abad74d9c7c205d43f'/>
<id>urn:sha1:5099871b370335809c0fd1abad74d9c7c205d43f</id>
<content type='text'>
commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78 upstream.

(struct dirty_throttle_control *)-&gt;thresh is an unsigned long, but is
passed as the u32 divisor argument to div_u64().  On architectures where
unsigned long is 64 bits, the argument will be implicitly truncated.

Use div64_u64() instead of div_u64() so that the value used in the "is
this a safe division" check is the same as the divisor.

Also, remove redundant cast of the numerator to u64, as that should happen
implicitly.

This would be difficult to exploit in the memcg domain, given the ratio-based
arithmetic domain_dirty_limits() uses, but is much easier in the global
writeback domain with a BDI_CAP_STRICTLIMIT-backing device, using e.g.
vm.dirty_bytes=(1&lt;&lt;32)*PAGE_SIZE so that dtc-&gt;thresh == (1&lt;&lt;32).
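
The truncation can be illustrated with a toy model.  The helpers below
only imitate the argument widths of div_u64() and div64_u64(); they are
not the kernel implementations, and the numerator is a made-up value.

```python
# Toy model of the divisor truncation; not the kernel's code.
U32_MOD = 2 ** 32


def as_u32(value):
    # Keep only the low 32 bits, as an implicit conversion to u32 would.
    return value % U32_MOD


def div_u64(dividend, divisor):
    # The divisor parameter is u32, so a wider value is truncated first.
    return dividend // as_u32(divisor)


def div64_u64(dividend, divisor):
    # Both operands are 64 bits wide, so nothing is lost.
    return dividend // divisor


thresh = 2 ** 32                 # the thresh value from the example above
numerator = 10 * thresh          # hypothetical numerator

safe_to_divide = thresh != 0     # the check looks at the full value ...
try:
    div_u64(numerator, thresh)   # ... but the division sees 0
    truncated_to_zero = False
except ZeroDivisionError:
    truncated_to_zero = True

print(safe_to_divide, truncated_to_zero, div64_u64(numerator, thresh))
```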

Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com
Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
Signed-off-by: Zach O'Keefe &lt;zokeefe@google.com&gt;
Cc: Maxim Patlasov &lt;MPatlasov@parallels.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: avoid use-after-free after removing device</title>
<updated>2022-08-31T15:16:47+00:00</updated>
<author>
<name>Khazhismel Kumykov</name>
<email>khazhy@chromium.org</email>
</author>
<published>2022-08-01T15:50:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f96b9f7c1676923bce871e728bb49c0dfa5013cc'/>
<id>urn:sha1:f96b9f7c1676923bce871e728bb49c0dfa5013cc</id>
<content type='text'>
commit f87904c075515f3e1d8f4a7115869d3b914674fd upstream.

When a disk is removed, bdi_unregister gets called to stop further
writeback and wait for associated delayed work to complete.  However,
wb_inode_writeback_end() may schedule bandwidth estimation dwork after
this has completed, which can result in the timer attempting to access the
just freed bdi_writeback.

Fix this by checking if the bdi_writeback is alive, similar to when
scheduling writeback work.

Since this requires wb-&gt;work_lock, and wb_inode_writeback_end() may get
called from interrupt, switch wb-&gt;work_lock to an irqsafe lock.

Link: https://lkml.kernel.org/r/20220801155034.3772543-1-khazhy@google.com
Fixes: 45a2966fd641 ("writeback: fix bandwidth estimate for spiky workload")
Signed-off-by: Khazhismel Kumykov &lt;khazhy@google.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Michael Stapelberg &lt;stapelberg+linux@google.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2021-09-03T17:08:28+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-09-03T17:08:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=14726903c835101cd8d0a703b609305094350d61'/>
<id>urn:sha1:14726903c835101cd8d0a703b609305094350d61</id>
<content type='text'>
Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
  bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
  hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
  oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...
</content>
</entry>
<entry>
<title>writeback: use READ_ONCE for unlocked reads of writeback stats</title>
<updated>2021-09-03T16:58:10+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2021-09-02T21:53:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=20792ebf3eeb828a692b29f3000673cb9ca83c3a'/>
<id>urn:sha1:20792ebf3eeb828a692b29f3000673cb9ca83c3a</id>
<content type='text'>
We do some unlocked reads of writeback statistics like
avg_write_bandwidth, dirty_ratelimit, or bw_time_stamp.  Generally we are
fine with getting somewhat out-of-date values, but getting different
values in various parts of a function because the compiler decided to
reload the value from its original memory location could confuse
calculations.  Use READ_ONCE for these unlocked accesses and WRITE_ONCE
for the updates to be on the safe side.

Link: https://lkml.kernel.org/r/20210713104716.22868-5-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Michael Stapelberg &lt;stapelberg+linux@google.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: rename domain_update_bandwidth()</title>
<updated>2021-09-03T16:58:10+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2021-09-02T21:53:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=42dd235cb15c06c8446f9aed695648a3c2eb8d11'/>
<id>urn:sha1:42dd235cb15c06c8446f9aed695648a3c2eb8d11</id>
<content type='text'>
Rename domain_update_bandwidth() to domain_update_dirty_limit().  The
original name is a misnomer.  The function has nothing to do with a
bandwidth, it updates dirty limits.

Link: https://lkml.kernel.org/r/20210713104716.22868-4-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Michael Stapelberg &lt;stapelberg+linux@google.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: fix bandwidth estimate for spiky workload</title>
<updated>2021-09-03T16:58:10+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2021-09-02T21:53:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=45a2966fd64147518dc5bca25f447bd0fb5359ac'/>
<id>urn:sha1:45a2966fd64147518dc5bca25f447bd0fb5359ac</id>
<content type='text'>
Michael Stapelberg has reported that for workloads with short, large spikes
of writes (the GCC linker seems to trigger this frequently) the write
throughput is heavily underestimated and tends to steadily sink until it
reaches zero.  This has a rather bad impact on writeback throttling (causing
stalls).  The problem is that the writeback throughput estimate gets updated
at most once per 200 ms.  One update happens early after we submit pages
for writeback (at that point writeout of only a small fraction of pages is
completed and thus the observed throughput is tiny).  The next update happens
only during the next write spike (updates happen only from inode writeback
and dirty throttling code) and if that is more than 1s after the previous
spike, we decide the system was idle and just ignore whatever was written
until this moment.

Fix the problem by making sure the writeback throughput estimate is also
updated shortly after writeback completes to get a reasonable estimate of
throughput for spiky workloads.

[jack@suse.cz: avoid division by 0 in wb_update_dirty_ratelimit()]

Link: https://lore.kernel.org/lkml/20210617095309.3542373-1-stapelberg+linux@google.com
Link: https://lkml.kernel.org/r/20210713104716.22868-3-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reported-by: Michael Stapelberg &lt;stapelberg+linux@google.com&gt;
Tested-by: Michael Stapelberg &lt;stapelberg+linux@google.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: reliably update bandwidth estimation</title>
<updated>2021-09-03T16:58:10+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2021-09-02T21:53:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fee468fdf41cdf36ba6b5a780e2474d0a3e066ac'/>
<id>urn:sha1:fee468fdf41cdf36ba6b5a780e2474d0a3e066ac</id>
<content type='text'>
Currently we trigger writeback bandwidth estimation from
balance_dirty_pages() and from wb_writeback().  However, neither of these
needs to trigger when the system is relatively idle and writeback is
triggered e.g. from fsync(2).  Make sure writeback estimates happen
reliably by triggering them from do_writepages().

Link: https://lkml.kernel.org/r/20210713104716.22868-2-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Michael Stapelberg &lt;stapelberg+linux@google.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
