<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/ocfs2/suballoc.h, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-05-18T06:24:54+00:00</updated>
<entry>
<title>ocfs2: fix the issue with discontiguous allocation in the global_bitmap</title>
<updated>2025-05-18T06:24:54+00:00</updated>
<author>
<name>Heming Zhao</name>
<email>heming.zhao@suse.com</email>
</author>
<published>2025-04-14T06:01:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1b3b9158521a50f79e0f9b5435a05822cf67d998'/>
<id>urn:sha1:1b3b9158521a50f79e0f9b5435a05822cf67d998</id>
<content type='text'>
commit bd1261b16d9131d79723d982d54295e7f309797a upstream.

commit 4eb7b93e0310 ("ocfs2: improve write IO performance when
fragmentation is high") introduced another regression.

The following ocfs2-test case can trigger this issue:
&gt; discontig_runner.sh =&gt; activate_discontig_bg.sh =&gt; resv_unwritten:
&gt; ${RESV_UNWRITTEN_BIN} -f ${WORK_PLACE}/large_testfile -s 0 -l \
&gt; $((${FILE_MAJOR_SIZE_M}*1024*1024))

In my env, test disk size (by "fdisk -l &lt;dev&gt;"):
&gt; 53687091200 bytes, 104857600 sectors.

Above command is:
&gt; /usr/local/ocfs2-test/bin/resv_unwritten -f \
&gt; /mnt/ocfs2/ocfs2-activate-discontig-bg-dir/large_testfile -s 0 -l \
&gt; 53187969024

Error log:
&gt; [*] Reserve 50724M space for a LARGE file, reserve 200M space for future test.
&gt; ioctl error 28: "No space left on device"
&gt; resv allocation failed Unknown error -1
&gt; reserve unwritten region from 0 to 53187969024.

Call flow:
__ocfs2_change_file_space //by ioctl OCFS2_IOC_RESVSP64
 ocfs2_allocate_unwritten_extents //start:0 len:53187969024
  while()
   + ocfs2_get_clusters //cpos:0, alloc_size:1623168 (cluster number)
   + ocfs2_extend_allocation
     + ocfs2_lock_allocators
     |  + choose OCFS2_AC_USE_MAIN &amp; ocfs2_cluster_group_search
     |
     + ocfs2_add_inode_data
        ocfs2_add_clusters_in_btree
         __ocfs2_claim_clusters
          ocfs2_claim_suballoc_bits
          + During the allocation of the final part of the large file
	    (after ~47GB), no chain had the required contiguous
            bits_wanted. Consequently, the allocation failed.

How to fix:
When OCFS2 is encountering fragmented allocation, the file system should
stop attempting bits_wanted contiguous allocation and instead provide the
largest available contiguous free bits from the cluster groups.

Link: https://lkml.kernel.org/r/20250414060125.19938-2-heming.zhao@suse.com
Fixes: 4eb7b93e0310 ("ocfs2: improve write IO performance when fragmentation is high")
Signed-off-by: Heming Zhao &lt;heming.zhao@suse.com&gt;
Reported-by: Gautham Ananthakrishna &lt;gautham.ananthakrishna@oracle.com&gt;
Reviewed-by: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Cc: Mark Fasheh &lt;mark@fasheh.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Junxiao Bi &lt;junxiao.bi@oracle.com&gt;
Cc: Changwei Ge &lt;gechangwei@live.cn&gt;
Cc: Jun Piao &lt;piaojun@huawei.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ocfs2: improve write IO performance when fragmentation is high</title>
<updated>2024-04-26T04:07:03+00:00</updated>
<author>
<name>Heming Zhao</name>
<email>heming.zhao@suse.com</email>
</author>
<published>2024-03-28T12:52:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4eb7b93e03101fd3f35e69affe566e4b1e3e3dca'/>
<id>urn:sha1:4eb7b93e03101fd3f35e69affe566e4b1e3e3dca</id>
<content type='text'>
The group_search function ocfs2_cluster_group_search() should
bypass groups with insufficient space to avoid unnecessary
searches.

This patch is particularly useful when ocfs2 is handling huge
number small files, and volume fragmentation is very high.
In this case, ocfs2 is busy with looking up available la window
from //global_bitmap.

This patch introduces a new member in the Group Description (gd)
struct called 'bg_contig_free_bits', representing the max
contigous free bits in this gd. When ocfs2 allocates a new
la window from //global_bitmap, 'bg_contig_free_bits' helps
expedite the search process.

Let's image below path.

1. la state (-&gt;local_alloc_state) is set THROTTLED or DISABLED.

2. when user delete a large file and trigger
   ocfs2_local_alloc_seen_free_bits set osb-&gt;local_alloc_state
   unconditionally.

3. a write IOs thread run and trigger the worst performance path

```
ocfs2_reserve_clusters_with_limit
 ocfs2_reserve_local_alloc_bits
  ocfs2_local_alloc_slide_window //[1]
   + ocfs2_local_alloc_reserve_for_window //[2]
   + ocfs2_local_alloc_new_window //[3]
      ocfs2_recalc_la_window
```

[1]:
will be called when la window bits used up.

[2]:
under la state is ENABLED, and this func only check global_bitmap
free bits, it will succeed in general.

[3]:
will use the default la window size to search clusters then fail.
ocfs2_recalc_la_window attempts other la window sizes.
the timing complexity is O(n^4), resulting in a significant time
cost for scanning global bitmap. This leads to a dramatic slowdown
in write I/Os (e.g., user space 'dd').

i.e.
an ocfs2 partition size: 1.45TB, cluster size: 4KB,
la window default size: 106MB.
The partition is fragmentation by creating &amp; deleting huge mount of
small files.

before this patch, the timing of [3] should be
(the number got from real world):
- la window size change order (size: MB):
  106, 53, 26.5, 13, 6.5, 3.25, 1.6, 0.8
  only 0.8MB succeed, 0.8MB also triggers la window to disable.
  ocfs2_local_alloc_new_window retries 8 times, first 7 times totally
  runs in worst case.
- group chain number: 242
  ocfs2_claim_suballoc_bits calls for-loop 242 times
- each chain has 49 block group
  ocfs2_search_chain calls while-loop 49 times
- each bg has 32256 blocks
  ocfs2_block_group_find_clear_bits calls while-loop for 32256 bits.
  for ocfs2_find_next_zero_bit uses ffz() to find zero bit, let's use
  (32256/64) (this is not worst value) for timing calucation.

the loop times: 7*242*49*(32256/64) = 41835024 (~42 million times)

In the worst case, user space writes 1MB data will trigger 42M scanning
times.

under this patch, the timing is '7*242*49 = 83006', reduced by three
orders of magnitude.

Link: https://lkml.kernel.org/r/20240328125203.20892-2-heming.zhao@suse.com
Signed-off-by: Heming Zhao &lt;heming.zhao@suse.com&gt;
Reviewed-by: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Cc: Changwei Ge &lt;gechangwei@live.cn&gt;
Cc: Gang He &lt;ghe@suse.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Jun Piao &lt;piaojun@huawei.com&gt;
Cc: Junxiao Bi &lt;junxiao.bi@oracle.com&gt;
Cc: Mark Fasheh &lt;mark@fasheh.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/ocfs2/suballoc.h: fix spelling typo in comment</title>
<updated>2022-10-03T21:21:42+00:00</updated>
<author>
<name>Jiangshan Yi</name>
<email>yijiangshan@kylinos.cn</email>
</author>
<published>2022-09-05T06:16:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1c320cfa17701cbf98085b34ad6159e9f41e5268'/>
<id>urn:sha1:1c320cfa17701cbf98085b34ad6159e9f41e5268</id>
<content type='text'>
Fix spelling typo in comment.

Link: https://lkml.kernel.org/r/20220905061656.1829179-1-13667453960@163.com
Signed-off-by: Jiangshan Yi &lt;yijiangshan@kylinos.cn&gt;
Reported-by: k2ci &lt;kernel-bot@kylinos.cn&gt;
Acked-by: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: remove editor modelines and cruft</title>
<updated>2021-05-07T07:26:34+00:00</updated>
<author>
<name>Masahiro Yamada</name>
<email>masahiroy@kernel.org</email>
</author>
<published>2021-05-07T01:06:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fa60ce2cb4506701c43bd4cf3ca23d970daf1b9c'/>
<id>urn:sha1:fa60ce2cb4506701c43bd4cf3ca23d970daf1b9c</id>
<content type='text'>
The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."

I recently receive a patch to explicitly add a new one.

Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favoriate editor setups.

It is even nicer if scripts/checkpatch.pl can check it.

If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.

[1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/

Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Acked-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Reviewed-by: Miguel Ojeda &lt;ojeda@kernel.org&gt;	[auxdisplay]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ocfs2: suballoc.h: delete a duplicated word</title>
<updated>2020-08-07T18:33:21+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2020-08-07T06:17:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7eba77d59e0593b91af0c438594cb70700893b18'/>
<id>urn:sha1:7eba77d59e0593b91af0c438594cb70700893b18</id>
<content type='text'>
Drop the repeated word "is" in a comment.

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Cc: Mark Fasheh &lt;mark@fasheh.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Link: http://lkml.kernel.org/r/20200720001421.28823-1-rdunlap@infradead.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145</title>
<updated>2019-05-30T18:25:18+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-24T10:04:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=328970de0e39d596e0ef44080e7642224b29ecde'/>
<id>urn:sha1:328970de0e39d596e0ef44080e7642224b29ecde</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version this program is distributed in the
  hope that it will be useful but without any warranty without even
  the implied warranty of merchantability or fitness for a particular
  purpose see the gnu general public license for more details you
  should have received a copy of the gnu general public license along
  with this program if not write to the free software foundation inc
  59 temple place suite 330 boston ma 021110 1307 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 84 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Richard Fontana &lt;rfontana@redhat.com&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Reviewed-by: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ocfs2: rollback alloc_dinode counts when ocfs2_block_group_set_bits() failed</title>
<updated>2014-04-03T23:20:56+00:00</updated>
<author>
<name>Younger Liu</name>
<email>younger.liucn@gmail.com</email>
</author>
<published>2014-04-03T21:47:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db66c71577d525c0cd65e66ff675747565783ba4'/>
<id>urn:sha1:db66c71577d525c0cd65e66ff675747565783ba4</id>
<content type='text'>
After updating alloc_dinode counts in ocfs2_alloc_dinode_update_counts(),
if ocfs2_alloc_dinode_update_bitmap() failed, there is a rare case that
some space may be lost.

So, roll back alloc_dinode counts when ocfs2_block_group_set_bits()
failed.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Younger Liu &lt;younger.liucn@gmail.com&gt;
Reviewed-by: Mark Fasheh &lt;mfasheh@suse.de&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ocfs2: remove redundant ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits()</title>
<updated>2014-01-22T00:19:42+00:00</updated>
<author>
<name>Younger Liu</name>
<email>liuyiyang@hisense.com</email>
</author>
<published>2014-01-21T23:48:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0a2fcd8988ac682f443fd5b0a7c48154a7b42ef2'/>
<id>urn:sha1:0a2fcd8988ac682f443fd5b0a7c48154a7b42ef2</id>
<content type='text'>
ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() are
already provided in suballoc.c.  So, the same functions in
move_extents.c are not needed any more.

Declare the functions in suballoc.h and remove redundant functions in
move_extents.c.

Signed-off-by: Younger Liu &lt;liuyiyang@hisense.com&gt;
Cc: Younger Liu &lt;younger.liucn@gmail.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ocfs2: ac-&gt;ac_allow_chain_relink=0 won't disable group relink</title>
<updated>2013-02-28T03:10:09+00:00</updated>
<author>
<name>Xiaowei.Hu</name>
<email>xiaowei.hu@oracle.com</email>
</author>
<published>2013-02-28T01:02:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=309a85b6861fedbb48a22d45e0e079d1be993b3a'/>
<id>urn:sha1:309a85b6861fedbb48a22d45e0e079d1be993b3a</id>
<content type='text'>
ocfs2_block_group_alloc_discontig() disables chain relink by setting
ac-&gt;ac_allow_chain_relink = 0 because it grabs clusters from multiple
cluster groups.

It doesn't keep the credits for all chain relink,but
ocfs2_claim_suballoc_bits overrides this in this call trace:
ocfs2_block_group_claim_bits()-&gt;ocfs2_claim_clusters()-&gt;
__ocfs2_claim_clusters()-&gt;ocfs2_claim_suballoc_bits()
ocfs2_claim_suballoc_bits set ac-&gt;ac_allow_chain_relink = 1; then call
ocfs2_search_chain() one time and disable it again, and then we run out
of credits.

Fix is to allow relink by default and disable it in
ocfs2_block_group_alloc_discontig.

Without this patch, End-users will run into a crash due to run out of
credits, backtrace like this:

  RIP: 0010:[&lt;ffffffffa0808b14&gt;]  [&lt;ffffffffa0808b14&gt;]
  jbd2_journal_dirty_metadata+0x164/0x170 [jbd2]
  RSP: 0018:ffff8801b919b5b8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0
  RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8
  RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0
  R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800
  FS:  00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0)
  Call Trace:
    ocfs2_journal_dirty+0x2f/0x70 [ocfs2]
    ocfs2_relink_block_group+0x111/0x480 [ocfs2]
    ocfs2_search_chain+0x455/0x9a0 [ocfs2]
    ...

Signed-off-by: Xiaowei.Hu &lt;xiaowei.hu@oracle.com&gt;
Reviewed-by: Srinivas Eeda &lt;srinivas.eeda@oracle.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ocfs2: allow return of new inode block location before allocation of the inode</title>
<updated>2010-09-08T06:25:59+00:00</updated>
<author>
<name>Mark Fasheh</name>
<email>mfasheh@suse.com</email>
</author>
<published>2010-08-13T22:15:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e49e27674d1dd2717ad90b21ece8f83102153315'/>
<id>urn:sha1:e49e27674d1dd2717ad90b21ece8f83102153315</id>
<content type='text'>
This allows code which needs to know the eventual block number of an inode
but can't allocate it yet due to transaction or lock ordering. For example,
ocfs2_create_inode_in_orphan() currently gives a junk blkno for preparation
of the orphan dir because it can't yet know where the actual inode is placed
- that code is actually in ocfs2_mknod_locked. This is a problem when the
orphan dirs are indexed as the junk inode number will create an index entry
which goes unused (and fails the later removal from the orphan dir).  Now
with these interfaces, ocfs2_create_inode_in_orphan() can run the block
group search (and get back the inode block number) *before* any actual
allocation occurs.

Signed-off-by: Mark Fasheh &lt;mfasheh@suse.com&gt;
Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
</content>
</entry>
</feed>
