<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git, branch v5.19.12</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.19.12</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.19.12'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-09-28T09:32:29+00:00</updated>
<entry>
<title>Linux 5.19.12</title>
<updated>2022-09-28T09:32:29+00:00</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2022-09-28T09:32:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=58df6af8cea3c5377c0220d9fb47cbf85a216f54'/>
<id>urn:sha1:58df6af8cea3c5377c0220d9fb47cbf85a216f54</id>
<content type='text'>
Link: https://lore.kernel.org/r/20220926100806.522017616@linuxfoundation.org
Tested-by: Holger Hoffstätte &lt;holger@applied-asynchrony.com&gt;
Tested-by: Ronald Warsow &lt;rwarsow@gmx.de&gt;
Tested-by: Fenil Jain &lt;fkjainco@gmail.com&gt;
Tested-by: Justin M. Forbes &lt;jforbes@fedoraproject.org&gt;
Tested-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt;
Tested-by: Shuah Khan &lt;skhan@linuxfoundation.org&gt;
Tested-by: Zan Aziz &lt;zanaziz313@gmail.com&gt;
Tested-by: Ron Economos &lt;re@w6rz.net&gt;
Tested-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Tested-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Tested-by: Sudip Mukherjee &lt;sudip.mukherjee@codethink.co.uk&gt;
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Tested-by: Jiri Slaby &lt;jirislaby@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ext4: make directory inode spreading reflect flexbg size</title>
<updated>2022-09-28T09:32:28+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-09-08T09:21:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=547262c5b37393991d4f5ff3670718cd30c00a77'/>
<id>urn:sha1:547262c5b37393991d4f5ff3670718cd30c00a77</id>
<content type='text'>
commit 613c5a85898d1cd44e68f28d65eccf64a8ace9cf upstream.

Currently the Orlov inode allocator searches for free inodes for a
directory only in flex block groups with at most inodes_per_group/16
more directory inodes than average per flex block group. However with
growing size of flex block group this becomes unnecessarily strict.
Scale allowed difference from average directory count per flex block
group with flex block group size as we do with other metrics.

Tested-by: Stefan Wahren &lt;stefan.wahren@i2se.com&gt;
Tested-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Cc: stable@kernel.org
Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20220908092136.11770-3-jack@suse.cz
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ext4: fixup possible uninitialized variable access in ext4_mb_choose_next_group_cr1()</title>
<updated>2022-09-28T09:32:28+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-09-22T09:09:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cdefe8dd61c9030f74e5f6b549af9ca81ddf561d'/>
<id>urn:sha1:cdefe8dd61c9030f74e5f6b549af9ca81ddf561d</id>
<content type='text'>
commit a078dff870136090b5779ca2831870a6c5539d36 upstream.

Variable 'grp' may be left uninitialized if there's no group with
suitable average fragment size (or larger). Fix the problem by
initializing it earlier.

Link: https://lore.kernel.org/r/20220922091542.pkhedytey7wzp5fi@quack3
Fixes: 83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
Cc: stable@kernel.org
Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "block: freeze the queue earlier in del_gendisk"</title>
<updated>2022-09-28T09:32:28+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-09-19T14:40:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=48a12961e800aa404fefa845aa95098d18a30f19'/>
<id>urn:sha1:48a12961e800aa404fefa845aa95098d18a30f19</id>
<content type='text'>
commit 4c66a326b5ab784cddd72de07ac5b6210e9e1b06 upstream.

This reverts commit a09b314005f3a0956ebf56e01b3b80339df577cc.

Dusty Mabe reported consistent hang during CoreOS shutdown with a MD
RAID1 setup.  Although apparently similar hangs happened before,
and this patch most likely is not the root cause it made it much
more severe.  Revert it until we can figure out what is going on
with the md driver.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20220919144049.978907-1-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ext4: use buckets for cr 1 block scan instead of rbtree</title>
<updated>2022-09-28T09:32:28+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-09-08T09:21:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=398a0fdb38d9ab5e68023667cc5e6e3d109e2d6b'/>
<id>urn:sha1:398a0fdb38d9ab5e68023667cc5e6e3d109e2d6b</id>
<content type='text'>
commit 83e80a6e3543f37f74c8e48a5f305b054b65ce2a upstream.

Using rbtree for sorting groups by average fragment size is relatively
expensive (needs rbtree update on every block freeing or allocation) and
leads to wide spreading of allocations because selection of block group
is very sentitive both to changes in free space and amount of blocks
allocated. Furthermore selecting group with the best matching average
fragment size is not necessary anyway, even more so because the
variability of fragment sizes within a group is likely large so average
is not telling much. We just need a group with large enough average
fragment size so that we have high probability of finding large enough
free extent and we don't want average fragment size to be too big so
that we are likely to find free extent only somewhat larger than what we
need.

So instead of maintaing rbtree of groups sorted by fragment size keep
bins (lists) or groups where average fragment size is in the interval
[2^i, 2^(i+1)). This structure requires less updates on block allocation
/ freeing, generally avoids chaotic spreading of allocations into block
groups, and still is able to quickly (even faster that the rbtree)
provide a block group which is likely to have a suitably sized free
space extent.

This patch reduces number of block groups used when untarring archive
with medium sized files (size somewhat above 64k which is default
mballoc limit for avoiding locality group preallocation) to about half
and thus improves write speeds for eMMC flash significantly.

Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
CC: stable@kernel.org
Reported-and-tested-by: Stefan Wahren &lt;stefan.wahren@i2se.com&gt;
Tested-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
Link: https://lore.kernel.org/r/20220908092136.11770-5-jack@suse.cz
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ext4: use locality group preallocation for small closed files</title>
<updated>2022-09-28T09:32:28+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-09-08T09:21:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=52e8d671393cb45edfb644b750e60a8141230694'/>
<id>urn:sha1:52e8d671393cb45edfb644b750e60a8141230694</id>
<content type='text'>
commit a9f2a2931d0e197ab28c6007966053fdababd53f upstream.

Curently we don't use any preallocation when a file is already closed
when allocating blocks (from writeback code when converting delayed
allocation). However for small files, using locality group preallocation
is actually desirable as that is not specific to a particular file.
Rather it is a method to pack small files together to reduce
fragmentation and for that the fact the file is closed is actually even
stronger hint the file would benefit from packing. So change the logic
to allow locality group preallocation in this case.

Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
CC: stable@kernel.org
Reported-and-tested-by: Stefan Wahren &lt;stefan.wahren@i2se.com&gt;
Tested-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Reviewed-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
Link: https://lore.kernel.org/r/20220908092136.11770-4-jack@suse.cz
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ext4: avoid unnecessary spreading of allocations among groups</title>
<updated>2022-09-28T09:32:28+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-09-08T09:21:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=405a609430a6873c23e80b8700c620ce8e0912b9'/>
<id>urn:sha1:405a609430a6873c23e80b8700c620ce8e0912b9</id>
<content type='text'>
commit 1940265ede6683f6317cba0d428ce6505eaca944 upstream.

mb_set_largest_free_order() updates lists containing groups with largest
chunk of free space of given order. The way it updates it leads to
always moving the group to the tail of the list. Thus allocations
looking for free space of given order effectively end up cycling through
all groups (and due to initialization in last to first order). This
spreads allocations among block groups which reduces performance for
rotating disks or low-end flash media. Change
mb_set_largest_free_order() to only update lists if the order of the
largest free chunk in the group changed.

Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
CC: stable@kernel.org
Reported-and-tested-by: Stefan Wahren &lt;stefan.wahren@i2se.com&gt;
Tested-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Reviewed-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
Link: https://lore.kernel.org/r/20220908092136.11770-2-jack@suse.cz
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ext4: make mballoc try target group first even with mb_optimize_scan</title>
<updated>2022-09-28T09:32:28+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-09-08T09:21:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b82d312ff30f59ecfabbfd365899d79a9c48fc51'/>
<id>urn:sha1:b82d312ff30f59ecfabbfd365899d79a9c48fc51</id>
<content type='text'>
commit 4fca50d440cc5d4dc570ad5484cc0b70b381bc2a upstream.

One of the side-effects of mb_optimize_scan was that the optimized
functions to select next group to try were called even before we tried
the goal group. As a result we no longer allocate files close to
corresponding inodes as well as we don't try to expand currently
allocated extent in the same group. This results in reaim regression
with workfile.disk workload of upto 8% with many clients on my test
machine:

                     baseline               mb_optimize_scan
Hmean     disk-1       2114.16 (   0.00%)     2099.37 (  -0.70%)
Hmean     disk-41     87794.43 (   0.00%)    83787.47 *  -4.56%*
Hmean     disk-81    148170.73 (   0.00%)   135527.05 *  -8.53%*
Hmean     disk-121   177506.11 (   0.00%)   166284.93 *  -6.32%*
Hmean     disk-161   220951.51 (   0.00%)   207563.39 *  -6.06%*
Hmean     disk-201   208722.74 (   0.00%)   203235.59 (  -2.63%)
Hmean     disk-241   222051.60 (   0.00%)   217705.51 (  -1.96%)
Hmean     disk-281   252244.17 (   0.00%)   241132.72 *  -4.41%*
Hmean     disk-321   255844.84 (   0.00%)   245412.84 *  -4.08%*

Also this is causing huge regression (time increased by a factor of 5 or
so) when untarring archive with lots of small files on some eMMC storage
cards.

Fix the problem by making sure we try goal group first.

Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
CC: stable@kernel.org
Reported-and-tested-by: Stefan Wahren &lt;stefan.wahren@i2se.com&gt;
Tested-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Reviewed-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://lore.kernel.org/all/20220727105123.ckwrhbilzrxqpt24@quack3/
Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20220908092136.11770-1-jack@suse.cz
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ext4: limit the number of retries after discarding preallocations blocks</title>
<updated>2022-09-28T09:32:27+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2022-09-01T22:03:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=17eb9845f20f901368f207f40eb6fc6d82f7abf4'/>
<id>urn:sha1:17eb9845f20f901368f207f40eb6fc6d82f7abf4</id>
<content type='text'>
commit 80fa46d6b9e7b1527bfd2197d75431fd9c382161 upstream.

This patch avoids threads live-locking for hours when a large number
threads are competing over the last few free extents as they blocks
getting added and removed from preallocation pools.  From our bug
reporter:

   A reliable way for triggering this has multiple writers
   continuously write() to files when the filesystem is full, while
   small amounts of space are freed (e.g. by truncating a large file
   -1MiB at a time). In the local filesystem, this can be done by
   simply not checking the return code of write (0) and/or the error
   (ENOSPACE) that is set. Over NFS with an async mount, even clients
   with proper error checking will behave this way since the linux NFS
   client implementation will not propagate the server errors [the
   write syscalls immediately return success] until the file handle is
   closed. This leads to a situation where NFS clients send a
   continuous stream of WRITE rpcs which result in ERRNOSPACE -- but
   since the client isn't seeing this, the stream of writes continues
   at maximum network speed.

   When some space does appear, multiple writers will all attempt to
   claim it for their current write. For NFS, we may see dozens to
   hundreds of threads that do this.

   The real-world scenario of this is database backup tooling (in
   particular, github.com/mdkent/percona-xtrabackup) which may write
   large files (&gt;1TiB) to NFS for safe keeping. Some temporary files
   are written, rewound, and read back -- all before closing the file
   handle (the temp file is actually unlinked, to trigger automatic
   deletion on close/crash.) An application like this operating on an
   async NFS mount will not see an error code until TiB have been
   written/read.

   The lockup was observed when running this database backup on large
   filesystems (64 TiB in this case) with a high number of block
   groups and no free space. Fragmentation is generally not a factor
   in this filesystem (~thousands of large files, mostly contiguous
   except for the parts written while the filesystem is at capacity.)

Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth &gt; 0</title>
<updated>2022-09-28T09:32:27+00:00</updated>
<author>
<name>Luís Henriques</name>
<email>lhenriques@suse.de</email>
</author>
<published>2022-08-22T09:42:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2f5e9de15e4f55fbf56f22d4a2ce406246cc462d'/>
<id>urn:sha1:2f5e9de15e4f55fbf56f22d4a2ce406246cc462d</id>
<content type='text'>
commit 29a5b8a137ac8eb410cc823653a29ac0e7b7e1b0 upstream.

When walking through an inode extents, the ext4_ext_binsearch_idx() function
assumes that the extent header has been previously validated.  However, there
are no checks that verify that the number of entries (eh-&gt;eh_entries) is
non-zero when depth is &gt; 0.  And this will lead to problems because the
EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:

[  135.245946] ------------[ cut here ]------------
[  135.247579] kernel BUG at fs/ext4/extents.c:2258!
[  135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
[  135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
[  135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[  135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
[  135.256475] Code:
[  135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
[  135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
[  135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
[  135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
[  135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
[  135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
[  135.272394] FS:  00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[  135.274510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
[  135.277952] Call Trace:
[  135.278635]  &lt;TASK&gt;
[  135.279247]  ? preempt_count_add+0x6d/0xa0
[  135.280358]  ? percpu_counter_add_batch+0x55/0xb0
[  135.281612]  ? _raw_read_unlock+0x18/0x30
[  135.282704]  ext4_map_blocks+0x294/0x5a0
[  135.283745]  ? xa_load+0x6f/0xa0
[  135.284562]  ext4_mpage_readpages+0x3d6/0x770
[  135.285646]  read_pages+0x67/0x1d0
[  135.286492]  ? folio_add_lru+0x51/0x80
[  135.287441]  page_cache_ra_unbounded+0x124/0x170
[  135.288510]  filemap_get_pages+0x23d/0x5a0
[  135.289457]  ? path_openat+0xa72/0xdd0
[  135.290332]  filemap_read+0xbf/0x300
[  135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
[  135.292192]  new_sync_read+0x103/0x170
[  135.293014]  vfs_read+0x15d/0x180
[  135.293745]  ksys_read+0xa1/0xe0
[  135.294461]  do_syscall_64+0x3c/0x80
[  135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

This patch simply adds an extra check in __ext4_ext_check(), verifying that
eh_entries is not 0 when eh_depth is &gt; 0.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
Cc: Baokun Li &lt;libaokun1@huawei.com&gt;
Cc: stable@kernel.org
Signed-off-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Baokun Li &lt;libaokun1@huawei.com&gt;
Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.de
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
